Software PWM can work very well as long as you have a timer available (one that is not used for anything else) and don't frequently need to suspend interrupts (such as to write to NVM), since that will glitch the PWM output. It's fairly easy to implement. The only state you need is a variable for the output (duty cycle) you want and a flag bit that toggles to keep track of the phase of the output. Here's pseudocode that you would put in an 8-bit timer's overflow interrupt routine:
    if timer overflowed to zero (interrupt):
        toggle the flag
        if the flag is set:
            turn on the output pin
            load the timer with a value of (255 - duty cycle)
        else if the flag is clear:
            turn off the output pin
            load the timer with the value of duty cycle
The way this works is by scheduling the following interrupt to occur after a variable amount of time. When the timer is loaded with a low value, it takes longer to overflow past 255 again, so the subsequent interrupt is scheduled further out, and vice versa for a high value. As an example, say the timer runs at 1MHz and the duty cycle is set to 200 (of 255). The interrupt fires and turns on the output, then loads 55 into the timer. The timer then has to count up roughly 200 ticks before firing the interrupt again, at which point the output is turned off. Then 200 is loaded into the timer, so it only has to count up roughly 55 ticks before firing again and starting the process over. (Strictly, each phase is one tick longer, because the overflow happens on the step from 255 to 0.) This gives a high output for about 200µs and a low output for about 55µs, so we get a ~4kHz PWM signal at ~78% duty.
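To check the arithmetic in the example above, here is a minimal host-side sketch in plain C (not AVR code). The names `pwm_timing`, `ticks_until_overflow`, `pwm_timing_for` and `pwm_frequency_hz` are made up for illustration, and the model assumes an ideal 8-bit up-counter with no interrupt latency.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: length of each PWM phase, in timer ticks. */
typedef struct {
    uint16_t high_ticks;  /* ticks the pin stays high */
    uint16_t low_ticks;   /* ticks the pin stays low */
} pwm_timing;

/* Ticks until an ideal 8-bit up-counter overflows after being loaded
 * with `reload`. The overflow happens on the step from 255 to 0. */
uint16_t ticks_until_overflow(uint8_t reload)
{
    return (uint16_t)(256u - reload);
}

/* Phase lengths produced by the reload scheme described above. */
pwm_timing pwm_timing_for(uint8_t duty)
{
    pwm_timing t;
    t.high_ticks = ticks_until_overflow((uint8_t)(255u - duty));
    t.low_ticks  = ticks_until_overflow(duty);
    return t;
}

/* PWM frequency in Hz for a given timer tick rate (integer division). */
uint32_t pwm_frequency_hz(uint32_t timer_hz, uint8_t duty)
{
    assert(timer_hz > 0);
    pwm_timing t = pwm_timing_for(duty);
    return timer_hz / (uint32_t)(t.high_ticks + t.low_ticks);
}
```

For a duty of 200 this gives 201 ticks high and 56 ticks low, so a 1MHz timer produces roughly 3.9kHz at about 78% duty, in line with the rounded figures above. On real hardware the ISR entry latency shifts these numbers by a few more cycles.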
This can be extended slightly to give better performance at the extremes (0 and 100%) by adding a test for 0 and 100% duty before setting the output:
    if timer overflowed to zero (interrupt):
        toggle the flag
        if the flag is set:
            if duty cycle is not zero, turn on the output pin
            load the timer with a value of (255 - duty cycle)
        else if the flag is clear:
            if duty cycle is not 255, turn off the output pin
            load the timer with the value of duty cycle
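The effect of the two extra tests can be seen in a small host-side simulation. This is only a sketch in plain C under the assumption of an ideal timer with zero interrupt latency. The `soft_pwm` struct and both functions are hypothetical stand-ins for the timer register, the pin and the ISR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side model of the timer, the pin, and the ISR state. */
typedef struct {
    uint8_t counter;  /* stands in for the 8-bit timer register */
    uint8_t duty;     /* requested duty cycle, 0..255 */
    bool    flag;     /* phase flag toggled on every overflow */
    bool    pin;      /* stands in for the output pin */
} soft_pwm;

/* Body of the overflow interrupt, including the 0 and 255 guards. */
void soft_pwm_on_overflow(soft_pwm *p)
{
    p->flag = !p->flag;
    if (p->flag) {
        if (p->duty != 0) p->pin = true;     /* skip turn-on at 0% */
        p->counter = (uint8_t)(255u - p->duty);
    } else {
        if (p->duty != 255) p->pin = false;  /* skip turn-off at 100% */
        p->counter = p->duty;
    }
}

/* Advance the model by `ticks` timer ticks and return how many of
 * those ticks ended with the pin high. */
uint32_t soft_pwm_run(soft_pwm *p, uint32_t ticks)
{
    assert(p != NULL);
    uint32_t high = 0;
    for (uint32_t i = 0; i < ticks; i++) {
        p->counter++;                        /* wraps from 255 to 0 */
        if (p->counter == 0) soft_pwm_on_overflow(p);
        if (p->pin) high++;
    }
    return high;
}
```

Starting from `{0, 0, false, false}` (duty 0) the pin never goes high, and starting from `{0, 255, false, false}` (duty 255) it stays high once the first overflow has fired. Without the guards, both cases would emit a short pulse on every period.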
My AVR skills are rusty but I think a simple implementation would look like this, assuming you set up timer0 as an 8-bit timer with an overflow interrupt:
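    #include <avr/io.h>
    #include <avr/interrupt.h>
    #include <stdbool.h>
    #include <stdint.h>

    // Assumption: the output is on PB0; change these to match your wiring.
    #define PWM_PORT PORTB
    #define PWM_BIT  PB0

    volatile uint8_t duty;   // requested duty cycle, 0..255
    static bool flag;        // phase of the output

    ISR(TIM0_OVF_vect)       // named TIMER0_OVF_vect on many ATmega parts
    {
        flag = !flag;        // logical NOT; ~ on a bool never yields false
        if (flag) {
            PWM_PORT |= (1 << PWM_BIT);    // output high
            TCNT0 = 255 - duty;
        } else {
            PWM_PORT &= ~(1 << PWM_BIT);   // output low
            TCNT0 = duty;
        }
    }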