I've had about zero time in the last week and likely won't have much for another week, but little by little I think I've made the e-switch bug show itself. I think Flashy Mike was actually right about the watchdog timer, but possibly for the wrong reason (not entirely sure). The watchdog timer certainly doesn't reset the MCU just because interrupts are disabled or not implemented. That only happens if WDE is set to 1. If WDIE is set to 1 without WDE, it's in a purely interrupt mode without reset, or at least it is according to the manual and everything else you can read about it. And that must be true, because I used interrupts that did not reset the watchdog, and that fact is required to explain the later parts.
However, after some sequence of events I believe the watchdog reset was somehow arming and tripping. There were a few things I had not understood about the watchdog reset (because I never intended to use it), or I might have figured this out sooner.
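To make the WDE/WDIE combinations concrete, here is a minimal sketch that decodes a watchdog control register value into the mode it selects. The bit positions are the ATtiny25/45/85 WDTCR layout as I read it from the datasheet, and it assumes the WDTON fuse is not programmed. The enum and function names are made up for illustration.

```c
#include <stdint.h>

/* WDTCR bit positions on the ATtiny25/45/85 (per the datasheet). */
#define BIT_WDIE 6  /* watchdog interrupt enable */
#define BIT_WDE  3  /* watchdog system reset enable */

/* Hypothetical names, for illustration only. */
typedef enum {
    WD_STOPPED,              /* WDE=0, WDIE=0 */
    WD_INTERRUPT_ONLY,       /* WDE=0, WDIE=1: interrupt on timeout, no reset */
    WD_RESET_ONLY,           /* WDE=1, WDIE=0: reset on timeout */
    WD_INTERRUPT_THEN_RESET  /* WDE=1, WDIE=1: interrupt first, reset on the next timeout */
} wd_mode_t;

/* Decode the mode selected by a given WDTCR value. */
wd_mode_t wd_mode(uint8_t wdtcr)
{
    int wde  = (wdtcr >> BIT_WDE) & 1;
    int wdie = (wdtcr >> BIT_WDIE) & 1;

    if (wde && wdie) return WD_INTERRUPT_THEN_RESET;
    if (wde)         return WD_RESET_ONLY;
    if (wdie)        return WD_INTERRUPT_ONLY;
    return WD_STOPPED;
}
```

With only WDIE set (0x40), this returns WD_INTERRUPT_ONLY, which is the mode I intended to be in the whole time.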
A) The watchdog prescaler gets reset to its default of 16ms. I guess that's obvious, since it's a full hardware reset, but I hadn't considered it.
B) WDRF (the watchdog reset flag) gets set in MCUSR, and while it is set WDE is forced on, so the watchdog is automatically armed on the next start.
Then, since my watchdog interrupts don't reset the watchdog, it dies again in 16ms. Initially this just looked like a lockup. After I built a minimal test case that happened to get the light on faster, it instead made a low-PWM strobe. Then when I clicked the switch, it would seem to react and light correctly for a while: briefly on a short press, and for quite a while after restarting from a long-press off. Well, my switch interrupt resets the watchdog timeout to 1/4s on a short press (for 1/4s sleeps) and to 9s on a long press. So it temporarily extended the timer. The 9 seconds really confused me, because it would step through some code with no branches and then suddenly just strobe again (resetting the prescaler again upon restart).
So it all adds up now except for one thing: why is it triggering in the first place? And do I care? I cannot find any answer for that. Presumably something is writing to WDE, but only very rarely. The chance of it happening does seem to depend on random code variations, such as the length of time that interrupts are disabled. There is absolutely nothing in the manual that says a truly empty interrupt is required in interrupt mode, or that missing or delaying interrupts should arm the watchdog if WDE wasn't already set (that happens in combined interrupt/reset mode but not in interrupt-only mode).
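The reset loop described above can be sketched as two small pure functions: one that turns the prescaler bits into a nominal timeout, and one that says whether the reset is armed once WDRF is taken into account. This is only a model for reasoning about the behavior, not firmware. The bit positions are the ATtiny25/45/85 ones per the datasheet, the function names are hypothetical, and the timeouts are nominal values for the 128 kHz watchdog oscillator.

```c
#include <stdint.h>

/* Bit positions on the ATtiny25/45/85 (per the datasheet). */
#define WDTCR_BIT_WDP3 5  /* WDP2..0 are bits 2..0 */
#define WDTCR_BIT_WDE  3
#define MCUSR_BIT_WDRF 3

/* Nominal watchdog timeout in ms for a WDTCR value.
 * WDP3..0 = 0 gives 16 ms, each step doubles it, 9 gives about 8 s.
 * Returns 0 for the reserved prescaler settings (10..15). */
uint16_t wd_timeout_ms(uint8_t wdtcr)
{
    uint8_t wdp = (uint8_t)((((wdtcr >> WDTCR_BIT_WDP3) & 1) << 3) | (wdtcr & 0x07));
    if (wdp > 9) return 0;
    return (uint16_t)(16u << wdp);
}

/* The reset is armed if WDE is set, or if WDRF is still set in MCUSR,
 * because WDRF overrides WDE. */
int wd_reset_armed(uint8_t wdtcr, uint8_t mcusr)
{
    return ((wdtcr >> WDTCR_BIT_WDE) & 1) || ((mcusr >> MCUSR_BIT_WDRF) & 1);
}
```

Right after a watchdog reset the prescaler bits are all zero and WDRF is set, so the model gives a 16 ms timeout with the reset armed, which matches the strobe. Writing a longer prescaler from the switch interrupt only stretches the time until the next reset.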
I find it hard to believe I have some corrupt runaway pointer that finds its way all the way down to the watchdog register and doesn't clobber a whole ton of other stuff first, and I see no sign of it on the simulator.
Still, I'm assuming that if I do one or more of a few things (reset the watchdog in the interrupt, clear WDRF on start, or both), the problem will go away or at worst become a very rare misread click.
I'll give it a shot soon. Sure would like to know the root cause, though.
For what it's worth, the manual does instruct one to do these things, but it says that's basically to protect against runaway pointers or brownout corruption. Could I be seeing brownout corruption? On the attiny85 switch light? How? Or on the low-voltage OTSM MCU with BODS? It doesn't make sense.
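A minimal sketch of those two workarounds, assuming avr-libc on the attiny85. The function and variable names are hypothetical and this is not my actual firmware. The #else branch is only a set of stand-ins so the sketch also compiles on a PC; it does nothing on real hardware.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
#include <avr/interrupt.h>
#include <avr/wdt.h>
#else
/* Host stand-ins so the sketch compiles off-target. Not real hardware. */
static volatile uint8_t MCUSR;
#define WDRF 3
static int wdt_disable_calls, wdt_reset_calls;
static void wdt_disable(void) { wdt_disable_calls++; }
static void wdt_reset(void)   { wdt_reset_calls++; }
#define ISR(vector) void vector(void)
#endif

/* Hypothetical names, for illustration only. */
volatile uint8_t saved_reset_flags;  /* copy of MCUSR, to see what caused the reset */
volatile uint8_t wdt_ticks;          /* counts watchdog interrupts */

/* Call first thing in main(), before the 16 ms default timeout can expire. */
void watchdog_startup_cleanup(void)
{
    saved_reset_flags = MCUSR;        /* keep the reset cause for later inspection */
    MCUSR &= (uint8_t)~(1 << WDRF);   /* WDE cannot be cleared while WDRF is set */
    wdt_disable();                    /* stop the watchdog until it is reconfigured */
}

/* Watchdog interrupt: restart the watchdog counter, then do the usual work. */
ISR(WDT_vect)
{
    wdt_reset();
    wdt_ticks++;
}
```

After the cleanup the firmware would set up interrupt-only mode again as before. Checking saved_reset_flags for WDRF at that point would at least confirm whether the restart really came from the watchdog.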
At this point I don't think the hardware debugger will help. I don't think there's a way to get a stack dump just prior to a watchdog reset.