I have tried to "bisect" the config changes, and the working/not-working builds between rc3, rc4 and rc5, and I keep arriving at the same frustrating result: building a "clean" kernel does not produce the same behavior as building incrementally while bisecting. For some reason, even arriving at the same config step by step does not make the kernel work, and the same happens with the actual bisection. So for now I simply use my patch to do the timeout. Thinking of it - should I submit a patch like that to you for consideration? It may be useful for other users with suspend problems...
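The idea behind the timeout is roughly the following. This is only a userspace sketch under assumed names (wait_until_clear, pending_fn, pending_for_n_polls and max_spins are all hypothetical), not the actual kernel patch: the wait loop gets a bounded number of polls, and a warning is printed when it is left via the timeout, standing in for the WARN_ON() requested earlier in the thread.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical predicate: returns true while the condition is still pending. */
typedef bool (*pending_fn)(void *ctx);

/*
 * Poll until pending() reports false or max_spins polls have been made.
 * Returns true if the condition cleared, false on timeout. On timeout a
 * warning is printed; in kernel code a WARN_ON() would play this role.
 */
static bool wait_until_clear(pending_fn pending, void *ctx,
                             unsigned int max_spins)
{
    for (unsigned int i = 0; i < max_spins; i++) {
        if (!pending(ctx))
            return true;
    }
    fprintf(stderr, "WARNING: still pending after %u polls, giving up\n",
            max_spins);
    return false;
}

/* Demo predicate: stays pending for *remaining more polls, then clears. */
static bool pending_for_n_polls(void *ctx)
{
    unsigned int *remaining = ctx;

    if (*remaining == 0)
        return false;
    (*remaining)--;
    return true;
}
```

With a counter of 3 and max_spins of 10 the wait succeeds on the fourth poll; with a counter far above max_spins it gives up and reports the timeout, which is the "jammed and needing rescue" case.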
Thanks,
Woody

On Wed, Aug 21, 2019 at 4:15 PM Thomas Gleixner <t...@linutronix.de> wrote:
>
> On Tue, 20 Aug 2019, Woody Suwalski wrote:
> > On Thu, Aug 15, 2019 at 2:37 AM Thomas Gleixner <t...@linutronix.de> wrote:
> > > On Tue, 13 Aug 2019, Woody Suwalski wrote:
> > > > On Mon, Aug 12, 2019 at 1:24 PM Thomas Gleixner <t...@linutronix.de> wrote:
> > > > > The ACPI handler is not the culprit. This is either an emulation bug or
> > > > > something really strange. Can you please use a WARN_ON() if the loop is
> > > > > exited via the timeout so we can see in which context this happens?
> > > > >
> > > >
> > > > B. On 5.3-rc4 problem is gone. I guess it is overall good sign.
> > >
> > > Now the interesting question is what changed between 5.3-rc3 and
> > > 5.3-rc4. Could you please try to bisect that?
> > >
> >
> > Apparently I can not, and frustrated'ingly do not understand it.
> > Tried twice, and every time I get it broken to the end of bisection -
> > so the fixed-in-5.3-rc4 theory falls apart. Yet if I build cleanly
> > 5.3-rc4 or -rc5, it works OK.
> > Then on a 32 bit system - I first tried with a scaled-down kernel
> > (just with the drivers needed in the VM). That one is never working,
> > even in rc5. Yet the "full" kernel works OK. So now there is a config
> > issue variation on top of other problem?
>
> Looks like and it would be good to know which knob it is.
>
> Can you send me the two configs please?
>
> > > dpm_suspend_noirq() is called with all CPUs online and interrupts
> > > enabled. In that case an interrupt pending in IRR does not make any sense
> > > at all. Confused.
> > >
> > For now I use a timeout counter patch - and it is showing 100% irq9
> > jammed and needing rescue. And I am even more confused...
>
> You're not alone, if that gives you a bit of comfort :)
>
> Thanks,
>
> tglx