On Fri, 11 Jul 2025 at 13:35, Linus Torvalds <torva...@linux-foundation.org> wrote:
>
> Indeed. It turns out that the problem actually started somewhere
> between rc4 and rc5, and all my previous bisections never even came
> close, because kernels usually work well enough that I never realized
> that it went back that far.
It looks like it's actually due to commit 8c44dac8add7 ("eventpoll: Fix priority inversion problem"), and it's been going on for a while now and the behavior was just too subtle for me to have noticed.

Does not look hardware-specific, except in the sense that it probably needs several CPUs along with the odd startup pattern to trigger this.

It's possible that the bisection ended up wrong, and when it appeared to start going off in the weeds I was like "this is broken again", but before I marked a kernel "good" I tested it several times, and then in the end that "eventpoll: Fix priority inversion problem" kind of makes sense after all.

I would never have guessed at that commit otherwise (well, considering that I blamed both the drm code and the netlink code first, that goes without saying), but at the same time, that *is* the kind of change that would certainly make user space get hung up with odd timeouts.

I've only tested the previous commit being good twice now, but I'll go back to the head of tree and try a revert to verify that this is really it. Because maybe it's the now Nth time I found something that hides the problem, not the real issue.

Fingers crossed that this very timing-dependent odd problem really did bisect right finally, after many false starts.

Linus