On Sun, Jun 26, 2022 at 10:03:59PM +0200, Martin Pieuchot wrote:
> On 26/06/22(Sun) 20:36, Caspar Schutijser wrote:
> > A laptop of mine (dmesg below) frequently hangs. After some bisecting
> > and extensive testing I think I found the commit that causes this:
> > mpi@'s
> > "Always acquire the `vmobjlock' before incrementing an object's reference."
> > commit from 2022-04-28.
> >
> > My definition of "the system hangs":
> >  * Display is frozen
> >  * Switching to ttyC0 using Ctrl+Alt+F1 doesn't do anything
> >  * System does not respond to keyboard or mouse input
> >  * Pressing the power button for 1-2 seconds doesn't achieve anything
> >    (usually this initiates a system shutdown)
> >  * And also the fan starts spinning
> >
> > The system sometimes hangs very soon after booting the system, I've
> > seen it happen once while I was typing my username in xenodm to log in.
> > But sometimes it takes a couple of hours.
> >
> > For some reason I put
> > "@reboot while sleep 1 ; do sync ; done"
> > in my crontab and it *seems* (I'm not sure) that the hangs occur more
> > frequently this way. Not sure if that is useful information.
> >
> > I don't see similar problems on my other machines.
> >
> > It looks like when the system hangs, it's stuck spinning in the new
> > code that was added in that commit; to confirm that I added some code
> > (see the diff below) to enter ddb if it's spinning there for 10 seconds
> > (and then it indeed enters ddb). If my thinking and diff make sense
> > I think that indeed confirms that is the problem.
> >
> > Any tips for debugging this?
>
> I believe I introduced a deadlock.  If you can reproduce it could you
> get us the output of `ps' in ddb(4) and the trace of all the active
> processes.
>
> I guess one is waiting for the KERNEL_LOCK() while holding the uobj's
> vmobjlock.

"ps" output (pictures only):
https://temp.schutijser.com/~caspar/2022-06-27-ddb/ps-1.jpg
https://temp.schutijser.com/~caspar/2022-06-27-ddb/ps-2.jpg
https://temp.schutijser.com/~caspar/2022-06-27-ddb/ps-3.jpg
https://temp.schutijser.com/~caspar/2022-06-27-ddb/ps-4.jpg

traces of active processes (I hope; if this is not correct I'm happy to
run different commands; pictures and transcription follow):
https://temp.schutijser.com/~caspar/2022-06-27-ddb/trace-1.jpg

ddb{1}> ps /o
    TID     PID    UID   PRFLAGS   PFLAGS  CPU  COMMAND
*246699   86564   1000       0x2        0   1K  sync
 395058   12288     48  0x100012        0    0  unwind
ddb{1}> trace /t 0t246699
kernel: protection fault trap, code=0
Faulted in DDB; continuing...
ddb{1}> trace /t 0t395058
uvm_fault(0xfffffd8448ab5338, 0x1, 0, 1) -> e
kernel: page fault trap, code=0
Faulted in DDB; continuing...
ddb{1}>

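For readers following along, the idea behind the diagnostic mentioned in
the quoted report (give up and drop into the debugger if a spin loop
runs for too long) can be sketched in user-space C roughly as below.
This is only an illustration under stated assumptions, not the diff that
was actually used: now_seconds() and spin_acquire_with_limit() are
hypothetical names, a C11 atomic_flag stands in for the kernel lock, and
returning false stands in for entering ddb.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper: current monotonic time in seconds. */
static double
now_seconds(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Spin until `flag' is acquired, but for at most `limit' seconds.
 * Returns true once the flag is held by the caller, false if the
 * limit expired first.  In a kernel diagnostic, the false branch is
 * where one could enter the debugger instead of spinning forever.
 */
static bool
spin_acquire_with_limit(atomic_flag *flag, double limit)
{
	double start;

	assert(limit >= 0.0);
	start = now_seconds();

	/* test_and_set returns the previous value: true means "busy". */
	while (atomic_flag_test_and_set_explicit(flag, memory_order_acquire)) {
		if (now_seconds() - start > limit)
			return false;
	}
	return true;
}
```

With a limit of 10 seconds this mirrors the behaviour described in the
report: a healthy system acquires the flag almost immediately, while a
holder that never releases it makes the caller time out rather than
hang silently.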