On Sun, Jun 26, 2022 at 10:03:59PM +0200, Martin Pieuchot wrote:
> On 26/06/22(Sun) 20:36, Caspar Schutijser wrote:
> > A laptop of mine (dmesg below) frequently hangs. After some bisecting
> > and extensive testing I think I found the commit that causes this:
> > mpi@'s
> > "Always acquire the `vmobjlock' before incrementing an object's reference."
> > commit from 2022-04-28.
> > 
> > My definition of "the system hangs": 
> >  * Display is frozen
> >  * Switching to ttyC0 using Ctrl+Alt+F1 doesn't do anything
> >  * System does not respond to keyboard or mouse input
> >  * Pressing the power button for 1-2 seconds doesn't achieve anything
> > (usually this initiates a system shutdown)
> >  * And also the fan starts spinning
> > 
> > The system sometimes hangs very soon after booting the system, I've
> > seen it happen once while I was typing my username in xenodm to log in.
> > But sometimes it takes a couple of hours.
> > 
> > For some reason I put
> > "@reboot while sleep 1 ; do sync ; done"
> > in my crontab and it *seems* (I'm not sure) that the hangs occur more
> > frequently this way. Not sure if that is useful information.
> > 
> > I don't see similar problems on my other machines.
> > 
> > It looks like when the system hangs, it's stuck spinning in the new
> > code that was added in that commit; to confirm that I added some code
> > (see the diff below) to enter ddb if it's spinning there for 10 seconds
> > (and then it indeed enters ddb). If my thinking and diff make sense
> > I think that indeed confirms that is the problem.
> > 
> > Any tips for debugging this?
> 
> I believe I introduced a deadlock.  If you can reproduce it could you
> get us the output of `ps' in ddb(4) and the trace of all the active
> processes.
> 
> I guess one is waiting for the KERNEL_LOCK() while holding the uobj's
> vmobjlock.
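
For illustration only: the kind of deadlock guessed at above can be
sketched in userland. This is not the kernel code. The two locks are
plain Python locks standing in for KERNEL_LOCK() and the vmobjlock, and
the second thread (holding the kernel lock and wanting the vmobjlock)
is an assumption added to complete the cycle; the thread does not say
who holds the kernel lock. A timeout replaces the endless spin so the
sketch terminates.

```python
import threading


def lock_order_demo(timeout=0.2):
    """Return, per thread, whether it managed to take its second lock."""
    kernel_lock = threading.Lock()  # stand-in for KERNEL_LOCK()
    vmobjlock = threading.Lock()    # stand-in for the uobj's vmobjlock
    holding_first = threading.Barrier(2)  # both threads hold their first lock
    done_trying = threading.Barrier(2)    # both threads finished their attempt
    got_second = {}

    def worker(name, first, second):
        with first:
            holding_first.wait()
            # In the kernel this would wait forever; here we give up
            # after a timeout so the demo can finish.
            acquired = second.acquire(timeout=timeout)
            got_second[name] = acquired
            if acquired:
                second.release()
            # Keep holding `first` until the other thread has tried too.
            done_trying.wait()

    threads = [
        threading.Thread(target=worker,
                         args=("holds_vmobjlock", vmobjlock, kernel_lock)),
        # Hypothetical counterpart: takes the locks in the opposite order.
        threading.Thread(target=worker,
                         args=("holds_kernel_lock", kernel_lock, vmobjlock)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return got_second


if __name__ == "__main__":
    print(lock_order_demo())
```

Neither thread gets its second lock, because each one is held by the
other thread for the whole attempt. Without the timeout both would
wait forever.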

"ps" output (pictures only):
https://temp.schutijser.com/~caspar/2022-06-27-ddb/ps-1.jpg
https://temp.schutijser.com/~caspar/2022-06-27-ddb/ps-2.jpg
https://temp.schutijser.com/~caspar/2022-06-27-ddb/ps-3.jpg
https://temp.schutijser.com/~caspar/2022-06-27-ddb/ps-4.jpg


Traces of the active processes (I hope; if these are not the right
commands I'm happy to run different ones; picture and transcription follow):
https://temp.schutijser.com/~caspar/2022-06-27-ddb/trace-1.jpg

ddb{1}> ps /o
    TID    PID    UID    PRFLAGS    PFLAGS  CPU  COMMAND
*246699  86564   1000        0x2         0    1K sync
 395058  12288     48   0x100012         0    0  unwind
ddb{1}> trace /t 0t246699
kernel: protection fault trap, code=0
Faulted in DDB; continuing...
ddb{1}> trace /t 0t395058
uvm_fault(0xfffffd8448ab5338, 0x1, 0, 1) -> e
kernel: page fault trap, code=0
Faulted in DDB; continuing...
ddb{1}>

