On Thu, 14 May 2020 00:04:28 +0200 Sebastian Andrzej Siewior <bige...@linutronix.de> wrote:
> On 2020-05-08 23:30:45 [+0200], Stephen Berman wrote:
>> > Can you log the output on the serial console?
>>
>> How do I do that?
>
> The spec for your mainboard says "serial port header". You would need to
> connect a cable there to another computer and log its output.
> The alternative would be to delay the output on the console and use a
> camera.

It's easiest for me to take a picture, since there isn't much output and
in any case the delay happens on its own ;-).  I'm sending you the image
(from kernel 5.6.4) off-list since even after reducing it, it's 1.2 MB
large.

>> > If the commit you cited is really the problem then it would mean that a
>> > worker isn't scheduled for some reason. Could you please enable
>> > CONFIG_WQ_WATCHDOG to see if workqueue core code notices that a worker
>> > isn't making progress?

I enabled that and also CONFIG_SOFTLOCKUP_DETECTOR,
CONFIG_HARDLOCKUP_DETECTOR and CONFIG_DETECT_HUNG_TASK, which had all
been unset previously.

>> How will I know if that happens, is there a specific message in the tty?
>
> On the tty console where you see the "timing out command, waited"
> message, there should be something starting with
> |BUG: workqueue lockup - pool
>
> following with the pool information that got stuck. That code checks the
> workqueues every 30secs by default. So if you waited >= 60secs then
> system is not detecting a stall.

As you can see in the photo, there was no message about a workqueue
lockup, only "task halt:5320 blocked for more than <XXX> seconds" every
two minutes.  I suppose that comes from one of the other options I
enabled.  Does it reveal anything about the problem?

> As far as I can tell, there is nothing special on your system. The CD
> and disk drives are served by the AHCI controller. There is no special
> SCSI/SATA/SAS controller.
> Right now I have no idea how the workqueues fit in the picture. Could
> you please check if the stall-detector says something?

Is that the message I repeated above or do you mean the workqueue?

> Is it possible to show me output when the timeout message comes? My
> guess is that the system is going down and before unmounting/remount RO
> the filesystem it flushes its last data. But this is done before issuing
> the "halt-syscall".

The entire output from `shutdown -h now' is in the picture; after the
fourth "timing out command" message, I pressed the reset button.

Steve Berman
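
P.S. For anyone who captures the console as text (for example over the serial port header mentioned above) rather than as a photo, the check discussed in this thread could be automated roughly as follows. This is only a sketch: the three patterns are the message fragments quoted above, while the function name, the sample lines, the "sd 0:0:0:0:" prefix and the 120-second figures are made up for illustration and are not taken from the actual output.

```python
import re

# Message fragments quoted in this thread. Anything beyond these
# fragments (prefixes, durations) is an assumption.
PATTERNS = {
    "workqueue_lockup": re.compile(r"BUG: workqueue lockup - pool"),
    "hung_task": re.compile(r"task \S+:\d+ blocked for more than"),
    "command_timeout": re.compile(r"timing out command, waited"),
}


def summarize_console_log(text):
    """Count how many lines of a console log match each known message."""
    counts = {name: 0 for name in PATTERNS}
    for line in text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Hypothetical sample, NOT the real output from the photo.
SAMPLE_LOG = """\
sd 0:0:0:0: timing out command, waited 120s
INFO: task halt:5320 blocked for more than 120 seconds.
sd 0:0:0:0: timing out command, waited 120s
"""

if __name__ == "__main__":
    for name, count in summarize_console_log(SAMPLE_LOG).items():
        print(f"{name}: {count}")
```

A zero count for workqueue_lockup alongside non-zero counts for the other two would correspond to what the photo shows: hung-task and timeout messages, but no workqueue lockup report.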