From: Archie Cobbs <[EMAIL PROTECTED]>
Subject: Re: backgroud fsck is still locking up system (fwd)
In-Reply-To: <[EMAIL PROTECTED]>
To: Nate Lawson <[EMAIL PROTECTED]>
Date: Fri, 6 Dec 2002 10:57:13 -0800 (PST)
CC: Kirk McKusick <[EMAIL PROTECTED]>, Archie Cobbs <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
X-ASK-Info: Whitelist match

Nate Lawson wrote:
> > Does the background fsck process continue to run, or does the whole
> > system come to a halt? If the fsck process continues to run, what
> > happens when it eventually finishes? Is the system still dead, or
> > does it come back to life? If the system does not come back to life
> > can you get me the output of `ps axl'? If not, can you break into
> > the debugger and get a ps output? (You will need to have the DDB
> > option specified in your config file).
>
> Sorry for butting in. I think Archie is referring to bg fsck gaining
> an unfair share of cpu due to it running due to IO completions. Last I
> heard, we were waiting until after 5.0 to experiment with scheduler
> changes to make it more fair. I have not seen any hard locks or other
> problems with bg fsck after your commit.

I'm actually seeing something different. The box becomes unresponsive
(except for virtual console changes and CTRL-ALT-ESC) but there's no
disk activity. It never recovers.

Reproduced it again just now. After pulling the plug and rebooting
I didn't touch the box. It booted normally, started background fsck,
and the HDD light was blinking as expected. After about 10 seconds,
rather suddenly the HDD light stopped blinking. At this point it was
pretty dead. Broke into the debugger and it showed a similar 'ps'
output to what I previously posted.

-Archie

Your ps shows fsck_ufs and the syncer process both blocked on "nbufbs".
That means the system has blocked them from running because it feels
that there are too many dirty buffers. What you are probably
experiencing is that you have a relatively small-memory machine, which
has a rather low threshold for blocking on dirty buffers. All the
dirty buffers in your system are held by the indirect blocks of the
snapshot, and thus the bufdaemon cannot push them out. That task can
only be done by the syncer, which is also blocked.
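
To make the suspected stall concrete, here is a toy model of it in Python. It is only a sketch of the hypothesis above and not FreeBSD kernel code: the class, the function names, and the exact blocking rule (the syncer waits whenever the dirty count is at or above the high threshold) are assumptions made for illustration.

```python
# Toy model of the hypothesized stall. NOT kernel code: the names and
# the blocking rule are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class BufferState:
    hidirtybuffers: int   # threshold at which writers are made to wait
    snapshot_dirty: int   # dirty buffers held by snapshot indirect blocks
    ordinary_dirty: int   # dirty buffers the bufdaemon is able to flush

    @property
    def numdirtybuffers(self) -> int:
        return self.snapshot_dirty + self.ordinary_dirty


def step(state: BufferState) -> bool:
    """Run one round; return True if any dirty buffer was flushed."""
    progressed = False
    # The bufdaemon can push out ordinary buffers only.
    if state.ordinary_dirty > 0:
        state.ordinary_dirty -= 1
        progressed = True
    # Assumed rule: the syncer is blocked while the dirty count is at
    # or above the high threshold, so it cannot flush snapshot buffers.
    syncer_blocked = state.numdirtybuffers >= state.hidirtybuffers
    if not syncer_blocked and state.snapshot_dirty > 0:
        state.snapshot_dirty -= 1
        progressed = True
    return progressed


def stalls(state: BufferState) -> bool:
    """Run until nothing can be flushed; True if dirty buffers remain."""
    while step(state):
        pass
    return state.numdirtybuffers > 0
```

In this model, a state where the snapshot alone holds at least hidirtybuffers dirty buffers never makes progress, which is the situation described above. With fewer snapshot buffers than the threshold, the bufdaemon drains the rest and the syncer gets unblocked.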
Could you please run the following commands on your system and send me
the results:

	sysctl vfs.lodirtybuffers
	sysctl vfs.hidirtybuffers
	sysctl vfs.numdirtybuffers

both before and after the lockup. If you cannot run these commands
after the lockup, the global variable names are:

	lodirtybuffers
	hidirtybuffers
	numdirtybuffers

If my hypothesis is correct, that will let me tweak the thresholds
on dirty buffers to get a solution.

	Kirk McKusick
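
For collecting the three counters in one go, something like the Python sketch below may help. It assumes sysctl prints each value as "name: value" and accepts several names on one command line; the helper names are made up for this example. It can only be run while the system is still responsive, so it covers the "before" reading and not the locked-up case.

```python
# Sketch: gather the three dirty-buffer counters via sysctl(8).
# Assumes "name: value" output lines; helper names are hypothetical.
import subprocess

NAMES = (
    "vfs.lodirtybuffers",
    "vfs.hidirtybuffers",
    "vfs.numdirtybuffers",
)


def parse_sysctl(text: str) -> dict:
    """Turn 'name: value' lines into a dict of integer counters."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        # Skip lines without a colon or for names we did not ask about.
        if sep and name in NAMES:
            values[name] = int(value.strip())
    return values


def read_dirty_buffer_counters() -> dict:
    """Run sysctl once for all three names and parse its output."""
    result = subprocess.run(
        ["sysctl", *NAMES], capture_output=True, text=True, check=True
    )
    return parse_sysctl(result.stdout)


def at_or_over_high_threshold(values: dict) -> bool:
    """True if the dirty count has reached the high threshold."""
    return values["vfs.numdirtybuffers"] >= values["vfs.hidirtybuffers"]
```

Calling read_dirty_buffer_counters() shortly after boot and again just before the hang is expected gives two snapshots to compare. at_or_over_high_threshold() is only a convenience check against the hypothesis, and the raw numbers are what matter for tuning.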