On Fri, Feb 02, 2007 at 05:19:48PM +0100, Johannes Wiedersich wrote:
> [EMAIL PROTECTED] wrote:
> > These messages look similar -- but not identical -- to the ones I had
> > while installing an etch system -- and eventually I came to suspect
> > the file-system-damage bug in the Debian 2.6.18-3 kernel (sometimes,
> > because of a race condition, a buffer is not written to hard disk
> > although it should have been). It doesn't hit most systems, but when
> > it does, it can be a disaster. Eventually one of my installs ended up
> > with an unparseable apt-related status file -- I think it was the
> > list of installed packages. I looked at it and it was binary
> > gibberish (although it was supposed to be ordinary ASCII text).
>
> I didn't know that sarge's kernel was also affected by this:
>
> athene:~# uname -a
> Linux athene 2.6.8-3-k7 #1 Tue Dec 5 23:58:25 UTC 2006 i686 GNU/Linux
I don't think so. In any case, I didn't know you were running sarge.
As far as I know, sarge is pre-disaster.

> >> Here are my questions:
> >>
> >> Is it safe to leave the system as it is, or should I do a reinstall
> >> in order to be sure that the system is 'clean'? How could I check
> >> that no other files are affected except those 'reinstalled'?
> >>
> >> Is it common that a failure of a raid disk leads to a system
> >> freeze, even though the affected drive is _NOT_ part of / or any
> >> FHS directory?
> >
> > I've noticed freezes with NFS -- if the remote system carrying the
> > physical volume is shut off without the clients first unmounting
> > it, the client processes freeze the next time they try to access
> > the NFS volume. Eventually more and more processes freeze,
> > unkillably, and the system gradually grinds to a halt. They stay
> > frozen even if the remote system comes up again. Oddly, if the
> > remote system is brought up again *before* they access it, they
> > never notice, and just run normally.
> >
> > Could it be something similar?
>
> Well, the box in question was _exporting_ the relevant partition via
> nfs and samba. Of course there were some 'problems' with the clients
> when the nfs suddenly disappeared...
>
> So maybe these freezes also occur for the exporting machine.

I don't think so.

> >> Is there anything I could do to try to avoid this for the future?
> >
> > Maybe check for bad blocks?
>
> I actually run smartmontools, but it only mailed about the bad
> health when the drive was already dead...
>
> > Maybe avoid having both parts of the RAID on the same IDE chain?
>
> Sorry for forgetting to post this, but the raid consists of /dev/hdb
> and /dev/hdd. That is as far apart as possible on that
> probably-too-cheap-for-the-purpose box ;-)

OK. You did that right, too. A few concrete suggestions on the
checking and monitoring questions follow below.

> Thanks,
>
> Johannes
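As for checking whether any other files were affected: one thing you
could try -- a sketch, not something I have had to run on a damaged
system myself -- is debsums, which compares the installed files
against the md5sums shipped inside each package:

  # install debsums, then check every package that ships md5sums;
  # -s (silent) reports only the files whose checksums don't match
  apt-get install debsums
  debsums -s

Note that it only covers files installed from packages (and not every
package ships md5sums), so it cannot vouch for your own data or for
configuration files.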
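On the NFS client freezes I described: those are the classic symptom
of the default 'hard' mounts, which retry a dead server forever. On
the clients you could mount with soft,intr instead, so that operations
eventually time out or can at least be interrupted. (Server name and
path below are placeholders, and be aware that a soft timeout on a
write can lose data, so it is a trade-off:)

  # on an NFS client; 'fileserver:/export' is just a placeholder
  mount -t nfs -o soft,intr fileserver:/export /mnt/export

That wouldn't have saved your exporting box, of course, but it keeps
the clients from grinding to a halt the next time.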
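On the bad-block check: a read-only scan is safe to run on the raw
devices, e.g. on your /dev/hdb and /dev/hdd:

  # read-only surface scan; -s shows progress, -v reports errors
  badblocks -sv /dev/hdb

  # or have the drive test itself, then read the result afterwards
  smartctl -t long /dev/hdb
  smartctl -l selftest /dev/hdb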
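And since smartd only warned you once the drive was already gone: you
can make it run regular self-tests instead of just watching the health
status. Untested on your hardware, but lines like these in
/etc/smartd.conf (the -s regex means a short self-test every day at
02:00 and a long one every Saturday at 03:00) should give you earlier
warning:

  # -a monitors all SMART attributes, -s schedules self-tests,
  # -m mails root when something looks wrong
  /dev/hdb -a -s (S/../.././02|L/../../6/03) -m root
  /dev/hdd -a -s (S/../.././02|L/../../6/03) -m root

No guarantee, of course -- some drives die without ever logging a
single reallocated sector -- but it does catch the slow deaths.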