I think there's an elephant in this room - why are we running fsck at all?

a) If it's to detect corruption due to software errors, fsck should be
linked up to apport, and reported (semi-)automatically.
b) If it's to check for dying hardware[1], it can be disabled for all
but the oldest hard drives[2], and even then it is better replaced with
a badblocks check run while booting continues.
c) If it's to guard against bit-flipping caused by cosmic rays and other
weirdness[3], snapshot-based solutions discussed earlier would be more
appropriate, because the most vulnerable drives are huge/highly active
ones that live on servers that never get rebooted.

The nearest to a definitive statement that I've been able to find is
from the tune2fs man page.  The following is from the text for the "-c"
option:

        Bad disk drives, cables, memory, and kernel bugs could all
        corrupt a filesystem without marking the filesystem dirty or
        in error.

(A similar message is included in the text of the "-i" option.)
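For reference, those same "-c" and "-i" knobs are how option (b) above
would be carried out in practice; a sketch (the device name is a
placeholder, run against a real ext2/3 filesystem):

```shell
# Disable the periodic forced checks for one filesystem:
#   -c 0  never force a check based on mount count
#   -i 0  never force a check based on elapsed time
# /dev/sdXN is a placeholder device name.
tune2fs -c 0 -i 0 /dev/sdXN

# The badblocks alternative: a read-only surface scan (the default
# mode) that can run while the system stays up; -s shows progress,
# -v is verbose.
badblocks -sv /dev/sdXN
```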

This seems to cover all the above alternatives.  Given that any solution
that wants to make it into Intrepid has to be feature-complete by the
28th, how about doing 'fsck ... | tee /var/tmp/fsck.log' in checkfs,
then 'mv /var/tmp/fsck.log /var/cache/apport.log' if fsck exited
non-zero (note that a plain '|| mv' after the pipe would test tee's
exit status rather than fsck's), and getting apport to pick up any logs
and ask to report them in the usual way?  Then we'll have better data
to make a decision with for Intrepid+1.
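The checkfs change above could look something like the following
minimal sketch.  The apport pickup path is the hypothetical one from
this proposal, run_fsck stands in for the real fsck invocation so the
sketch is self-contained, and temp-directory paths are used in place
of /var/tmp and /var/cache so it can be tried without root:

```shell
#!/bin/sh
# Sketch of the proposed checkfs hook.  In the real hook these would
# be /var/tmp/fsck.log and /var/cache/apport.log as suggested above.
LOG="${TMPDIR:-/tmp}/fsck.log"
APPORT_LOG="${TMPDIR:-/tmp}/apport.log"

# Stand-in for the real fsck call; the non-zero return simulates
# fsck finding problems.
run_fsck() { echo "example fsck output"; return 1; }

# "fsck | tee $LOG || mv ..." would test tee's exit status, not
# fsck's, so capture fsck's status explicitly before deciding.
run_fsck > "$LOG" 2>&1
status=$?
cat "$LOG"                      # keep the output visible during boot
if [ "$status" -ne 0 ]; then
    mv "$LOG" "$APPORT_LOG"     # leave the log where apport can find it
fi
```

Apport would then only see a log when fsck actually reported trouble,
which is exactly the data set we want for the Intrepid+1 decision.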

        - Andrew

[1]https://lists.ubuntu.com/archives/ubuntu-devel-discuss/2007-October/001843.html
[2]https://lists.ubuntu.com/archives/ubuntu-devel-discuss/2007-October/001856.html
[3]http://kerneltrap.org/Linux/Data_Errors_During_Drive_Communication

-- 
Ubuntu-devel-discuss mailing list
Ubuntu-devel-discuss@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-devel-discuss