To: Michael Harnois <[EMAIL PROTECTED]>
cc: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: filesystem errors
In-Reply-To: Your message of "Wed, 25 Jul 2001 23:14:16 CDT."
<[EMAIL PROTECTED]>
Date: Thu, 26 Jul 2001 15:14:09 +0100
From: Ian Dowse <[EMAIL PROTECTED]>
In message <[EMAIL PROTECTED]>,
Michael Harnois writes:
>I'm tearing my hair out trying to find a filesystem error that's
>causing me a panic: ufsdirhash_checkblock: bad dir inode.
>
>When I run fsck from a single user boot, it finds no errors.
>
>When I run it on the same filesystem mounted, it finds errors: but, of
>course, it then can't correct them
[Kirk, I'm cc'ing you because here the dirhash code sanity checks
found a directory entry with d_ino == 0 that was not at the start
of a DIRBLKSIZ block. This doesn't happen normally, but it seems
from this report that fsck does not correct this. Is it a basic
filesystem assumption that d_ino == 0 can only happen at the start
of a directory block, or is it something the code should tolerate?]
FFS will never set a directory ino == 0 at a location other
than the first entry in a directory, but fsck will do so to
get rid of an unwanted entry. The readdir routines know to
skip over an ino == 0 entry no matter where in the directory
it is found, so applications will never see such entries.
It would be a fair amount of work to change fsck to `do the
right thing', as the checking code is given only the current
entry with which to work. I am of the opinion that you
should simply accept that mid-directory block ino == 0 is
acceptable rather than trying to `fix' the problem.
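To make the readdir behaviour described above concrete, here is a minimal sketch of a walk over one directory block that steps by d_reclen and skips any entry with ino == 0, wherever it sits. It is only an illustration, not the kernel code: the helper names mkent and visible_names and the sample block contents are made up, and the header layout (32-bit d_ino, 16-bit d_reclen, 8-bit d_type, 8-bit d_namlen) is assumed to match the one the dircheck.pl script below unpacks.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build one directory entry as d_ino (32 bit),
# d_reclen (16 bit), d_type (8 bit), d_namlen (8 bit), then the name,
# NUL padded out to d_reclen bytes.
sub mkent {
    my ($ino, $reclen, $type, $name) = @_;
    my $ent = pack("LSCC", $ino, $reclen, $type, length($name)) . $name;
    return $ent . ("\0" x ($reclen - length($ent)));
}

# Hypothetical helper: return the names a readdir-style walk would report
# for one block. Every entry is stepped over by d_reclen; entries with
# d_ino == 0 are skipped no matter where they appear in the block.
sub visible_names {
    my ($blk) = @_;
    my @names;
    my $off = 0;
    while ($off + 8 <= length($blk)) {
        my ($ino, $reclen, $type, $namlen) =
            unpack("LSCC", substr($blk, $off, 8));
        # Stop on an obviously corrupt record length.
        last if $reclen < 8 || $off + $reclen > length($blk);
        push(@names, substr($blk, $off + 8, $namlen)) if $ino != 0;
        $off += $reclen;
    }
    return @names;
}

# Made-up 512-byte block: ".", "..", an entry cleared in place
# (ino == 0 in mid-block), then "keep" covering the rest of the block.
my $blk = mkent(2, 12, 4, ".") . mkent(2, 12, 4, "..")
        . mkent(0, 16, 8, "gone") . mkent(7, 512 - 40, 8, "keep");
print join(" ", visible_names($blk)), "\n";
```

The cleared "gone" entry never reaches the caller, which is why applications do not notice such entries even though a stricter sanity check does.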
Interesting - this is an error reported by the UFS_DIRHASH code
that you enabled in your kernel config. A sanity check that the
dirhash code is performing is failing. These checks are designed
to catch bugs in the dirhash code, but in this case I think it may
be a bug that fsck is not finding this problem, or else my sanity
tests are too strict.
A workaround is to turn off the sanity checks with:
sysctl vfs.ufs.dirhash_docheck=0
or to remove UFS_DIRHASH from your kernel config. You could also
try to find the directory that is causing the problems. Copy the
following script to a file called dircheck.pl, and try running:
chmod 755 dircheck.pl
find / -fstype ufs -type d -print0 | xargs -0 ./dircheck.pl
That should show up any directories that would fail that dirhash
sanity check - there will probably just be one or two that resulted
from some old filesystem corruption.
Ian
#!/usr/local/bin/perl
# Report directory entries with d_ino == 0 that are not at the start
# of a 512-byte directory block.
while (defined($dir = shift)) {
    unless (open(DIR, "$dir")) {
        print STDERR "$dir: $!\n";
        next;
    }
    $b = 0;
    my(%dir) = ();
    # Read the raw directory one 512-byte block at a time.
    while (sysread(DIR, $dat, 512) == 512) {
        $off = 0;
        while (length($dat) > 0) {
            # Entry header: d_ino, d_reclen, d_type, d_namlen.
            ($dir{'d_ino'}, $dir{'d_reclen'}, $dir{'d_type'},
                $dir{'d_namlen'}) = unpack("LSCC", $dat);
            $dir{'d_name'} = substr($dat, 8, $dir{'d_namlen'});
            # Smallest record that can hold this name, rounded up to 4.
            $minreclen = (8 + $dir{'d_namlen'} + 1 + 3) & (~3);
            $gapinfo = ($dir{'d_reclen'} == $minreclen) ? "" :
                sprintf("[%d]", $dir{'d_reclen'} - $minreclen);
            if ($dir{'d_ino'} == 0 && $off != 0) {
                printf("%s off %d ino %d reclen 0x%x type 0%o"
                    . " namelen %d name '%s' %s\n",
                    $dir, $off, $dir{'d_ino'},
                    $dir{'d_reclen'}, $dir{'d_type'},
                    $dir{'d_namlen'}, $dir{'d_name'},
                    $gapinfo);
            }
            # A zero reclen would never advance, so stop instead of looping.
            if ($dir{'d_reclen'} == 0) {
                die "reclen is zero!\n";
            }
            if ($dir{'d_reclen'} > length($dat)) {
                die "reclen too long!\n";
            }
            $dat = substr($dat, $dir{'d_reclen'});
            $off += $dir{'d_reclen'};
        }
        $b++;
    }
    close(DIR);
}
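
As a worked example of the record-length arithmetic in the script above: the minimum record length is the 8-byte header plus the name plus a terminating NUL, rounded up to a multiple of 4, and the bracketed number the script prints is whatever d_reclen holds beyond that minimum. The sketch below just evaluates the same expression for a few names; the function names minreclen and gap and the sample names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same expression as in dircheck.pl: 8-byte header, the name, one NUL,
# rounded up to the next multiple of 4.
sub minreclen {
    my ($namlen) = @_;
    return (8 + $namlen + 1 + 3) & (~3);
}

# Slack between the on-disk record length and the minimum needed
# (what the script prints in square brackets when it is non-zero).
sub gap {
    my ($reclen, $namlen) = @_;
    return $reclen - minreclen($namlen);
}

for my $name (".", "..", "gone", "lost+found") {
    printf("%-12s namlen %2d minreclen %2d\n",
        $name, length($name), minreclen(length($name)));
}

# A made-up final entry "keep" that owns the last 472 bytes of a block.
printf("gap for reclen 472, namlen 4: %d\n", gap(472, 4));
```

The four names come out at 12, 12, 16 and 20 bytes, and the last line reports a gap of 456, which is the kind of large value you would expect on the final entry of a mostly empty block.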