On 03/15/10 01:01 PM, David Dyer-Bennet wrote:
This sounds really bizarre.
Yes, it is. But CR 6880994 is bizarre too.
One detail suggestion on checking what's going on (since I don't have a clue towards a real root-cause determination): Get an md5sum on a clean copy of the file, say from a new install or something, and check the allegedly-corrupted copy against that. This can fairly easily give you a pretty reliable indication if the file is truly corrupted or not.
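For anyone who wants to script the comparison suggested above, here is a minimal sketch in Python. It assumes both the suspect file and a known-good copy are readable on the same machine. The helper names `md5_of` and `same_content` are made up for illustration and are not part of any ZFS or Solaris tooling.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(suspect, reference):
    """True if the suspect file and the reference copy have the same MD5 digest."""
    return md5_of(suspect) == md5_of(reference)


# Hypothetical usage with the paths discussed in this thread:
#   same_content("/lib/libdlpi.so.1.orig", "/lib/libdlpi.so.1")
```

A matching digest only shows that the bytes returned by this particular read agree with the reference copy. It says nothing about what a later read might return on hardware that corrupts data in flight.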
With many thanks to Danek Duvall, I got a new copy of libdlpi.so.1:

# md5sum /lib/libdlpi.so.1
2468392ff87b5810571572eb572d0a41 /lib/libdlpi.so.1
# md5sum /lib/libdlpi.so.1.orig
2468392ff87b5810571572eb572d0a41 /lib/libdlpi.so.1.orig
# zpool status -v
....
errors: Permanent errors have been detected in the following files:
//lib/libdlpi.so.1.orig

So here we seem to have an example of a ZFS false positive, the first I've seen or heard of. The good news is that it is still possible to read the file, so this augurs well for the ability to boot under this circumstance.

FWIW, fmdump does seem to show actual checksum errors on all four copies in 16 attempts to read them. There were 3 groups of different bad checksums; within each group the checksum was the same but differed from the expected one.

Perhaps someone who can could add this to CR 6880994, in the hope that it might help lead to a better understanding.

For the casual reader, CR 6880994 is about a pathological PC that gets checksum errors on the same set of files at boot, even though the root pool is mirrored. With copies=2, ZFS can usually repair them. But after a recent power cycle, all 4 copies reported bad checksums, though in reality the file seems to be uncorrupted. The machine has no ECC and flaky bus parity, so there are plenty of ways for the data to get messed up. It's a mystery why this only happens at boot, though.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss