On Mar 21, 2010, at 11:03 AM, Frank Middleton wrote:
> On 03/15/10 01:01 PM, David Dyer-Bennet wrote:
>
>> This sounds really bizarre.
>
> Yes, it is. But CR 6880994 is bizarre too.

Rolling back to a conversation with Frank last fall, here is the output
of fmdump, which shows the single bit flip. Extra lines elided.

TIME                           CLASS
Oct 23 2009 14:53:01.525657508 ereport.fs.zfs.checksum
        class = ereport.fs.zfs.checksum
        pool = rpool
        vdev_guid = 0x509094f6dc795c97
        vdev_type = disk
        vdev_path = /dev/dsk/c3d0s0
        vdev_devid = id1,c...@amaxtor_6y080l0=y32he6xe/a
        parent_guid = 0x323cf9d672c3b05a
        parent_type = mirror
        zio_err = 50
        zio_offset = 0x50384800
        zio_size = 0x9800
        zio_objset = 0x29
        zio_object = 0x1a209
        zio_level = 0
        zio_blkid = 0x0
        cksum_expected = 0x4a027c11b3ba4cec 0xbf274565d5615b7b 0x3ef5fe61b2ed672e 0xec8692f7fd33094a
        cksum_actual = 0x4a027c11b3ba4cec 0xbf274567d5615b7b 0x3ef5fe61b2ed672e 0xec86a5b3fd33094a
        cksum_algorithm = fletcher2
        bad_ranges = 0x228 0x230
        bad_ranges_min_gap = 0x8
        bad_range_sets = 0x1
        bad_range_clears = 0x0
        bad_set_bits = 0x0 0x0 0x0 0x0 0x2 0x0 0x0 0x0
        bad_cleared_bits = 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0

Here we see that one bit was set: 0x2 where we expected 0x0.
Later that same day...

Oct 23 2009 14:53:01.525657152 ereport.fs.zfs.checksum
        class = ereport.fs.zfs.checksum
        pool = rpool
        pool_guid = 0x5062a7a7247652b1
        vdev_guid = 0x1181c8516c0dc9b0
        vdev_type = disk
        vdev_path = /dev/dsk/c3d1s0
        vdev_devid = id1,c...@awdc_wd800bb-00bsa0=wd-wma6s1025599/a
        parent_guid = 0x323cf9d672c3b05a
        parent_type = mirror
        zio_err = 50
        zio_offset = 0x50384800
        zio_size = 0x9800
        zio_objset = 0x29
        zio_object = 0x1a209
        zio_level = 0
        zio_blkid = 0x0
        cksum_expected = 0x4a027c11b3ba4cec 0xbf274565d5615b7b 0x3ef5fe61b2ed672e 0xec8692f7fd33094a
        cksum_actual = 0x4a027c11b3ba4cec 0xbf274567d5615b7b 0x3ef5fe61b2ed672e 0xec86a5b3fd33094a
        cksum_algorithm = fletcher2
        bad_ranges = 0x228 0x230
        bad_ranges_min_gap = 0x8
        bad_range_sets = 0x1
        bad_range_clears = 0x0
        bad_set_bits = 0x0 0x0 0x0 0x0 0x2 0x0 0x0 0x0
        bad_cleared_bits = 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0

So we see the exact same bit flipped (0x2 where 0x0 was expected) on two
different disks, /dev/dsk/c3d0s0 (Maxtor) and /dev/dsk/c3d1s0 (Western
Digital), at the same zio offset and size. I feel confident we are not
seeing a b0rken drive here. But something is clearly amiss, and we cannot
rule out the processor, memory, or controller.

Frank reports that he sees this on the same file, /lib/libdlpi.so.1, so
I'll go out on a limb and speculate that there is something in the bit
pattern for that file that intermittently triggers a bit flip on this
system. I'll also speculate that this error will not be reproducible on
another system.

This sort of specific error analysis is possible after b125. See
CR 6867188 for more details.

 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com
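
P.S. For anyone who wants to repeat the comparison on their own ereports,
here is a minimal Python sketch of the check described above: same
offset, size, checksums, and flipped bits, but different vdevs. The
helper names (parse_ereport, same_damage, flipped_bits) are hypothetical,
the report text is abbreviated from the two ereports quoted above, and
the parser assumes simple "key = value" lines as they appear in this
message. It is not a general fmdump parser.

```python
# Abbreviated copies of the two ereports quoted above (only the fields compared).
REPORT_MAXTOR = """
vdev_path = /dev/dsk/c3d0s0
zio_offset = 0x50384800
zio_size = 0x9800
cksum_expected = 0x4a027c11b3ba4cec 0xbf274565d5615b7b 0x3ef5fe61b2ed672e 0xec8692f7fd33094a
cksum_actual = 0x4a027c11b3ba4cec 0xbf274567d5615b7b 0x3ef5fe61b2ed672e 0xec86a5b3fd33094a
bad_ranges = 0x228 0x230
bad_set_bits = 0x0 0x0 0x0 0x0 0x2 0x0 0x0 0x0
bad_cleared_bits = 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
"""

REPORT_WD = """
vdev_path = /dev/dsk/c3d1s0
zio_offset = 0x50384800
zio_size = 0x9800
cksum_expected = 0x4a027c11b3ba4cec 0xbf274565d5615b7b 0x3ef5fe61b2ed672e 0xec8692f7fd33094a
cksum_actual = 0x4a027c11b3ba4cec 0xbf274567d5615b7b 0x3ef5fe61b2ed672e 0xec86a5b3fd33094a
bad_ranges = 0x228 0x230
bad_set_bits = 0x0 0x0 0x0 0x0 0x2 0x0 0x0 0x0
bad_cleared_bits = 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
"""

# Fields that must match for the two reports to describe the same damage.
COMPARE_KEYS = (
    "zio_offset", "zio_size", "cksum_expected", "cksum_actual",
    "bad_ranges", "bad_set_bits", "bad_cleared_bits",
)


def parse_ereport(text):
    """Split 'key = value' lines into {key: tuple of whitespace-separated tokens}."""
    fields = {}
    for line in text.splitlines():
        # Split on the first '=' only, since values may themselves contain '='.
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = tuple(value.split())
    return fields


def flipped_bits(fields):
    """Return (bits set, bits cleared) by counting 1-bits in the two bit arrays."""
    def count(key):
        return sum(bin(int(tok, 16)).count("1") for tok in fields[key])
    return count("bad_set_bits"), count("bad_cleared_bits")


def same_damage(a, b):
    """True if two reports from different vdevs agree on every compared field."""
    return a["vdev_path"] != b["vdev_path"] and all(a[k] == b[k] for k in COMPARE_KEYS)


report_a = parse_ereport(REPORT_MAXTOR)
report_b = parse_ereport(REPORT_WD)

print("same damage on different disks:", same_damage(report_a, report_b))
print("bits (set, cleared), disk A:", flipped_bits(report_a))
print("bits (set, cleared), disk B:", flipped_bits(report_b))
```

Identical damage on both sides of a mirror is what points away from the
disks themselves, which is the reasoning used above.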