On Jan 12, 2012, at 4:12 PM, Jim Klimov wrote:

> As I recently wrote, my data pool has experienced some
> "unrecoverable errors". It seems that a userdata block
> of deduped data got corrupted and no longer matches the
> stored checksum. For whatever reason, raidz2 did not
> help in recovery of this data, so I rsync'ed the files
> over from another copy. Then things got interesting...
>
> Bug alert: it seems the block-pointer block with that
> mismatching checksum did not get invalidated, so my
> attempts to rsync known-good versions of the bad files
> from an external source seemed to work, but in fact failed:
> subsequent reads of the files produced IO errors.
> Apparently (my wild guess), upon writing the blocks,
> checksums were calculated and the matching DDT entry
> was found. ZFS did not care that the entry pointed to
> inconsistent data (no longer matching the checksum);
> it still increased the DDT counter.
>
> The problem was solved by disabling dedup for the dataset
> involved and rsync-updating the file in-place. After the
> dedup feature was disabled and new blocks were uniquely
> written, everything was readable (and md5sums matched)
> as expected.
>
> I can think of a couple of solutions:
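Before getting to the suggestions: to make the failure mode
concrete, here is roughly the sequence you describe, with made-up
pool, dataset, and file names (tank, tank/data, /backup/file):

    # zpool status -v names the file with the unrecoverable error.
    zpool status -v tank

    # Rewriting the file with dedup still on does NOT repair it:
    # the new write's checksum matches the stale DDT entry, so ZFS
    # bumps the refcount and keeps pointing at the corrupt block.
    rsync /backup/file /tank/data/file
    # the rsync "succeeds", but reads of the file still return EIO

    # The workaround that worked: disable dedup, then rewrite the
    # file in place so the replacement blocks are written uniquely.
    zfs set dedup=off tank/data
    rsync --inplace /backup/file /tank/data/file
    # the file is now readable and its md5 matches the good copy
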
In theory, the verify option will correct this going forward.

> If the block is detected to be corrupt (checksum mismatches
> the data), the checksum value in block pointers and DDT
> should be rewritten to an "impossible" value, perhaps
> all-zeroes or such, when the error is detected.

What if it is a transient fault?

> Alternatively (opportunistically), a flag might be set
> in the DDT entry requesting that a new write matching
> this stored checksum should get committed to disk - thus
> "repairing" all files which reference the block (at least,
> stopping the IO errors).

verify eliminates this failure mode.

> Alas, so far there is in any case no guarantee that it was
> not the checksum itself that got corrupted (except for
> using ZDB to retrieve the block contents and matching
> that with a known-good copy of the data, if any), so
> corruption of the checksum would also cause replacement
> of "really-good-but-normally-inaccessible" data.

Extremely unlikely. The metadata is also checksummed. To arrive
here you would have to have two corruptions, each of which
generates the proper checksum. Not impossible, but… I'd buy a
lottery ticket instead.

See also dedupditto. I could argue that the default value of
dedupditto should be 2 rather than "off".

> //Jim Klimov
>
> (Bug reported to Illumos: https://www.illumos.org/issues/1981)

Thanks!
 -- richard

--
ZFS and performance consulting
http://www.RichardElling.com
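P.S. In case it helps anyone following the thread, the two knobs
discussed above are set roughly like this; "tank" and "tank/data"
are example names, and the exact syntax is in zfs(1M) and zpool(1M):

    # dedup=verify byte-compares a would-be duplicate against the
    # block already on disk before bumping the DDT refcount, so a
    # stale entry cannot silently absorb a known-good rewrite.
    zfs set dedup=verify tank/data        # or dedup=sha256,verify

    # dedupditto stores an extra copy of a deduped block once its
    # reference count crosses the threshold. If memory serves, the
    # smallest nonzero value accepted is 100, so a default of 2 as
    # argued above would take a code change.
    zpool set dedupditto=100 tank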