>>>>> "jw" == Jonathan Wheeler <[EMAIL PROTECTED]> writes:
jw> A common example used all over the place is zfs send | ssh
jw> $host.  In these examples is ssh guaranteeing the data delivery
jw> somehow?

it is really all just apologetics.  It sounds like a zfs bug to me.  The only alternative is bad hardware (not disks), so you could try memory testers, continuous big 'make -j <big number, like 4 - 10>' builds, and scripted continuous zpool send/recv, to look for this.

jw> you may end up in a situation like the one I'm in today if you
jw> don't somehow test your backups.

which is why I asked you to check whether -n spots it.  It doesn't---the tool gives you no way to test the backups!

I've lost data before because I backed things up onto tape, wiped the original, and then had the tape go bad.  The idea of backups is to always have two copies, so I should have written two tapes.  but I don't see any reason to believe you wouldn't have gotten two bad copies in your case, since it sounds like a bug.

I also made the mistake of using FancyTape---I used some DAT bullshit with a ``table of contents'' that can become ``corrupt'' if you power off the drive at the wrong moment, which simpler tape formats don't have.  DAT also has block checksums, and some drives, if they can't read part of the tape, just hang forever and can't seek past it (weirdly analogous to zfs receive).  I had already learned not to gzip a tarball before writing it to tape if the tarball contained mostly incompressible things, because the gzip format is less robust than the tar format.  but I got bitten anyway, because of the stupid tape TOC and the poor exception handling in the DAT drive's firmware.

What's required, *given hindsight*, is to realize that the purpose of backups for ZFS users is partly to protect ourselves from ZFS bugs, so the backups need to be stored in a format that has nothing to do with ZFS, like tar or UDF or a non-ZFS filesystem.  however, if you have lots of snapshots or clones, I'm not sure this is possible because the data expands too much.  In that case I might store backups in a zpool rather than in a file, because I expect zpool corruption bugs will get more attention sooner than 'zfs send' corruption bugs.  but that's still sketchy, and had it not been for your experience, I might have trusted the zfs send format.  ``learn'', fine, but I don't think you've done anything unreasonable.

jw> is there anything I can do to recover my data from these zfs
jw> dumps?  Anything at all :)

fix 'zfs receive' to ignore the error?  :)

bury the dumps in the sand for two years, and hope someone else fixes ZFS in the meantime?  :)  That's what I did to my tape with the bad TOC.  no good news yet.

jw> If the problem is "just" that "zfs receive" is checksumming
jw> the data on the way in, can I disable this somehow within zfs?
jw> Can I globally disable checksumming in the kernel module?  mdb
jw> something or rather?

sounds plausible, but I don't know how, so please let me know if you find a way.  I also found some magic /etc/system incantations, but they don't seem to apply to 'zfs receive'.  It's more of what you found, more ``simon sez, import!'' stuff:

http://opensolaris.org/jive/message.jspa?messageID=192572#194209
http://sunsolve.sun.com/search/document.do?assetkey=1-66-233602-1
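
To make the ``scripted continuous zpool send/recv'' suggestion above concrete, here is a rough sketch of the kind of loop I mean; sourcepool/fs and scratchpool are made-up names, substitute your own:

    #!/bin/sh
    # stress loop: snapshot, send, receive, repeat, until 'zfs receive' complains.
    # sourcepool/fs and scratchpool are placeholders -- use real dataset names.
    i=0
    while :; do
        i=`expr $i + 1`
        snap=sourcepool/fs@stress$i
        zfs snapshot $snap || exit 1
        # receive each full stream into a fresh scratch dataset; a checksum
        # complaint from 'zfs receive' here is exactly the failure to look for.
        if zfs send $snap | zfs receive scratchpool/stress$i; then
            zfs destroy -r scratchpool/stress$i
            zfs destroy $snap
        else
            echo "zfs receive failed on iteration $i" >&2
            exit 1
        fi
    done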
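
(I take the -n above to be 'zfs receive -n', the dry-run flag.  The invocation would be something like the line below, with backuppool/test and /backup/home.zfs as made-up names for a scratch target and a saved dump file; as said, it apparently does not catch this corruption, so it is not a real backup test.)

    # dry-run receive of a saved send stream: parses the stream and reports
    # what it would do, but writes nothing.
    zfs receive -n -v backuppool/test < /backup/home.zfs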
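
As a sketch of the ``format that has nothing to do with ZFS'' idea: every mounted filesystem exposes its snapshots under .zfs/snapshot, so you can archive a snapshot with plain tar instead of 'zfs send'.  The names below are invented for illustration:

    # archive a snapshot through the POSIX interface instead of 'zfs send',
    # so plain tar on any OS can restore it (tank/home is a placeholder).
    zfs snapshot tank/home@backup-1
    cd /tank/home/.zfs/snapshot/backup-1 || exit 1
    tar cf /backup/home-backup-1.tar .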
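
And a sketch of the ``store backups in a zpool rather than in a file'' point, again with tank/home, backuphost, and backuppool as made-up names:

    # fragile: the backup exists only as a raw 'zfs send' stream in a file,
    # and a single bad bit can make the whole stream unreceivable later.
    zfs send tank/home@backup-1 | ssh backuphost 'cat > /backup/home.zfs'

    # sturdier: receive the stream into a live pool on the backup host, so the
    # data lands as an ordinary filesystem that 'zpool scrub' can keep checking.
    zfs send tank/home@backup-1 | ssh backuphost zfs receive backuppool/home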