Jonathan Wheeler wrote:
> Thanks for the information, I'm learning quite a lot from all this.
>
> It seems to me that zfs send *should* be doing some kind of verification,
> since some work has clearly been put into zfs so that ZFS filesystems can
> be dumped into files/pipes. It's a great feature to have, and I can't
> believe that this was purely for zfs send | zfs receive scenarios.
>
zfs send/receive is not a backup solution because it does not have the
features generally expected in a backup solution. It is a very low-level
method of replicating dataset structure. If you find documentation to the
contrary that was created after CR 6399918 was integrated, then please file
a new bug.
http://bugs.opensolaris.org/view_bug.do?bug_id=6399918

> A common example used all over the place is zfs send | ssh $host. In these
> examples is ssh guaranteeing the data delivery somehow? If not, there need
> to be some serious asterisks in these guides!
>
In this case, the receive does checks and will fail when the checks do not
pass; the send can then be restarted. ssh encrypts the stream, and encrypted
transports tend to be more robust because corruption will usually cause the
decryption (including the surrounding checksum checks) to fail. If you save
the contents of the pipe somewhere instead, then you are at the mercy of the
robustness of wherever the stream is stored. However, there is more that can
be done here, both inside and outside of ZFS. For inside ZFS, I have filed
an RFE: CR 6736837, improve send/receive fault tolerance. To be effective,
though, we really need a better understanding of the failures we expect to
encounter. As an interim step, know that a send will create the same stream
every time because it is sending a stable set of data. You can therefore
send to files twice, on diverse storage, and compare the resulting files.
In other words, the flexibility of UNIX pipes is exposed by zfs
send/receive.
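For example, something along these lines (just a sketch -- the pool,
dataset, snapshot, and file names below are placeholders, not anything from
your setup):

  # write the same snapshot's stream to two files on different storage
  zfs send tank/home@2008-09-12 > /mnt/diskA/home-2008-09-12.zfs
  zfs send tank/home@2008-09-12 > /mnt/diskB/home-2008-09-12.zfs

  # the two streams should be byte-for-byte identical; any difference
  # means at least one copy was damaged on its way to disk
  cmp /mnt/diskA/home-2008-09-12.zfs /mnt/diskB/home-2008-09-12.zfs \
      && echo "streams match"

You can also record a checksum of the stream at the time you write it and
verify the file against that checksum before you ever try to receive it.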
> Looking at this at a level that I do understand, it's going via TCP, which
> checksums packets..... then again, I was using nfs over TCP, and look
> where I am today. So much for that!
>
I do not think you will be able to identify the root cause of your
corruption -- there are far too many moving parts and you do not have a
known-good reference :-(.

> As I google these subjects more and more, I fear that I'm hitting the
> conceptual mental block that many before me have hit also. zfs send is not
> zfsdump, even though it sure looks the same, and it's not clearly stated
> that you may end up in a situation like the one I'm in today if you don't
> somehow test your backups.
>
Correct, though this applies to every backup method, in general. One backup
method I use (I use several ;-) is send/receive to a removable disk,
usually a USB disk. I can then set up compression and redundancy policies
for the USB disk and also periodically scrub it to test retention. This
also offers the ability to go back to any snapshot in a matter of minutes,
even though I store the USB disk in a fire safe. Another benefit of this
method is that I can easily verify the media -- I was once a user of 8mm
tape drives, so I've got several scars from the inability to recover data
from tapes (they had a nasty habit of writing tapes that couldn't be read
by other 8mm drives, so if you had to repair your drive (likely), then you
might not be able to read your tapes).
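Roughly, that setup looks like the following (again just a sketch -- the
device name, pool name, dataset, and snapshot names are placeholders, and
compression=on / copies=2 are my own policy choices, not requirements):

  # one-time setup: a single-disk pool on the USB drive, with compression
  # and two copies of every block for some on-media redundancy
  zpool create usbbackup c5t0d0
  zfs set compression=on usbbackup
  zfs set copies=2 usbbackup

  # first backup: full send of a snapshot into the USB pool
  zfs snapshot tank/home@2008-09-12
  zfs send tank/home@2008-09-12 | zfs receive usbbackup/home

  # later backups: incremental sends between successive snapshots
  # (-F rolls the target back in case it changed since the last receive)
  zfs snapshot tank/home@2008-09-19
  zfs send -i @2008-09-12 tank/home@2008-09-19 | zfs receive -F usbbackup/home

  # periodically: scrub the USB pool to verify every block on the media
  zpool scrub usbbackup
  zpool status usbbackup

A scrub reads back every allocated block and verifies its checksum, so bad
media shows up long before the backup is actually needed.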
> As you've rightly pointed out, it's done now, and even if I did manage to
> reproduce this again, that won't help my data locked away in these 2 .zfs
> files. So, focusing on the hopeful: is there anything I can do to recover
> my data from these zfs dumps? Anything at all :)
>
I filed RFE CR 6736794, option for partial zfs receives. But I'm not
confident that it can be implemented easily or quickly.

> If the problem is "just" that "zfs receive" is checksumming the data on
> the way in, can I disable this somehow within zfs? Can I globally disable
> checksumming in the kernel module? mdb something or other?
>
> I read this thread where someone did successfully manage to recover data
> from a damaged zfs, which fills me with some hope:
> http://www.opensolaris.org/jive/thread.jspa?messageID=220125
>
> It's way over my head, but if anyone can tell me the mdb commands I'm
> happy to try them, even if they do kill my cat. I don't really have
> anything to lose with a copy of the data, and I'll do it all in a VM
> anyway.
>
With mdb and the source, all things are possible. But I'll have to defer to
someone who uses mdb more frequently than I do.
 -- richard