Hello all, I am catching up with some 500 posts that I skipped this summer, and came up with a new question. In short, is it possible to add "restartability" to ZFS SEND, for example by adding artificial snapshots (of configurable increment size) into already existing datasets [too large to be zfs-sent successfully as one chunk of stream data]?
I'll start with the pre-history of this question, and continue with the detailed idea below.

On one hand, there was a post about a T2000 system kernel-panicking while trying to import a pool. It was probable that the pool had been receiving a large (3 TB) zfs send stream, and the receive was aborted due to some external issue. Afterwards the pool apparently got into a cycle of trying to destroy the received part of the stream during each import attempt, exhausted all RAM and hung the server. Based on my own experience, reported this spring to the forums (which, alas, are now gone, and the forums-to-mail replication did not work at that time) and to the Illumos bug tracker, I hope that the OP's pool did eventually get imported after a few weeks of power cycles. I hit a similar effect under different conditions (destroying some snapshots and datasets on a deduped pool).

On the other hand, there was a discussion (actually, lots of them) about "rsync vs. zfs send". My new question couples these threads. I know it has been discussed a number of times that ZFS SEND is more efficient at finding differences and sending updates than a filesystem crawl that recalculates checksums all over again. However, rsync has one important benefit: it is restartable. As shown by the first post I mentioned, a broken ZFS SEND operation can lead to long downtimes. With sufficiently large increments (e.g. the initial stream of a large dataset), low bandwidth and a high probability of network errors or power glitches, it may even be effectively guaranteed that a single ZFS SEND operation never completes; for example, when replicating 3 TB over a few-kbps subscriber-grade internet link which is reset every 24 hours for the ISP's traffic-accounting reasons. By contrast, it is easy to construct an rsync loop which would transfer all the files after several weeks of hard work. But that would not be a ZFS-snapshot replica, so further updates could not be made via ZFS SEND either, locking the user into rsync loops forever.

Now, I wondered: is it possible to embed snapshots (or some similar construct) into existing data, for the purpose of tab-keeping during zfs send and zfs recv? For example, the same existing 3 TB dataset could be artificially pre-represented as a horde of snapshots, each utilizing about 1 GB of disk space, allowing valid ZFS incremental sends over whatever network link we have. However, unlike zfs-auto-snap snapshots, these would not have actually existed on disk while the dataset was (historically) being written. Instead, they would be patched on by the admin after the data had already appeared on disk, just before the ZFS SEND.

Alternatively, if a ZFS SEND is detected to have broken, the sending side might set a "tab" at the offset where it last read the sent data. The receiver (upon pool import or whatever other recovery) would also set such a tab, instead of destroying the broken snapshot (which may take weeks and lots of downtime, as shown by several reports on this list, including mine) and restarting from scratch, likely doomed to break again as well.

In terms of code this would probably look like the normal "zfs snapshot" mixed with the reverse of "zfs destroy @snapshot", meaning that some existing blocks would be reassigned as "owned" by a newly embedded snapshot instead of being "owned" by the live dataset or some more recent snapshot...

//Jim
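P.S. To make the comparison concrete, here is (1) the kind of brute-force rsync retry loop I mean (ordinary commands, nothing hypothetical), and (2) a rough sketch of how the embedded-snapshot idea might be driven from the sending side. The embedding step itself does not exist today: the "zfs snapshot --embed" spelling, the part0..partN snapshot names, the 1 GB step size and the host/dataset names are all made up for illustration; only the send/recv loop around them uses ordinary, existing syntax.

    # (1) Restartable rsync loop: each pass resumes roughly where the previous one broke off.
    until rsync -a --partial --delete /tank/data/ backuphost:/backup/data/
    do
        sleep 600   # wait out the link reset or outage, then try again
    done

    # (2) Hypothetical chunked ZFS replication. Assume the proposed (non-existent)
    # ability to retroactively embed snapshots into the already-written dataset:
    #
    #   zfs snapshot --embed -i 1G tank/data    # NOT a real command/flag today
    #
    # which would leave us with tank/data@part0 .. tank/data@partN, each "owning"
    # roughly 1 GB of existing blocks. Replication then becomes a series of small,
    # individually retriable incremental sends:

    N=3000   # ~3 TB carved into ~1 GB steps (illustrative number)

    # initial ~1 GB chunk
    until zfs send tank/data@part0 | ssh backuphost zfs recv -F backup/data
    do
        sleep 600
    done

    # remaining chunks, one ~1 GB increment at a time
    i=1
    while [ "$i" -le "$N" ]
    do
        until zfs send -i tank/data@part$((i-1)) tank/data@part$i \
              | ssh backuphost zfs recv -F backup/data
        do
            sleep 600   # only this increment is retried, never the whole 3 TB
        done
        i=$((i+1))
    done

The point of the sketch is simply that a failure anywhere in the chain costs at most one roughly 1 GB retransmission, plus at worst the destruction of one roughly 1 GB partially-received snapshot on the receiver, instead of a multi-terabyte restart and the week-long destroy that started this thread.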