Depending on the data content you're dealing with, you can compress the snapshots inline with the send/receive operations by piping the data through gzip. Given that we've been talking about 500MB text files, this seems a very likely solution. There was some mention in the Kernel Keynote in Australia of inline deduplication (i.e. compression :-) ) in the zfs send stream, but that leaves the question of references to deduplicated blocks that no longer exist on the destination.
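A minimal sketch of the inline-gzip approach; the pool, dataset, snapshot, and host names here are illustrative, not from the thread:

```shell
# Compress a full zfs send stream with gzip on the way to a
# remote host, and decompress it on the receiving side.
zfs send tank/docs@snap1 | gzip -c | \
    ssh backuphost 'gunzip -c | zfs receive backup/docs'

# Incremental variant: only blocks written since @snap1 are sent,
# and the (highly compressible) text data is gzipped in transit.
zfs send -i tank/docs@snap1 tank/docs@snap2 | gzip -c | \
    ssh backuphost 'gunzip -c | zfs receive backup/docs'
```

For text data like this, gzip typically shrinks the stream dramatically, though it costs CPU on both ends of the pipe.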

Note that ZFS deduplication will eventually help diminish the overall volume you have to handle: while the output of the text editor goes to different physical blocks, many of those blocks will be identical to previously stored blocks (which are also kept, since they exist in snapshots), so the send/receive operations will consist of far more block references than complete blocks.
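A sketch of what that would look like, assuming a ZFS release with deduplication support (it was brand new at the time of this thread); dataset and pool names are examples:

```shell
# Enable deduplication on the dataset; from this point on,
# newly written blocks identical to existing ones are stored
# as references rather than duplicate copies.
zfs set dedup=on tank/docs

# The pool-wide dedup ratio shows how much duplicate data is
# being stored as references instead of full blocks.
zpool get dedupratio tank
```

Dedup keeps a table of block checksums in memory, so the space savings come at a real RAM cost on large pools.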

Erik

PS - this is pretty much the operational mode of all products that use snapshots. It's even worse on many other storage systems, where the snapshot content must be written to a specific reserved volume (often very small compared to the main data store) rather than to the host pool. Until deduplication becomes the standard method of managing blocks, the volume of data required by this use case will not change.

On 30 sept. 2009, at 16:35, Brian Hubbleday wrote:

I took binary dumps of the snapshots taken in between the edits, and this showed that there was actually very little change in the block structure; however, the incremental snapshots were very large. So the conclusion I draw from this is that the snapshot simply contains every block written since the last snapshot, regardless of whether the data in the block has changed or not.
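One way to sketch that measurement with the stock tools; the snapshot names are illustrative, and the `-nv` dry-run flags assume a zfs version that supports them:

```shell
# Take a snapshot, re-save the file in the editor, snapshot again.
zfs snapshot tank/docs@before
# ... edit and save the 500MB text file here ...
zfs snapshot tank/docs@after

# -n is a dry run, -v reports the estimated incremental stream size,
# without actually sending anything.
zfs send -nv -i tank/docs@before tank/docs@after

# Or count the exact bytes of the real incremental stream.
zfs send -i tank/docs@before tank/docs@after | wc -c
```

Because the editor rewrites the whole file to new blocks, the stream size tracks the file size even when almost no content changed, which matches the behaviour described above.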

Okay so snapshots work this way, I'm simply suggesting that things could be better.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
