Depending on the data content you're dealing with, you can compress the
snapshots inline with the send/receive operations by piping the data
through gzip. Given that we've been talking about 500MB text files,
this seems a very likely solution. There was some mention in the
Kernel Keynote in Australia of inline deduplication, i.e.
compression :-), in the zfs send stream. But there remains the question
of how to handle references to deduplicated blocks that no longer
exist on the destination.
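A minimal sketch of the inline-compression pipeline, with hypothetical pool, dataset, and host names (tank/docs, backup/docs, backuphost); below the commented zfs pipeline, a plain file stands in for the send stream so the gzip round-trip itself can be run anywhere, since gzip only ever sees a byte stream:

```shell
# Hypothetical real-world usage (assumes tank/docs@monday exists):
#
#   zfs send tank/docs@monday | gzip -c | \
#       ssh backuphost 'gzip -dc | zfs receive backup/docs'
#
# Stand-in demonstration of the same pipe structure:
printf 'mostly-text data compresses well\n' > /tmp/stream.in
gzip -c < /tmp/stream.in | gzip -dc > /tmp/stream.out
cmp -s /tmp/stream.in /tmp/stream.out && echo "round-trip ok"
```

Text-heavy streams like the 500MB files discussed here tend to compress well, so the extra CPU cost of gzip in the pipe is usually repaid in transfer time.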
Note that ZFS deduplication will eventually help diminish the overall
volume you have to handle: while the output of the text editor will go
to different physical blocks, many of those blocks will be identical
to previously stored blocks (which are also kept, since they exist in
snapshots), so the send/receive operations will consist of far more
block references than complete blocks.
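The effect can be illustrated with a toy model (not how zfs send is actually implemented): split data into fixed-size "blocks", hash each one, and compare the total block count to the unique count. Anything repeated would only need a reference, not a full block, in a dedup-aware stream. All file and directory names here are made up for the demo:

```shell
# Toy dedup model: a file and its lightly edited copy share
# almost all of their 512-byte blocks.
mkdir -p /tmp/dedup-demo && cd /tmp/dedup-demo
yes 'the same line of text, repeated' | head -1000 > v1.txt
{ cat v1.txt; echo 'one edited line'; } > v2.txt
# Split the concatenation into fixed-size blocks and hash them.
cat v1.txt v2.txt | split -b 512 - blk.
total=$(ls blk.* | wc -l)
unique=$(sha256sum blk.* | awk '{print $1}' | sort -u | wc -l)
echo "total blocks: $total, unique blocks: $unique"
```

With dedup, only the unique blocks cost space on the receiving side; the rest of the stream collapses into references, which is why the send volume shrinks even though the editor rewrites whole files.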
Erik
PS - this is pretty much the operational mode of all products that use
snapshots. It's even worse on many other storage systems, where the
snapshot content must be written to a specific reserved volume
(which is often very small compared to the main data store) rather
than to the whole pool. Until deduplication becomes the standard
method of managing blocks, the volume of data required by this use
case will not change.
On 30 sept. 2009, at 16:35, Brian Hubbleday wrote:
I took binary dumps of the snapshots taken in between the edits, and
these showed that there was actually very little change in the block
structure; the incremental snapshots, however, were very large. So the
conclusion I draw from this is that the snapshot simply contains
every block written since the last snapshot, regardless of whether
the data in the block has changed or not.
Okay so snapshots work this way, I'm simply suggesting that things
could be better.
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss