I have been reading the ZFS source, trying to get up to speed on the
internals.  One thing that interests me about the filesystem is what appears
to be low-hanging fruit: block-squishing CAS (Content Addressable Storage).
I think that, in addition to lzjb compression, squishing blocks that contain
identical data into a single on-disk copy would save a lot of space for
administrators in many common workflows.

I am writing to get some feedback from people who know the code better
than I do: are there any gotchas in my logic?

Assumptions:

The SHA256 hash is used (Fletcher2/4 are not collision resistant; finding a
SHA256 collision takes on the order of 2^128 operations, if I remember
correctly).
The SHA256 hash is taken over the data portion of the block as it exists on
disk.
The metadata structure is hashed separately.
The current metadata structure has a reserved bit field set aside for
future use.
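To put the 2^128 figure in context, here is a rough birthday-bound
estimate, assuming SHA256 behaves like an ideal random function (an
assumption, not a proof). With n distinct blocks in the pool, the chance
that any two of them share a digest is approximately:

```latex
p \;\approx\; \binom{n}{2}\,2^{-256} \;\approx\; \frac{n^{2}}{2^{257}},
\qquad n = 2^{35} \;\Rightarrow\; p \approx 2^{70-257} = 2^{-187}
```

For scale, 2^35 blocks of 128 KiB each is 4 PiB of unique data (an
illustrative figure, not a measurement). A collision only becomes likely
as n approaches 2^128, which is where that number comes from.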


Description of change:
Creates:
The filesystem goes through its normal process of writing a block and
computing the checksum.
Before the metadata tree is pushed, the checksum is looked up in a global
checksum tree to see whether there is a match.
If a match exists, insert a metadata placeholder for the block that
references the already existing block on disk, and increment a
number_of_links counter in the metadata to track how many pointers
reference that block.  Then free the newly written and checksummed block
so its space can be reused.
Otherwise, if there is no match, add the new checksum to the checksum tree
and continue as normal.
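The create path above can be sketched in C as follows.  This is a minimal
sketch, not ZFS code: it assumes the SHA256 digest has already been
computed by the normal write path, and it stands in for the global
checksum tree with a simple in-memory chained hash table.  All names
(cas_entry_t, cas_find, cas_on_write, block_addr) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CKSUM_LEN 32   /* SHA256 digest size in bytes */
#define NBUCKETS  4096 /* hypothetical bucket count for the sketch */

/* Hypothetical entry in the global checksum table. */
typedef struct cas_entry {
    uint8_t cksum[CKSUM_LEN];   /* SHA256 of the block's on-disk data */
    uint64_t block_addr;        /* address of the one shared on-disk copy */
    uint64_t number_of_links;   /* block pointers that reference this copy */
    struct cas_entry *next;     /* hash-chain link */
} cas_entry_t;

static cas_entry_t *buckets[NBUCKETS];

/* SHA256 output is already uniformly distributed, so a few of its
 * bytes make a good bucket index without re-hashing. */
static size_t bucket_of(const uint8_t *cksum) {
    uint32_t h;
    memcpy(&h, cksum, sizeof h);
    return h % NBUCKETS;
}

static cas_entry_t *cas_find(const uint8_t *cksum) {
    for (cas_entry_t *e = buckets[bucket_of(cksum)]; e != NULL; e = e->next)
        if (memcmp(e->cksum, cksum, CKSUM_LEN) == 0)
            return e;
    return NULL;
}

/* Run after the new block has been written and checksummed, before the
 * metadata tree is pushed.  Returns the address the new block pointer
 * should reference; sets *free_new when the freshly written copy is a
 * duplicate and can be released for future use. */
static uint64_t cas_on_write(const uint8_t *cksum, uint64_t new_addr,
                             int *free_new) {
    assert(cksum != NULL && free_new != NULL);
    cas_entry_t *e = cas_find(cksum);
    if (e != NULL) {               /* match: share the existing block */
        e->number_of_links++;
        *free_new = 1;
        return e->block_addr;
    }
    *free_new = 0;                 /* no match: record it, write normally */
    e = malloc(sizeof *e);
    if (e == NULL)
        return new_addr;           /* out of memory: skip dedup for this block */
    memcpy(e->cksum, cksum, CKSUM_LEN);
    e->block_addr = new_addr;
    e->number_of_links = 1;
    size_t b = bucket_of(cksum);
    e->next = buckets[b];
    buckets[b] = e;
    return new_addr;
}
```

The lookup only ever needs exact matches on the digest, so a hash keyed
by the checksum is enough here; a real version would also have to persist
this table and keep it consistent with the on-disk metadata.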


Deletes:
Follow the normal process, except that the number_of_links count is
decremented, and if it is still nonzero the block is not freed.
Clean up the checksum tree as needed, i.e. remove the entry once the last
link is gone.
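The delete path can be sketched the same way.  Again this is a minimal,
self-contained sketch using a hypothetical in-memory chained hash table in
place of the checksum tree; cas_add_link and cas_on_free are made-up names,
and cas_add_link exists only so the sketch can be exercised on its own.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CKSUM_LEN 32   /* SHA256 digest size in bytes */
#define NBUCKETS  4096 /* hypothetical bucket count for the sketch */

typedef struct cas_entry {
    uint8_t cksum[CKSUM_LEN];   /* SHA256 of the block's on-disk data */
    uint64_t block_addr;        /* address of the shared on-disk copy */
    uint64_t number_of_links;   /* block pointers that reference it */
    struct cas_entry *next;     /* hash-chain link */
} cas_entry_t;

static cas_entry_t *buckets[NBUCKETS];

static size_t bucket_of(const uint8_t *cksum) {
    uint32_t h;
    memcpy(&h, cksum, sizeof h);
    return h % NBUCKETS;
}

/* Minimal insert: add one link, creating the entry if needed. */
static void cas_add_link(const uint8_t *cksum, uint64_t addr) {
    size_t b = bucket_of(cksum);
    for (cas_entry_t *e = buckets[b]; e != NULL; e = e->next)
        if (memcmp(e->cksum, cksum, CKSUM_LEN) == 0) {
            e->number_of_links++;
            return;
        }
    cas_entry_t *e = calloc(1, sizeof *e);
    if (e == NULL)
        abort();
    memcpy(e->cksum, cksum, CKSUM_LEN);
    e->block_addr = addr;
    e->number_of_links = 1;
    e->next = buckets[b];
    buckets[b] = e;
}

/* Delete path: drop one link.  Returns 1 when the caller may free the
 * on-disk block, 0 when other pointers still reference it.  The table
 * entry is unlinked and freed once its count reaches zero. */
static int cas_on_free(const uint8_t *cksum) {
    cas_entry_t **pp = &buckets[bucket_of(cksum)];
    for (; *pp != NULL; pp = &(*pp)->next) {
        cas_entry_t *e = *pp;
        if (memcmp(e->cksum, cksum, CKSUM_LEN) != 0)
            continue;
        assert(e->number_of_links > 0);
        if (--e->number_of_links > 0)
            return 0;            /* still referenced: keep the block */
        *pp = e->next;           /* last link gone: clean up the table */
        free(e);
        return 1;
    }
    return 1;                    /* not a CAS block: free as normal */
}
```

Using a pointer-to-pointer walk lets the last reference unlink its entry
without special-casing the head of the chain.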

What this requires:
A new flag in the metadata that tags the block as a CAS block.
A checksum tree that allows fast lookup by checksum key.
A counter, in the metadata or in the checksum tree, that tracks links back
to each block.
Some additions to the userland tools to configure and enable the feature.
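The first requirement, a CAS flag carried in one of the reserved metadata
bits, is ordinary bit manipulation.  The sketch below uses a hypothetical
64-bit metadata word and picks bit 63 purely for illustration; it is NOT
the real ZFS block pointer layout, and block_meta_t, META_CAS_BIT,
meta_is_cas and meta_set_cas are made-up names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical metadata word with some reserved bits; not blkptr_t. */
typedef struct block_meta {
    uint64_t props;
} block_meta_t;

#define META_CAS_BIT 63  /* hypothetical: one of the reserved bits */

static_assert(META_CAS_BIT < 64, "CAS flag must fit in props");

/* Returns 1 if the block is tagged as a CAS (shared) block. */
static inline int meta_is_cas(const block_meta_t *m) {
    return (int)((m->props >> META_CAS_BIT) & 1u);
}

/* Sets or clears the CAS flag without touching any other bits. */
static inline void meta_set_cas(block_meta_t *m, int on) {
    if (on)
        m->props |= UINT64_C(1) << META_CAS_BIT;
    else
        m->props &= ~(UINT64_C(1) << META_CAS_BIT);
}
```

Because only the reserved bit changes, older code that ignores that bit
would still read the rest of the metadata unchanged.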

Does this seem feasible?  Are there any blocking points that I am missing
or unaware of?  I am just posting this for discussion; it seems very
interesting to me.

-Wade

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
