Brandon High wrote:
On Fri, Jul 9, 2010 at 5:18 PM, Brandon High <bh...@freaks.com> wrote:

    I think that DDT entries are a little bigger than what you're
    using. The size seems to range between 150 and 250 bytes depending
    on how it's calculated, call it 200b each. Your 128G dataset would
    require closer to 200M (+/- 25%) for the DDT if your data was
    completely unique. 1TB of unique data would require 600M - 1000M
    for the DDT.


Using 376b per entry, it's 376M for 128G of unique data, or just under 3GB for 1TB of unique data.

A 1TB zvol with 8k blocks would require almost 24GB of memory to hold the DDT. Ouch.

-B
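
For reference, the arithmetic behind those figures is just "unique blocks times bytes per DDT entry". A quick back-of-the-envelope sketch (the block and per-entry sizes are only the values tossed around in this thread, and the per-entry size is the big unknown):

    GiB = 1024 ** 3

    def ddt_ram(data_bytes, block_size, entry_bytes):
        """Rough RAM needed to hold the DDT for fully unique data."""
        return (data_bytes // block_size) * entry_bytes

    # 128G of unique data, 128K records, 376 bytes/entry -> ~376 MB
    print(ddt_ram(128 * GiB, 128 * 1024, 376) / GiB)

    # 1TB of unique data, 128K records, 376 bytes/entry -> just under 3 GB
    print(ddt_ram(1024 * GiB, 128 * 1024, 376) / GiB)

    # 1TB zvol with 8K blocks: ~25 GB at ~200 bytes/entry,
    # closer to 47 GB if 376 bytes/entry is the right figure
    print(ddt_ram(1024 * GiB, 8 * 1024, 200) / GiB)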


To reduce RAM requirements, consider an offline or idle-time dedupe. I suggested a variation of this with regard to compression a while ago, probably on this list.

In either case, the system first writes the data in whichever way is fastest.

If there is enough unused CPU power, run maximum compression; otherwise use fast compression. If new data-type-specific compression algorithms are added, attempt compression with those as well (e.g. lossless JPEG recompression, which can save 20-25% of the space). Store the block in whichever compression format works best.
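
A minimal sketch of that selection step, assuming hypothetical compressor hooks (the function names here are illustrative, not an existing ZFS interface): try each candidate on the block, keep whichever result is smallest, and store the block uncompressed if nothing helps.

    import zlib

    # Illustrative stand-ins for "fast" vs "maximum" compression and for
    # optional type-specific codecs; not an actual ZFS interface.
    def compress_fast(block):
        return ("fast", zlib.compress(block, 1))

    def compress_max(block):
        return ("max", zlib.compress(block, 9))

    def pick_best(block, idle_cpu, extra_codecs=()):
        """Return (codec_name, payload) for the smallest representation."""
        candidates = [("none", block)]
        candidates.append(compress_max(block) if idle_cpu else compress_fast(block))
        for codec in extra_codecs:   # e.g. a lossless JPEG recompressor
            candidates.append(codec(block))
        return min(candidates, key=lambda c: len(c[1]))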

If there is enough RAM to maintain a live dedupe table, dedupe right away.

If CPU and RAM pressures are too high, defer dedupe and compression to a periodic scrub (or some other new, periodically run command). In the deferred case, the dedupe table entries could be generated as blocks are filled or changed and then kept on disk. Periodically that table would be sorted by hash, after which any duplicates would sit next to each other. The blocks for the duplicates would be looked up, verified as truly identical, and then re-written (probably also requiring BP rewrite). Quicksort is parallelizable, and sorting a multi-gigabyte table is a plausible operation, even on disk: quicksort 100 MB pieces of it in RAM, then merge the sorted pieces until the whole table ends up sorted.
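
As a rough sketch of that deferred pass (purely illustrative, and nothing like the real DDT on-disk format): sort the (hash, block reference) table in RAM-sized runs, merge the sorted runs, then walk the merged stream looking for adjacent entries with the same hash.

    import heapq, os, pickle, tempfile

    # Externally sort (hash, block_ref) records so duplicates end up adjacent.
    # Record format and run size are hypothetical.

    def _spill(sorted_batch):
        f = tempfile.NamedTemporaryFile(delete=False)
        pickle.dump(sorted_batch, f)
        f.close()
        return f.name

    def _read_run(path):
        with open(path, "rb") as f:
            yield from pickle.load(f)
        os.unlink(path)

    def sort_runs(records, run_size=1_000_000):
        """Sort the table in RAM-sized pieces, spilling each sorted run to disk."""
        runs, batch = [], []
        for rec in records:                  # rec = (hash, block_ref)
            batch.append(rec)
            if len(batch) >= run_size:
                runs.append(_spill(sorted(batch)))
                batch = []
        if batch:
            runs.append(_spill(sorted(batch)))
        return runs

    def find_duplicates(records):
        """Yield groups of block_refs whose hashes match (verify before rewriting)."""
        merged = heapq.merge(*(_read_run(p) for p in sort_runs(records)))
        prev_hash, group = None, []
        for h, ref in merged:
            if h == prev_hash:
                group.append(ref)
            else:
                if len(group) > 1:
                    yield group
                prev_hash, group = h, [ref]
        if len(group) > 1:
            yield group

Each group would then be read back, byte-compared to rule out hash collisions, and rewritten to share a single block (the step that needs BP rewrite).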

The end result of all this idle-time compression and deduping is that the initially allocated storage space becomes the upper-bound storage requirement, and that the data will end up packing tighter over time. The phrasing on bulk-packaged items comes to mind: "Contents may have settled during shipping."


Now a theoretical question about dedupe: what about the interaction with defragmentation (which also probably needs BP rewrite)? The first file will be completely defragmented, but a second file that is a slight variation of the first will have at least two fragments (the deduped portion and the unique portion). The performance impact will probably be minor as long as each fragment has a decent minimum size (multiple MB).

