On Thu, Jan 21, 2010 at 10:00 PM, Richard Elling <richard.ell...@gmail.com> wrote:
> On Jan 21, 2010, at 8:04 AM, erik.ableson wrote:
>
>> Hi all,
>>
>> I'm going to be trying out some tests using b130 for dedup on a server with
>> about 1,7 TB of usable storage (14x146 in two raidz vdevs of 7 disks). What
>> I'm trying to get a handle on is how to estimate the memory overhead
>> required for dedup on that amount of storage. From what I gather, the dedup
>> hash keys are held in ARC and L2ARC and as such are in competition for the
>> available memory.
>
> ... and written to disk, of course.
>
> For ARC sizing, more is always better.
>
>> So the question is how much memory or L2ARC would be necessary to ensure
>> that I'm never going back to disk to read out the hash keys. Better yet
>> would be some kind of algorithm for calculating the overhead, e.g. an
>> average block size of 4K means a hash key for every 4K stored, and a hash
>> occupies 256 bits. An associated question is then how does the ARC handle
>> competition between hash keys and regular ARC functions?
>
> AFAIK, there is no special treatment given to the DDT. The DDT is stored like
> other metadata and (currently) not easily accounted for.
>
> Also, the DDT keys are 320 bits. The key itself includes the logical and
> physical block size and compression. The DDT entry is even larger.
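To put rough numbers on that, here is a minimal sketch of the estimate in
Python. Only the 40-byte (320-bit) key size comes from the note above; the
larger per-entry sizes are placeholders of mine, not figures from the code:

    # Back-of-the-envelope DDT memory estimate: one DDT entry per unique block.
    # Assumptions (mine): every block written is unique, and a whole in-core
    # DDT entry costs entry_bytes in total. Only the 40-byte key size is from
    # the discussion above; the larger sizes are guesses for illustration.

    TB, GB, KB = 1024 ** 4, 1024 ** 3, 1024

    def ddt_footprint_gb(pool_bytes, avg_block_bytes, entry_bytes):
        """Worst-case footprint: entries == allocated blocks, none shared."""
        entries = pool_bytes / avg_block_bytes
        return entries * entry_bytes / GB

    # Erik's case: 1.7 TB usable, assumed 4 KB average block size.
    for entry_bytes in (40, 128, 256):
        print(entry_bytes, "bytes/entry ->",
              round(ddt_footprint_gb(1.7 * TB, 4 * KB, entry_bytes), 1), "GB")
    # -> 17.0 GB, 54.4 GB and 108.8 GB respectively

The point being that the per-entry size, not just the key size, dominates the
footprint once the average block size is small.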
Looking at the dedup code, I noticed that on-disk DDT entries are compressed
less efficiently than possible: the key is not compressed at all (I'd expect
roughly a 2:1 compression ratio with sha256 data), while the rest of the entry
is currently passed through the zle compressor only (I'd expect this to be
less efficient than off-the-shelf compressors; feel free to correct me if I'm
wrong). Is this just a v1 limitation that is going to be improved in the
future?

Further, given the huge dedup memory footprint and the heavy performance
impact when DDT entries need to be read from disk, it might be worthwhile to
consider compressing in-core DDT entries (either specifically for the DDT or,
more generally, by making the ARC/L2ARC compression-aware). Has this been
considered?

Regards,
Andrey

>
> I think it is better to think of the ARC as caching the uncompressed DDT
> blocks which were written to disk. The number of these will be data
> dependent.
> "zdb -S poolname" will give you an idea of the number of blocks and how well
> dedup will work on your data, but that means you already have the data in a
> pool.
> -- richard
>
>
>> Based on these estimations, I think that I should be able to calculate the
>> following:
>> 1,7 TB
>> 1740,8 GB
>> 1782579,2 MB
>> 1825361100,8 KB
>> 4 average block size (KB)
>> 456340275,2 blocks
>> 256 hash key size (bits)
>> 1,16823E+11 hash key overhead (bits)
>> 14602888806,4 hash key overhead (bytes)
>> 14260633,6 hash key overhead (KB)
>> 13926,4 hash key overhead (MB)
>> 13,6 hash key overhead (GB)
>>
>> Of course the big question on this will be the average block size, or
>> better yet, being able to analyze an existing datastore to see just how
>> many blocks it uses and what the current distribution of different block
>> sizes is. I'm currently playing around with zdb with mixed success at
>> extracting this kind of data. That's also a worst-case scenario since it's
>> counting really small blocks and using 100% of available storage, which is
>> highly unlikely.
>>
>> # zdb -ddbb siovale/iphone
>> Dataset siovale/iphone [ZPL], ID 2381, cr_txg 3764691, 44.6G, 99 objects
>>
>>     ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, flags 0x0
>>
>>     Object  lvl   iblk   dblk  dsize  lsize   %full  type
>>          0    7    16K    16K  57.0K    64K   77.34  DMU dnode
>>          1    1    16K     1K  1.50K     1K  100.00  ZFS master node
>>          2    1    16K    512  1.50K    512  100.00  ZFS delete queue
>>          3    2    16K    16K  18.0K    32K  100.00  ZFS directory
>>          4    3    16K   128K   408M   408M  100.00  ZFS plain file
>>          5    1    16K    16K  3.00K    16K  100.00  FUID table
>>          6    1    16K     4K  4.50K     4K  100.00  ZFS plain file
>>          7    1    16K  6.50K  6.50K  6.50K  100.00  ZFS plain file
>>          8    3    16K   128K   952M   952M  100.00  ZFS plain file
>>          9    3    16K   128K   912M   912M  100.00  ZFS plain file
>>         10    3    16K   128K   695M   695M  100.00  ZFS plain file
>>         11    3    16K   128K   914M   914M  100.00  ZFS plain file
>>
>> Now, if I'm understanding this output properly, object 4 is composed of
>> 128KB blocks with a total size of 408MB, meaning that it uses 3264 blocks.
>> Can someone confirm (or correct) that assumption? Also, I note that each
>> object (as far as my limited testing has shown) has a single block size
>> with no internal variation.
>>
>> Interestingly, all of my zvols seem to use fixed-size blocks; that is,
>> there is no variation in the block sizes, they're all the size defined at
>> creation, with no dynamic block sizes being used. I previously thought that
>> the -b option set the maximum size, rather than fixing all blocks. Learned
>> something today :-)
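On the block-count reading a few paragraphs up: as far as the unit conversion
goes, the 3264 figure follows directly from lsize / dblk, and that same ratio
is what drives how many DDT entries a dataset can contribute. A quick check,
using only numbers from the quoted zdb output:

    # Object 4: lsize 408M at a 128K record size.
    # Assumes lsize is an exact multiple of the block size.
    lsize_kb = 408 * 1024        # 408 MB expressed in KB
    block_kb = 128               # dblk column for object 4
    print(lsize_kb / block_kb)   # -> 3264.0 blocks, matching the estimate

For the zvols in the output below, volblocksize plays the same role as the
128K record size here.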
>>
>> # zdb -ddbb siovale/testvol
>> Dataset siovale/testvol [ZVOL], ID 45, cr_txg 4717890, 23.9K, 2 objects
>>
>>     Object  lvl   iblk   dblk  dsize  lsize   %full  type
>>          0    7    16K    16K  21.0K    16K    6.25  DMU dnode
>>          1    1    16K    64K      0    64K    0.00  zvol object
>>          2    1    16K    512  1.50K    512  100.00  zvol prop
>>
>> # zdb -ddbb siovale/tm-media
>> Dataset siovale/tm-media [ZVOL], ID 706, cr_txg 4426997, 240G, 2 objects
>>
>>     ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, flags 0x0
>>
>>     Object  lvl   iblk   dblk  dsize  lsize   %full  type
>>          0    7    16K    16K  21.0K    16K    6.25  DMU dnode
>>          1    5    16K     8K   240G   250G   97.33  zvol object
>>          2    1    16K    512  1.50K    512  100.00  zvol prop
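Applying the same sort of arithmetic to the tm-media zvol quoted above (again,
the per-entry sizes are placeholders of mine, not figures from the code):

    # tm-media: ~250 GB logical at a fixed 8 KB volblocksize.
    GB, KB = 1024 ** 3, 1024
    entries = 250 * GB / (8 * KB)       # ~32.8 million potential DDT entries
    for entry_bytes in (40, 128, 256):  # same placeholder sizes as earlier
        print(entry_bytes, "bytes/entry ->",
              round(entries * entry_bytes / GB, 2), "GB")
    # -> 1.22 GB, 3.91 GB and 7.81 GB respectively

So a small fixed volblocksize multiplies the entry count, and with it the DDT
footprint, which is why the in-core entry size (and whether it could be
compressed) matters so much here.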