Sorry for the late answer. It's approximately 150 bytes per individual block, so increasing the blocksize is a good idea. Also, when the L1 and L2 ARC are not large enough, the system will start issuing disk IOPS, and RAIDZ is not very effective at random IOPS, so when your DRAM is insufficient your performance is likely to suffer. You may choose to use RAID 10 instead, which is a lot better under random loads.

Mertol
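At ~150 bytes of in-core DDT per unique block, the memory cost scales directly with the block count, so the effect of blocksize is easy to sketch. A rough Python illustration (the 1.7 TB pool size is taken from the question below; the 150-byte figure is the approximation quoted above, not an exact on-disk structure size):

```python
# Back-of-the-envelope DDT sizing: ~150 bytes of ARC per unique block.
POOL_BYTES = int(1.7 * 2**40)   # ~1.7 TB of allocated, deduped data
DDT_ENTRY_BYTES = 150           # approximate in-core cost per unique block

for block_size in (4 * 2**10, 8 * 2**10, 128 * 2**10):
    blocks = POOL_BYTES / block_size
    overhead_gib = blocks * DDT_ENTRY_BYTES / 2**30
    print(f"{block_size // 2**10:>4} KB blocks: ~{overhead_gib:,.1f} GiB of DDT")
```

With 4 KB blocks this works out to roughly 64 GiB of DDT, versus about 2 GiB at the default 128 KB recordsize, which is why the blocksize dominates the estimate.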
Mertol Ozyoney
Storage Practice - Sales Manager
Sun Microsystems, TR
Istanbul TR
Phone +902123352200
Mobile +905339310752
Fax +902123352222
Email mertol.ozyo...@sun.com

-----Original Message-----
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of erik.ableson
Sent: Thursday, January 21, 2010 6:05 PM
To: zfs-discuss
Subject: [zfs-discuss] Dedup memory overhead

Hi all,

I'm going to be trying out some tests using b130 for dedup on a server with about 1,7 TB of useable storage (14x146 in two raidz vdevs of 7 disks). What I'm trying to get a handle on is how to estimate the memory overhead required for dedup on that amount of storage.

From what I gather, the dedup hash keys are held in the ARC and L2ARC, and as such are in competition with everything else for the available memory. So the question is how much memory or L2ARC would be necessary to ensure that I'm never going back to disk to read out the hash keys. Better yet would be some kind of algorithm for calculating the overhead, e.g. an averaged block size of 4K means a hash key for every 4K stored, and a hash occupies 256 bits. An associated question is then: how does the ARC handle competition between hash keys and regular ARC functions?

Based on these estimations, I think that I should be able to calculate the following:

1,7            TB
1740,8         GB
1782579,2      MB
1825361100,8   KB
4              average block size (KB)
456340275,2    blocks
256            hash key size (bits)
1,16823E+11    hash key overhead (bits)
14602888806,4  hash key size (bytes)
14260633,6     hash key size (KB)
13926,4        hash key size (MB)
13,6           hash key overhead (GB)

Of course the big question on this will be the average block size - or better yet - to be able to analyze an existing datastore to see just how many blocks it uses and what the current distribution of different block sizes is. I'm currently playing around with zdb with mixed success at extracting this kind of data.
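The arithmetic in the table above can be reproduced in a few lines. This is a sketch of the same worst-case estimate (256-bit keys only, every block unique), not a measurement of actual DDT entry size:

```python
# Reproducing the worst-case estimate: 1,7 TB of storage, a 4 KB average
# block size, and a 256-bit (32-byte) hash key per block.
pool_kb = 1.7 * 2**30        # 1 825 361 100,8 KB
blocks = pool_kb / 4         # one key per 4 KB block -> 456 340 275,2
key_bytes = 256 // 8         # 256-bit hash key = 32 bytes
overhead_gb = blocks * key_bytes / 2**30
print(f"{blocks:,.1f} blocks -> ~{overhead_gb:.1f} GB of hash keys")  # ~13.6 GB
```

Note this counts only the raw keys; any per-entry bookkeeping (reference counts, block pointers) would push the real figure higher.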
That's also a worst-case scenario, since it's counting really small blocks and assuming 100% of available storage is used - highly unlikely.

# zdb -ddbb siovale/iphone
Dataset siovale/iphone [ZPL], ID 2381, cr_txg 3764691, 44.6G, 99 objects

ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, flags 0x0

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K  57.0K    64K   77.34  DMU dnode
         1    1    16K     1K  1.50K     1K  100.00  ZFS master node
         2    1    16K    512  1.50K    512  100.00  ZFS delete queue
         3    2    16K    16K  18.0K    32K  100.00  ZFS directory
         4    3    16K   128K   408M   408M  100.00  ZFS plain file
         5    1    16K    16K  3.00K    16K  100.00  FUID table
         6    1    16K     4K  4.50K     4K  100.00  ZFS plain file
         7    1    16K  6.50K  6.50K  6.50K  100.00  ZFS plain file
         8    3    16K   128K   952M   952M  100.00  ZFS plain file
         9    3    16K   128K   912M   912M  100.00  ZFS plain file
        10    3    16K   128K   695M   695M  100.00  ZFS plain file
        11    3    16K   128K   914M   914M  100.00  ZFS plain file

Now, if I'm understanding this output properly, object 4 is composed of 128KB blocks with a total size of 408MB, meaning that it uses 3264 blocks. Can someone confirm (or correct) that assumption? Also, I note that each object (as far as my limited testing has shown) has a single block size with no internal variation.

Interestingly, all of my zvols seem to use fixed-size blocks - that is, there is no variation in the block sizes - they're all the size defined at creation, with no dynamic block sizes being used. I previously thought that the -b option set the maximum size, rather than fixing all blocks.
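The block-count reading above checks out arithmetically: dividing the logical size by the data block size gives the block count, assuming (as the output suggests) a single uniform block size per object:

```python
# Sanity check on the zdb reading above: object 4 has 128K blocks (dblk)
# and a 408M logical size (lsize), so it should span 408M / 128K blocks.
lsize = 408 * 2**20   # 408M logical size reported by zdb
dblk = 128 * 2**10    # 128K data block size
print(lsize // dblk)  # -> 3264
```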
Learned something today :-)

# zdb -ddbb siovale/testvol
Dataset siovale/testvol [ZVOL], ID 45, cr_txg 4717890, 23.9K, 2 objects

    Object  lvl   iblk  dblk  dsize  lsize   %full  type
         0    7    16K   16K  21.0K    16K    6.25  DMU dnode
         1    1    16K   64K      0    64K    0.00  zvol object
         2    1    16K   512  1.50K    512  100.00  zvol prop

# zdb -ddbb siovale/tm-media
Dataset siovale/tm-media [ZVOL], ID 706, cr_txg 4426997, 240G, 2 objects

ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, flags 0x0

    Object  lvl   iblk  dblk  dsize  lsize   %full  type
         0    7    16K   16K  21.0K    16K    6.25  DMU dnode
         1    5    16K    8K   240G   250G   97.33  zvol object
         2    1    16K   512  1.50K    512  100.00  zvol prop

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss