Hi all,

I'm going to be running some dedup tests with b130 on a server with 
about 1,7 TB of usable storage (14x146 in two raidz vdevs of 7 disks).  What 
I'm trying to get a handle on is how to estimate the memory overhead required 
for dedup on that amount of storage.  From what I gather, the dedup hash keys 
are held in ARC and L2ARC and as such compete for the available 
memory.

So the question is how much memory or L2ARC would be necessary to ensure that 
I never go back to disk to read the hash keys. Better yet would be 
some kind of algorithm for calculating the overhead, e.g. an average block size 
of 4K means one hash key for every 4K stored, and each hash occupies 256 bits. An 
associated question is how the ARC handles competition between hash 
keys and regular ARC functions.

Based on these estimations, I think that I should be able to calculate the 
following:
1,7             TB
1740,8          GB
1782579,2       MB
1825361100,8    KB
4               average block size - KB
456340275,2     blocks
256             hash key size - bits
1,16823E+11     hash key overhead - bits
14602888806,4   hash key overhead - bytes
14260633,6      hash key overhead - KB
13926,4         hash key overhead - MB
13,6            hash key overhead - GB
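
The same arithmetic as a small Python sketch, so the average block size can 
be varied. It counts only the 256-bit hash per block, exactly as in the table 
above; the function name and the list of block sizes are my own choices for 
illustration, and any per-entry bookkeeping beyond the hash itself is not 
included.

```python
KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4


def dedup_hash_overhead_bytes(pool_bytes, avg_block_bytes, hash_bits=256):
    """Bytes needed to hold one hash per stored block (hash only)."""
    blocks = pool_bytes / avg_block_bytes
    return blocks * hash_bits / 8


if __name__ == "__main__":
    pool = 1.7 * TIB  # usable storage, assumed 100% full (worst case)
    for block_kib in (4, 8, 64, 128):
        overhead = dedup_hash_overhead_bytes(pool, block_kib * KIB)
        print(f"{block_kib:>4}K blocks: {overhead / GIB:6.2f} GB")
```

With 4K blocks this gives the 13,6 GB figure from the table; larger average 
block sizes shrink the estimate proportionally.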

Of course the big question here is the average block size. Better 
yet would be a way to analyze an existing datastore to see how many blocks 
it uses and how the block sizes are currently distributed. I'm 
playing around with zdb, with mixed success, to extract this kind 
of data. The estimate above is also a worst-case scenario, since it assumes 
very small blocks and 100% use of the available storage, which is highly unlikely. 

# zdb -ddbb siovale/iphone
Dataset siovale/iphone [ZPL], ID 2381, cr_txg 3764691, 44.6G, 99 objects

    ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, flags 0x0

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K  57.0K    64K   77.34  DMU dnode
         1    1    16K     1K  1.50K     1K  100.00  ZFS master node
         2    1    16K    512  1.50K    512  100.00  ZFS delete queue
         3    2    16K    16K  18.0K    32K  100.00  ZFS directory
         4    3    16K   128K   408M   408M  100.00  ZFS plain file
         5    1    16K    16K  3.00K    16K  100.00  FUID table
         6    1    16K     4K  4.50K     4K  100.00  ZFS plain file
         7    1    16K  6.50K  6.50K  6.50K  100.00  ZFS plain file
         8    3    16K   128K   952M   952M  100.00  ZFS plain file
         9    3    16K   128K   912M   912M  100.00  ZFS plain file
        10    3    16K   128K   695M   695M  100.00  ZFS plain file
        11    3    16K   128K   914M   914M  100.00  ZFS plain file
 
Now, if I'm understanding this output properly, object 4 is composed of 128KB 
blocks with a total size of 408MB, meaning that it uses 3264 blocks.  Can 
someone confirm (or correct) that assumption? Also, I note that each object 
(as far as my limited testing has shown) has a single block size with no 
internal variation.
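
To tally this for a whole dataset, here is a rough Python sketch that reads 
the object rows of the zdb -ddbb output and divides lsize by dblk for each 
object. It rests on the same unconfirmed assumption as my calculation for 
object 4, and the function names are just for illustration. The counts are 
approximate: zdb prints rounded sizes, sparse objects (%full below 100) are 
overcounted, and metadata objects are included.

```python
import math
import re
from collections import Counter

UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_size(text):
    """Convert a zdb size such as '512', '6.50K' or '408M' to bytes."""
    match = re.fullmatch(r"([\d.]+)([KMGT]?)", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def blocks_by_size(zdb_output):
    """Map each dblk value to the estimated number of data blocks using it."""
    counts = Counter()
    for line in zdb_output.splitlines():
        fields = line.split()
        # Object rows look like: Object lvl iblk dblk dsize lsize %full type...
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        dblk, lsize = fields[3], fields[5]
        counts[dblk] += math.ceil(parse_size(lsize) / parse_size(dblk))
    return counts


if __name__ == "__main__":
    sample = """\
    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         4    3    16K   128K   408M   408M  100.00  ZFS plain file
         6    1    16K     4K  4.50K     4K  100.00  ZFS plain file
    """
    for dblk, count in blocks_by_size(sample).items():
        print(dblk, count)
```

Feeding it the complete zdb output for a dataset would give a first 
approximation of the block size distribution to plug into the overhead 
estimate.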

Interestingly, all of my zvols seem to use fixed-size blocks. There is no 
variation in the block sizes: they're all the size defined at creation, 
with no dynamic block sizing. I previously thought that the -b option 
of zfs create set the maximum size, rather than fixing all blocks.  Learned 
something today :-)

# zdb -ddbb siovale/testvol
Dataset siovale/testvol [ZVOL], ID 45, cr_txg 4717890, 23.9K, 2 objects

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K  21.0K    16K    6.25  DMU dnode
         1    1    16K    64K      0    64K    0.00  zvol object
         2    1    16K    512  1.50K    512  100.00  zvol prop

# zdb -ddbb siovale/tm-media
Dataset siovale/tm-media [ZVOL], ID 706, cr_txg 4426997, 240G, 2 objects

    ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, flags 0x0

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K  21.0K    16K    6.25  DMU dnode
         1    5    16K     8K   240G   250G   97.33  zvol object
         2    1    16K    512  1.50K    512  100.00  zvol prop

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
