Hello list, 

I wanted to test deduplication a little, so I did an experiment.

My question was: can I dedupe infinitely, or is there an upper limit?

So for that I did a very basic test.
- I created a ramdisk pool (1 GB),
- enabled dedup, and
- wrote zeros to it (into one single file) until an error was returned.
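The write-until-full step could look roughly like the following Python sketch. It is only an illustration, not necessarily how the original test was run; the function name `fill_until_full` and the `max_bytes` cap are hypothetical additions so it can be tried safely on a normal filesystem.

```python
import errno

CHUNK = 128 * 1024  # 128 KiB per write, matching the first run's recordsize


def fill_until_full(path, chunk_size=CHUNK, max_bytes=None):
    """Append zero-filled chunks to `path` until the filesystem returns
    ENOSPC ("No space left on device") or at least `max_bytes` is written.

    Returns the number of bytes actually written.
    """
    zeros = bytes(chunk_size)  # chunk_size zero bytes
    written = 0
    # buffering=0 gives a raw file: Python keeps no userspace buffer, so an
    # ENOSPC from write(2) is raised here, not later from a flush at close().
    with open(path, "wb", buffering=0) as f:
        while max_bytes is None or written < max_bytes:
            try:
                n = f.write(zeros)
            except OSError as e:
                if e.errno == errno.ENOSPC:
                    break  # pool is full: stop and report what was written
                raise
            if not n:
                break  # nothing written; avoid looping forever
            written += n
    return written
```

On the test pool it would be called without a cap, e.g. `fill_until_full("/testpool/zeros.bin")`, where the mount point is hypothetical.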

The size of the pool was 1046 MB, and I was able to write 62 GB to it before it
reported "no space left on device". The block size was 128k, so I was able to
write about 507,000 blocks to the pool.
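The block count is simply the data written divided by the recordsize. A small Python check, assuming GB and k mean binary units (GiB, KiB), gives 507,904, which matches the ~507,000 figure; with decimal gigabytes (62 x 10^9 bytes) it would be about 473,000 instead. The helper name `blocks_written` is hypothetical.

```python
KiB = 1024
GiB = 1024 ** 3


def blocks_written(logical_bytes, recordsize):
    """Number of full records needed for `logical_bytes` of data."""
    return logical_bytes // recordsize


# First run: 62 GB of zeros with a 128k recordsize.
print(blocks_written(62 * GiB, 128 * KiB))  # 507904
```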

With the device full, I see the following:

1) zfs list reports that no space is left (AVAIL=0)
2) zpool reports that the dedup factor is ~507,000x
3) zpool also reports that 8.6 MB of space is allocated in the pool (0% used)

So to me it looks like something is broken in ZFS space accounting with
dedup.

- zpool and zfs free-space reporting do not agree
- the real deduplication factor was not 507,000 (if it were, I would have been
able to write 507,000 x 1 GB = a lot to the pool)
- calculating 1046 MB / 507,000 = 2.1 KB, it seems that for each 128k block,
2.1 KB of data has been written (assuming zfs list is correct). What is this?
Metadata? That would mean roughly 1.6% metadata overhead in ZFS
(1/(128k/2.1k))?
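The per-block overhead estimate above can be reproduced in Python. This sketch assumes, as the text does, that the whole 1046 MB pool was consumed (the zfs list view), and it treats MB, GB, and k as binary units; `per_block_overhead` is a hypothetical helper name.

```python
KiB = 1024
MiB = 1024 ** 2
GiB = 1024 ** 3


def per_block_overhead(consumed_bytes, logical_bytes, recordsize):
    """Bytes of pool space consumed per logical record written."""
    blocks = logical_bytes // recordsize
    return consumed_bytes / blocks


# Assumption: all 1046 MB of the pool were consumed writing 62 GB at 128k.
overhead = per_block_overhead(1046 * MiB, 62 * GiB, 128 * KiB)
fraction = overhead / (128 * KiB)
print(f"{overhead / KiB:.2f} KiB per block, {fraction:.2%} of a 128k record")
# 2.11 KiB per block, 1.65% of a 128k record
```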

I repeated the same test with a recordsize of 32k. The funny thing is:
- again, about 60 GB could be written before "no space left"
- 31 MB of space was allocated in the pool (zpool list)
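Putting the zpool list figures for both runs on a per-block basis (again assuming binary units; `allocated_per_block` is a hypothetical helper) gives under 20 bytes of allocation per logical block in each case, far below the ~2.1 KB per block implied by zfs list:

```python
KiB = 1024
MiB = 1024 ** 2
GiB = 1024 ** 3


def allocated_per_block(allocated_bytes, logical_bytes, recordsize):
    """Bytes of allocated pool space per logical record written."""
    return allocated_bytes / (logical_bytes // recordsize)


# (recordsize, data written, space allocated according to zpool list)
runs = [
    (128 * KiB, 62 * GiB, 8.6 * MiB),
    (32 * KiB, 60 * GiB, 31 * MiB),
]

for recordsize, written, allocated in runs:
    per_block = allocated_per_block(allocated, written, recordsize)
    print(f"{recordsize // KiB}k: {per_block:.1f} bytes per block")
# 128k: 17.8 bytes per block
# 32k: 16.5 bytes per block
```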

The version of the pool is 25.

During the experiment I could clearly see that:
- performance on the ramdisk is CPU-bound, at ~125 MB/s per core.
- performance scales roughly linearly with added CPU cores (125 MB/s for 1
core, 253 MB/s for 2 cores, 408 MB/s for 4 cores).
- the upper size of the deduplication table is blocks * ~150 bytes,
independent of the dedup factor.
- the DDT does not grow for deduplicatable blocks (zdb -D).
- performance drops by a factor of ~4 when switching the allocation policy from
"closest" to "best fit" (as the pool fills, the rate drops from 250 MB/s to
67 MB/s). I suspect even worse results for spinning media because of the head
movements (>10x slowdown).
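The CPU-scaling and DDT-size observations above can be turned into numbers with a short Python sketch. The throughput figures are the ones measured above; `scaling_efficiency` and `ddt_size_estimate` are hypothetical helpers, the ~150 bytes per entry comes from the zdb -D observation, and reading "blocks" as unique (non-duplicate) blocks is an assumption based on the DDT not growing for duplicate blocks.

```python
KiB = 1024
MiB = 1024 ** 2
GiB = 1024 ** 3

# Measured ramdisk write throughput in MB/s, keyed by number of cores.
THROUGHPUT = {1: 125, 2: 253, 4: 408}


def scaling_efficiency(measurements):
    """Per-core throughput relative to the single-core rate (1.0 = linear)."""
    base = measurements[1]
    return {cores: rate / (cores * base) for cores, rate in measurements.items()}


def ddt_size_estimate(unique_blocks, bytes_per_entry=150):
    """Rough upper bound on DDT size: one ~150-byte entry per unique block."""
    return unique_blocks * bytes_per_entry


for cores, eff in sorted(scaling_efficiency(THROUGHPUT).items()):
    print(f"{cores} core(s): {eff:.2f}")
# 1 core(s): 1.00
# 2 core(s): 1.01
# 4 core(s): 0.82

# Example: 1 GiB of non-duplicate data at a 128k recordsize.
unique = GiB // (128 * KiB)  # 8192 blocks
print(f"{ddt_size_estimate(unique) / MiB:.2f} MiB")  # 1.17 MiB
```

An efficiency of 1.0 means perfectly linear scaling; the 4-core figure comes out near 0.82.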

Does anyone know why the dedup factor is wrong? Any insights into what has
actually been written (compressed metadata, deduped metadata, etc.) would be
greatly appreciated.

Regards, 
Robert
-- 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
