On Feb 9, 2012, at 2:21 PM, Steven Schlansker wrote:
>
> On Feb 9, 2012, at 11:05 AM, Mark wrote:
>> Steven, out of curiosity, do you see any benefit with dedup (assuming that
>> Bacula volumes are the only thing on a given ZFS volume)? I did some
>> initial trials and it appeared that Bacula savesets don't dedup much, if at
>> all, and some searching around pointed to the Bacula volume format writing a
>> unique value (was it jobid?) to every block, so no two blocks are ever the
>> same. I'd back up hundreds of gigs of data and the dedupratio always
>> remained 1.00x.
>
> I didn't do any research, but can confirm that it seems to be useless to turn
> dedup on. My pool has always been at 1.00x.
> I'm going to turn it off because from what I hear dedup is pretty expensive
> to run, especially if you don't actually save anything by it.
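Mark's explanation (a unique value stamped into every block means no two blocks are ever identical) can be illustrated with a toy model of block-level dedup. This is a minimal sketch, not Bacula's real volume format: the 8-byte per-block header holding a job id and block number is a hypothetical stand-in for whatever unique value Bacula actually writes, SHA-256 stands in for the ZFS dedup checksum, and it assumes Bacula blocks line up exactly with 64 KiB ZFS records, which is a simplification.

```python
import hashlib

BLOCK_SIZE = 64 * 1024             # assumed ZFS record size (64 KiB), illustration only
HEADER_SIZE = 8                    # hypothetical per-block header: job id + block number
DATA_SIZE = BLOCK_SIZE - HEADER_SIZE


def dedup_ratio(blocks):
    """Logical blocks divided by unique blocks, analogous to ZFS's dedupratio."""
    unique = {hashlib.sha256(b).digest() for b in blocks}
    return len(blocks) / len(unique)


def volume_blocks(payload, job_id, stamp_header):
    """Split payload into BLOCK_SIZE blocks. If stamp_header is True, each block
    starts with a (hypothetical) job id and block number; otherwise with zeros."""
    blocks = []
    for n, start in enumerate(range(0, len(payload), DATA_SIZE)):
        if stamp_header:
            header = job_id.to_bytes(4, "big") + n.to_bytes(4, "big")
        else:
            header = bytes(HEADER_SIZE)
        blocks.append(header + payload[start:start + DATA_SIZE])
    return blocks


# Back up the same highly redundant data (all zeros) in two jobs.
data = bytes(DATA_SIZE * 16)
plain = volume_blocks(data, 1, False) + volume_blocks(data, 2, False)
stamped = volume_blocks(data, 1, True) + volume_blocks(data, 2, True)

print(f"without per-block header: {dedup_ratio(plain):.2f}x")    # 32.00x
print(f"with per-block header:    {dedup_ratio(stamped):.2f}x")  # 1.00x
```

Even though the payload is identical across both jobs, the stamped header makes every block's checksum unique, so the ratio stays at 1.00x, matching what Mark and Steven observed.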
Some time ago, I enabled dedup on a fileset with ~8 TB of data (about 4 million files) on a FreeBSD 8-STABLE system. Bad move! The machine has 16 GB of RAM, but enabling dedup utterly killed it. Further research showed that dedup needs either a lot of RAM or a read-optimised SSD to hold the dedup table (DDT). Small filesets may work fine, but anything larger will quickly eat up RAM. Worse still, the DDT is treated as ZFS metadata, so by default it is limited to 25% of the ARC, which means you need a huge ARC to hold a large DDT. I've read that a rule of thumb is to expect 5 GB of DDT for every 1 TB of data, assuming an average block size of 64 KB. For large pools, therefore, it's not feasible to keep the entire DDT in RAM, and you'd be looking at a low-latency L2ARC device (e.g., an SSD) instead.

> On the flip side, compression seems to be a very big win. I'm seeing ratios
> from 1.7 to 2.5x savings and the CPU usage is claimed to be relatively cheap.

That's what I am seeing, too. On the fileset I tried to dedup, I'm currently seeing a compressratio of 1.51x, which I'm happy with for that data. ZFS compression appears to have negligible overhead, so turning it on has been a big win for me.

Cheers,

Paul.

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
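Paul's rule of thumb (about 5 GB of DDT per 1 TB of data at a 64 KB average block size) can be reproduced with simple arithmetic. This sketch assumes each in-core DDT entry costs about 320 bytes, a commonly quoted figure that is not stated in the thread. With binary units, 1 TiB of 64 KiB blocks is 2^24 blocks, and 2^24 × 320 bytes is exactly 5 GiB. It also applies the 25% metadata share of the ARC mentioned in the post to estimate how large an ARC would be needed to keep the whole DDT cached.

```python
KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4

DDT_ENTRY_BYTES = 320   # assumed in-core size of one DDT entry (rule-of-thumb figure)
METADATA_SHARE = 0.25   # DDT counts as metadata, capped at 25% of the ARC by default


def ddt_bytes(data_bytes, avg_block_bytes=64 * KIB, entry_bytes=DDT_ENTRY_BYTES):
    """Estimate DDT size: one entry per unique block. Assumes no duplicates,
    so every block gets its own entry (the Bacula case, dedupratio 1.00x)."""
    blocks = data_bytes // avg_block_bytes
    return blocks * entry_bytes


def arc_needed_bytes(data_bytes, avg_block_bytes=64 * KIB):
    """ARC size required for the whole DDT to fit within the metadata share."""
    return ddt_bytes(data_bytes, avg_block_bytes) / METADATA_SHARE


if __name__ == "__main__":
    for tib in (1, 8):
        ddt = ddt_bytes(tib * TIB) / GIB
        arc = arc_needed_bytes(tib * TIB) / GIB
        print(f"{tib} TiB: DDT ~ {ddt:.0f} GiB, ARC needed ~ {arc:.0f} GiB")
```

Under these assumptions the script prints about 5 GiB of DDT (20 GiB of ARC) for 1 TiB, and about 40 GiB of DDT (160 GiB of ARC) for 8 TiB. If Paul's ~8 TB fileset averaged 64 KB blocks, that would be far beyond what a 16 GB machine can cache, which is consistent with dedup "utterly killing" it.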