On Wednesday, 23 May 2018 1:10:08 PM AEST Craig Sanders via luv-main wrote:
> far too much RAM to be worth doing.  It's a great way to minimise use of
> cheap disks ($60 per TB or less) by using lots of very expensive RAM ($15
> per GB or more).
>
> A very rough rule of thumb is that de-duplication uses around 1GB of RAM per
> TB of storage.  Definitely not worth it.  About the only good use case I've
> seen for de-duping is a server with hundreds of GBs of RAM providing
> storage for lots of mostly-duplicate clone VMs, like at an ISP or other
> hosting provider.  It's only worthwhile there because of the performance
> improvement that comes from NOT having multiple copies of the same
> data-blocks (taking more space in the ARC & L2ARC caches, and causing more
> seek time delays if using spinning rust rather than SSDs).  Even then, it's
> debatable whether just adding more disk would be better.

http://www.oracle.com/technetwork/articles/servers-storage-admin/o11-113-size-zfs-dedup-1354231.html
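The article above sizes the dedup table (DDT) from block counts; on an existing pool ZFS can simulate dedup without enabling it, which gives you the same numbers for your own data ("tank" below is a placeholder pool name):

```shell
# zdb -S simulates dedup on an existing pool: it prints a DDT histogram
# and the dedup ratio you would get, without changing anything on disk.
# Each in-core DDT entry takes roughly 320 bytes, so multiplying the
# entry count by 320 gives a rough RAM estimate for that pool.
zdb -S tank
```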

Some Google results suggest it's up to 5GB of RAM per TB of storage; the above 
URL suggests 2.4GB/TB.  At your prices 2.4GB of RAM costs $36, so if it 
could save you 600GB of disk space (i.e. 1.6TB of regular storage deduped to 
1TB of disk, which means about 38% of blocks being duplicates) it would save 
money in theory.  In practice it's probably more about which resource you run 
out of and which you can easily increase.  Buying bigger disks generally seems 
to be easier than buying more RAM, due to the limited number of DIMM slots and 
the unreasonable prices of the larger DIMMs.
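A back-of-envelope version of that break-even sum, using the $15/GB RAM, $60/TB disk and 2.4GB/TB figures above:

```shell
# At $15/GB of RAM and ~2.4GB of dedup table per TB of pool, dedup costs
# about $36/TB in RAM; at $60/TB of disk you need to reclaim at least
# 0.6TB per TB stored before the RAM pays for itself.
awk 'BEGIN {
    ram_per_gb = 15; ddt_gb_per_tb = 2.4; disk_per_tb = 60
    ram_cost = ram_per_gb * ddt_gb_per_tb
    printf "RAM cost per TB deduped: $%.0f\n", ram_cost
    printf "break-even saving: %.2f TB per TB stored\n", ram_cost / disk_per_tb
}'
# prints: RAM cost per TB deduped: $36
# prints: break-even saving: 0.60 TB per TB stored
```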

> Compression's worth doing on most filesystems, though. lz4 is a very fast,
> very low cpu usage algorithm, and (depending on what kind of data) on
> average you'll probably get about 1/3rd to 1/2 reduction of space used by
> compressible files.  e.g. some of the datasets on the machine I just built
> (called "hex"):
> 
> # zfs get compressratio hex hex/home hex/var/log hex/var/cache
> NAME           PROPERTY       VALUE  SOURCE
> hex            compressratio  1.88x  -
> hex/home       compressratio  2.00x  -
> hex/var/cache  compressratio  1.09x  -
> hex/var/log    compressratio  4.44x  -
> 
> The first entry is the overall compression ratio for the entire pool. 
> 1.88:1 ratio. So compression is currently saving me nearly half of my disk
> usage. It's a new machine, so there's not much on it at the moment.

Strangely, I never saw such good compression when storing email on ZFS.  One 
would expect email to compress well (for starters, anything like Huffman coding 
should give significant benefits), but apparently it doesn't.

> I'd probably get even better compression on the logs (at least 6x, probably
> more) if I set it to use gzip for that dataset with:
> 
>     zfs set compression=gzip hex/var/log

I never knew about that, it would probably have helped the mail store a lot.
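For what it's worth, ZFS also accepts an explicit gzip level from gzip-1 to gzip-9 (plain gzip is equivalent to gzip-6), so a log dataset can trade more CPU for a better ratio.  Dataset name copied from Craig's example:

```shell
# gzip-9 is the slowest, best-ratio setting; as noted below, it only
# applies to blocks written after the property is changed.
zfs set compression=gzip-9 hex/var/log
zfs get compression hex/var/log
```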

> (note that won't re-compress existing data.  only new data will be
> compressed with the new algorithm)

If you are storing logs on a filesystem that supports compression you should 
turn off your distribution's log compression.  That compression is done by a 
cron job which reads and rewrites the log files, and once the filesystem is 
already compressing them it provides little extra saving in size.
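On Debian-style systems that usually means removing the global "compress" directive from logrotate.  A minimal sketch, run here against a throwaway copy rather than the real /etc/logrotate.conf:

```shell
# Comment out logrotate's "compress" directive so rotated logs stay
# uncompressed and the filesystem's lz4/gzip does the work instead.
conf=$(mktemp)
printf 'weekly\nrotate 4\ncompress\n' > "$conf"
sed -i 's/^compress$/#compress/' "$conf"
cat "$conf"
rm -f "$conf"
```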

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/


