On Sep 24, 2012, at 10:08 AM, Jason Usher <jushe...@yahoo.com> wrote:
> Oh, and one other thing ...
>
>
> --- On Fri, 9/21/12, Jason Usher <jushe...@yahoo.com> wrote:
>
>>> It shows the allocated number of bytes used by the
>>> filesystem, i.e. after compression. To get the uncompressed size,
>>> multiply "used" by "compressratio" (so for example if used=65G and
>>> compressratio=2.00x, then your decompressed size is 2.00 x 65G = 130G).
>>
>>
>> Ok, thank you. The problem with this is, the
>> compressratio only goes to two significant digits, which
>> means if I do the math, I'm only getting an
>> approximation. Since we may use these numbers to
>> compute billing, it is important to get it right.
>>
>> Is there any way at all to get the real *exact* number ?
>
>
> I'm hoping the answer is yes - I've been looking but do not see it ...

none can hide from dtrace!

# dtrace -qn 'dsl_dataset_stats:entry {this->ds = (dsl_dataset_t *)arg0;
printf("%s\tcompressed size = %d\tuncompressed size=%d\n",
this->ds->ds_dir->dd_myname, this->ds->ds_phys->ds_compressed_bytes,
this->ds->ds_phys->ds_uncompressed_bytes)}'
openindiana-1   compressed size = 3667988992    uncompressed size=3759321088
[zfs get all rpool/openindiana-1 in another shell]

For reporting, the number is rounded to 2 decimal places.

>> Ok. So the dedupratio I see for the entire pool is
>> "dedupe ratio for filesystems in this pool that have dedupe
>> enabled" ... yes ?
>>
>>
>>>> Also, why do I not see any dedupe stats for the
>>>> individual filesystem ? I see compressratio, and I see
>>>> dedup=on, but I don't see any dedupratio for the filesystem
>>>> itself...
>>
>>
>> Ok, getting back to precise accounting ... if I turn on
>> dedupe for a particular filesystem, and then I multiply the
>> "used" property by the compressratio property, and calculate
>> the real usage, do I need to do another calculation to
>> account for the deduplication ? Or does the "used"
>> property not take into account deduping ?
>
>
> So if the answer to this is "yes, the used property is not only a compressed
> figure, but a deduped figure" then I think we have a bigger problem ...
>
> You described dedupe as operating not only within the filesystem with
> dedup=on, but between all filesystems with dedupe enabled.
>
> Doesn't that mean that if I enabled dedupe on more than one filesystem, I
> can never know how much total, raw space each of those is using ? Because
> if the dedupe ratio is calculated across all of them, it's not the actual
> ratio for any one of them ... so even if I do the math, I can't decide what
> the total raw usage for one of them is ... right ?

Correct. This is by design so that blocks shared amongst different datasets
can be deduped -- the common case for things like virtual machine images.

> Again, if "used" does not reflect dedupe, and I don't need to do any math
> to get the "raw" storage figure, then it doesn't matter...
>
>
>>>> Did turning on dedupe for a single filesystem turn it
>>>> on for the entire pool ?
>>>
>>> In a sense, yes. The dedup machinery is pool-wide, but only
>>> writes from filesystems which have dedup enabled enter it. The rest
>>> simply pass it by and work as usual.
>>
>>
>> Ok - but from a performance point of view, I am only using
>> ram/cpu resources for the deduping of just the individual
>> filesystems I enabled dedupe on, right ? I hope that
>> turning on dedupe for just one filesystem did not incur
>> ram/cpu costs across the entire pool...
>
>
> I also wonder about this performance question...

It depends.
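For anyone scripting the accounting above, here is a rough sketch, assuming
illumos-era zfs(1M), zpool(1M), and zdb(1M); the dataset and pool names are
just the ones from the example above, and flag support may vary by platform,
so check your man pages. Exact byte values for "used" are available in
parsable mode, though compressratio itself is still reported rounded to two
decimal places:

# zfs get -Hp -o name,property,value used,compressratio rpool/openindiana-1

The dedupratio property only exists at the pool level; there is no
per-dataset dedupratio:

# zpool get dedupratio rpool

And zdb's dedup statistics show how large the pool-wide DDT is, which is a
starting point for estimating the RAM the dedup machinery consumes:

# zdb -DD rpool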
--
richard

--
illumos Day & ZFS Day, Oct 1-2, 2012, San Francisco
www.zfsday.com
richard.ell...@richardelling.com
+1-760-896-4422
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss