On Fri, Jun 04, 2010 at 01:10:44PM -0700, Ray Van Dolson wrote:
> On Fri, Jun 04, 2010 at 01:03:32PM -0700, Brandon High wrote:
> > On Fri, Jun 4, 2010 at 12:37 PM, Ray Van Dolson <rvandol...@esri.com> wrote:
> > > Makes sense.  So, as someone else suggested, decreasing my block size
> > > may improve the deduplication ratio.
> >
> > It might. It might make your performance tank, too.
> >
> > Decreasing the block size increases the size of the dedup table (DDT).
> > Every entry in the DDT uses somewhere around 250-270 bytes. If the DDT
> > gets too large to fit in memory, it will have to be read from disk,
> > which will destroy any sort of write performance (although an L2ARC on
> > SSD can help)
> >
> > If you move to 64k blocks, you'll double the DDT size and may not
> > actually increase your ratio. Moving to 8k blocks will increase your
> > DDT by a factor of 16, and still may not help.
> >
> > Changing the recordsize will not affect files that are already in the
> > dataset. You'll have to recopy them to re-write with the smaller block
> > size.
> >
> > -B
>
> Gotcha.  Just trying to make sure I understand how all this works, and
> if I _would_ in fact see an improvement in dedupe-ratio by tweaking the
> recordsize with our data-set.
>
> Once we know that we can decide if it's worth the extra costs in
> RAM/L2ARC.
>
> Thanks all.

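For a rough feel of the DDT arithmetic quoted above, here is a minimal
back-of-the-envelope sketch. It is not anything ZFS itself provides. The
function names are made up for illustration, 270 bytes per entry is just the
upper end of the 250-270 range mentioned above, the 128K starting recordsize
is what the quoted 2x and 16x factors imply, and it assumes the worst case
where every block is unique and full-sized, so real tables should come out
smaller.

```python
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3


def ddt_size_bytes(data_bytes, recordsize_bytes, entry_bytes=270):
    """Worst-case DDT size: one entry per block, every block unique.

    entry_bytes=270 is an assumption taken from the 250-270 byte range
    quoted in the thread, not a value read from ZFS.
    """
    blocks = -(-data_bytes // recordsize_bytes)  # ceiling division
    return blocks * entry_bytes


def ddt_growth_factor(old_recordsize, new_recordsize):
    """How many times more DDT entries a smaller recordsize produces."""
    return old_recordsize / new_recordsize


if __name__ == "__main__":
    # Assumes the "2.9GB" ISO is about 2.9 GiB; adjust to taste.
    iso_bytes = int(2.9 * GIB)
    for recordsize in (128 * KIB, 64 * KIB, 8 * KIB, 4 * KIB):
        size = ddt_size_bytes(iso_bytes, recordsize)
        factor = ddt_growth_factor(128 * KIB, recordsize)
        print(f"recordsize {recordsize // KIB:>3}K: "
              f"~{size / MIB:.1f} MiB of DDT, {factor:.0f}x the 128K case")
```

The estimate scales linearly with the amount of data, so plugging in the size
of the whole data-set rather than one ISO gives a ceiling for the RAM/L2ARC
question.
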
FYI: with a 4K recordsize, I am seeing a 1.26x dedupe ratio between the
RHEL 5.4 ISO and the RHEL 5.5 ISO file.

However, it took about 33 minutes to copy the 2.9GB ISO file onto the
filesystem. :)  Definitely would need more RAM in this setup...

Ray