Citing yourself:
"The average block size for a given data block should be used as the metric to map all other datablock sizes to. For example, the ZFS recordsize is 128kb by default. If the average block (or page) size of a directory server is 2k, then the mismatch in size will result in degraded throughput for both read and write operations. One of the benefits of ZFS is that you can change the recordsize of all write operations from the time you set the new value going forward. " And the above is not even entirely correct as if a file is bigger than a current value of recordsize property reducing a recordsize won't change block size for the file (it will continue to use the previous size, for example 128K). This is why you need to set recordsize to a desired value for large files *before* you create them (or you will have to copy them later on). >From the performance point of view it really depends on a workload but as you described in your blog the default recordsize of 128K with an average write/read of 2K for many workloads will negatively impact performance, and lowering recordsize can potentially improve it. Nevertheless I was referring to dedup efficiency which with lower recordsize values should improve dedup ratios (although it will require more memory for ddt). From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Brad Diggs Sent: 29 December 2011 15:55 To: Robert Milkowski Cc: 'zfs-discuss discussion list' Subject: Re: [zfs-discuss] Improving L1ARC cache efficiency with dedup Reducing the record size would negatively impact performance. For rational why, see the section titled "Match Average I/O Block Sizes" in my blog post on filesystem caching: http://www.thezonemanager.com/2009/03/filesystem-cache-optimization.html Brad Brad Diggs | Principal Sales Consultant | 972.814.3698 eMail: brad.di...@oracle.com Tech Blog: <http://TheZoneManager.com/> http://TheZoneManager.com LinkedIn: http://www.linkedin.com/in/braddiggs On Dec 29, 2011, at 8:08 AM, Robert Milkowski wrote: Try reducing recordsize to 8K or even less *before* you put any data. This can potentially improve your dedup ratio and keep it higher after you start modifying data. From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Brad Diggs Sent: 28 December 2011 21:15 To: zfs-discuss discussion list Subject: Re: [zfs-discuss] Improving L1ARC cache efficiency with dedup As promised, here are the findings from my testing. I created 6 directory server instances where the first instance has roughly 8.5GB of data. Then I initialized the remaining 5 instances from a binary backup of the first instance. Then, I rebooted the server to start off with an empty ZFS cache. The following table shows the increased L1ARC size, increased search rate performance, and increase CPU% busy with each starting and applying load to each successive directory server instance. The L1ARC cache grew a little bit with each additional instance but largely stayed the same size. Likewise, the ZFS dedup ratio remained the same because no data on the directory server instances was changing. <image001.png> However, once I started modifying the data of the replicated directory server topology, the caching efficiency quickly diminished. The following table shows that the delta for each instance increased by roughly 2GB after only 300k of changes. <image002.png> I suspect the divergence in data as seen by ZFS deduplication most likely occurs because reduplication occurs at the block level rather than at the byte level. When a write is sent to one directory server instance, the exact same write is propagated to the other 5 instances and therefore should be considered a duplicate. However this was not the case. There could be other reasons for the divergence as well. The two key takeaways from this exercise were as follows. There is tremendous caching potential through the use of ZFS deduplication. However, the current block level deduplication does not benefit directory as much as it perhaps could if deduplication occurred at the byte level rather than the block level. It very could be that even byte level deduplication doesn't work as well either. Until that option is available, we won't know for sure. Regards, Brad <image003.png> Brad Diggs | Principal Sales Consultant Tech Blog: <http://TheZoneManager.com/> http://TheZoneManager.com LinkedIn: http://www.linkedin.com/in/braddiggs On Dec 12, 2011, at 10:05 AM, Brad Diggs wrote: Thanks everyone for your input on this thread. It sounds like there is sufficient weight behind the affirmative that I will include this methodology into my performance analysis test plan. If the performance goes well, I will share some of the results when we conclude in January/February timeframe. Regarding the great dd use case provided earlier in this thread, the L1 and L2 ARC detect and prevent streaming reads such as from dd from populating the cache. See my previous blog post at the web site link below for a way around this protective caching control of ZFS. http://www.thezonemanager.com/2010/02/directory-data-priming-strategies.html Thanks again! Brad <PastedGraphic-2.tiff> Brad Diggs | Principal Sales Consultant Tech Blog: <http://TheZoneManager.com/> http://TheZoneManager.com LinkedIn: http://www.linkedin.com/in/braddiggs On Dec 8, 2011, at 4:22 PM, Mark Musante wrote: You can see the original ARC case here: http://arc.opensolaris.org/caselog/PSARC/2009/557/20091013_lori.alt On 8 Dec 2011, at 16:41, Ian Collins wrote: On 12/ 9/11 12:39 AM, Darren J Moffat wrote: On 12/07/11 20:48, Mertol Ozyoney wrote: Unfortunetly the answer is no. Neither l1 nor l2 cache is dedup aware. The only vendor i know that can do this is Netapp In fact , most of our functions, like replication is not dedup aware. For example, thecnicaly it's possible to optimize our replication that it does not send daya chunks if a data chunk with the same chechsum exists in target, without enabling dedup on target and source. We already do that with 'zfs send -D': -D Perform dedup processing on the stream. Deduplicated streams cannot be received on systems that do not support the stream deduplication feature. Is there any more published information on how this feature works? -- Ian. _______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss _______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss _______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
<<image001.png>>
_______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss