Citing yourself:

"The average block size for a given data block should be used as the metric to map all other data block sizes to. For example, the ZFS recordsize is 128KB by default. If the average block (or page) size of a directory server is 2KB, then the mismatch in size will result in degraded throughput for both read and write operations. One of the benefits of ZFS is that you can change the recordsize of all write operations from the time you set the new value going forward."

And the above is not even entirely correct: if a file is already bigger than the current value of the recordsize property, reducing the recordsize won't change the block size for that file (it will continue to use the previous size, for example 128K). This is why you need to set recordsize to the desired value for large files *before* you create them, or you will have to copy them later on; see the quick sketch below.

From the performance point of view it really depends on the workload, but as you described in your blog, the default recordsize of 128K with an average read/write of 2K will negatively impact performance for many workloads, and lowering the recordsize can potentially improve it. Nevertheless, I was referring to dedup efficiency: lower recordsize values should improve dedup ratios, although the DDT will then require more memory.
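To make the recordsize point concrete, a minimal sketch; the pool and dataset names are only placeholders:

    # Set the smaller recordsize *before* any large files are written,
    # either at creation time or on a still-empty dataset.
    zfs create -o recordsize=8K tank/ldap
    zfs set recordsize=8K tank/ldap
    zfs get recordsize tank/ldap

    # Files written from now on use records of up to 8K. A file that was
    # already written with 128K records keeps its 128K blocks until it is
    # rewritten, e.g. by copying it:
    cp /tank/ldap/db.big /tank/ldap/db.big.new
    mv /tank/ldap/db.big.new /tank/ldap/db.big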
"The average block size for a given data block should be used as the metric to map all other datablock sizes to. For example, the ZFS recordsize is 128kb by default. If the average block (or page) size of a directory server is 2k, then the mismatch in size will result in degraded throughput for both read and write operations. One of the benefits of ZFS is that you can change the recordsize of all write operations from the time you set the new value going forward. " And the above is not even entirely correct as if a file is bigger than a current value of recordsize property reducing a recordsize won't change block size for the file (it will continue to use the previous size, for example 128K). This is why you need to set recordsize to a desired value for large files *before* you create them (or you will have to copy them later on). >From the performance point of view it really depends on a workload but as you described in your blog the default recordsize of 128K with an average write/read of 2K for many workloads will negatively impact performance, and lowering recordsize can potentially improve it. Nevertheless I was referring to dedup efficiency which with lower recordsize values should improve dedup ratios (although it will require more memory for ddt). From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Brad Diggs Sent: 29 December 2011 15:55 To: Robert Milkowski Cc: 'zfs-discuss discussion list' Subject: Re: [zfs-discuss] Improving L1ARC cache efficiency with dedup Reducing the record size would negatively impact performance. For rational why, see the section titled "Match Average I/O Block Sizes" in my blog post on filesystem caching: http://www.thezonemanager.com/2009/03/filesystem-cache-optimization.html Brad Brad Diggs | Principal Sales Consultant | 972.814.3698 eMail: brad.di...@oracle.com Tech Blog: <http://TheZoneManager.com/> http://TheZoneManager.com LinkedIn: http://www.linkedin.com/in/braddiggs On Dec 29, 2011, at 8:08 AM, Robert Milkowski wrote: Try reducing recordsize to 8K or even less *before* you put any data. This can potentially improve your dedup ratio and keep it higher after you start modifying data. From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Brad Diggs Sent: 28 December 2011 21:15 To: zfs-discuss discussion list Subject: Re: [zfs-discuss] Improving L1ARC cache efficiency with dedup As promised, here are the findings from my testing. I created 6 directory server instances where the first instance has roughly 8.5GB of data. Then I initialized the remaining 5 instances from a binary backup of the first instance. Then, I rebooted the server to start off with an empty ZFS cache. The following table shows the increased L1ARC size, increased search rate performance, and increase CPU% busy with each starting and applying load to each successive directory server instance. The L1ARC cache grew a little bit with each additional instance but largely stayed the same size. Likewise, the ZFS dedup ratio remained the same because no data on the directory server instances was changing. <image001.png> However, once I started modifying the data of the replicated directory server topology, the caching efficiency quickly diminished. The following table shows that the delta for each instance increased by roughly 2GB after only 300k of changes. 
However, once I started modifying the data of the replicated directory server topology, the caching efficiency quickly diminished. The following table shows that the delta for each instance increased by roughly 2GB after only 300k of changes.

[table: image002.png]

I suspect the divergence in the data as seen by ZFS deduplication most likely occurs because deduplication operates at the block level rather than at the byte level. When a write is sent to one directory server instance, the exact same write is propagated to the other 5 instances and therefore should be considered a duplicate. However, this was not the case. There could be other reasons for the divergence as well.

The two key takeaways from this exercise were as follows. There is tremendous caching potential through the use of ZFS deduplication. However, the current block-level deduplication does not benefit the directory server as much as it perhaps could if deduplication occurred at the byte level rather than the block level. It could very well be that byte-level deduplication would not work much better either; until that option is available, we won't know for sure.

Regards,
Brad

Brad Diggs | Principal Sales Consultant
Tech Blog: http://TheZoneManager.com
LinkedIn: http://www.linkedin.com/in/braddiggs
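To see the block-level effect described above in isolation, a small stand-alone experiment along these lines can be run; the pool and dataset names are placeholders and the sizes are arbitrary:

    # dedup-enabled scratch dataset with the default 128K recordsize
    zfs create -o dedup=on -o recordsize=128K tank/dduptest

    # two identical 100MB files: every 128K record is a duplicate
    dd if=/dev/urandom of=/tank/dduptest/a bs=128k count=800
    cp /tank/dduptest/a /tank/dduptest/b
    sync
    zpool get dedupratio tank   # about 2.00x if the pool is otherwise empty

    # overwrite a single 2K region in one of the copies
    dd if=/dev/urandom of=/tank/dduptest/b bs=2k count=1 seek=1000 conv=notrunc
    sync
    zpool get dedupratio tank   # the whole 128K record containing that 2K write
                                # is rewritten and no longer dedups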
On Dec 12, 2011, at 10:05 AM, Brad Diggs wrote:

Thanks everyone for your input on this thread. It sounds like there is sufficient weight behind the affirmative, so I will include this methodology in my performance analysis test plan. If the performance testing goes well, I will share some of the results when we conclude in the January/February timeframe.

Regarding the great dd use case provided earlier in this thread: the L1 and L2 ARC detect streaming reads such as those from dd and prevent them from populating the cache. See my previous blog post at the link below for a way around this protective caching control of ZFS.
http://www.thezonemanager.com/2010/02/directory-data-priming-strategies.html

Thanks again!
Brad

Brad Diggs | Principal Sales Consultant
Tech Blog: http://TheZoneManager.com
LinkedIn: http://www.linkedin.com/in/braddiggs

On Dec 8, 2011, at 4:22 PM, Mark Musante wrote:

You can see the original ARC case here:
http://arc.opensolaris.org/caselog/PSARC/2009/557/20091013_lori.alt

On 8 Dec 2011, at 16:41, Ian Collins wrote:

On 12/ 9/11 12:39 AM, Darren J Moffat wrote:

On 12/07/11 20:48, Mertol Ozyoney wrote:

Unfortunately the answer is no. Neither the L1 nor the L2 cache is dedup aware. The only vendor I know of that can do this is NetApp. In fact, most of our functions, such as replication, are not dedup aware. For example, technically it is possible to optimize our replication so that it does not send data chunks when a chunk with the same checksum already exists on the target, without enabling dedup on target and source.

We already do that with 'zfs send -D':

    -D    Perform dedup processing on the stream. Deduplicated streams
          cannot be received on systems that do not support the stream
          deduplication feature.

Is there any more published information on how this feature works?

--
Ian.
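For reference, basic usage of the deduplicated-stream option; pool, dataset, and host names are only examples:

    # send a deduplicated stream to another pool on the same host
    zfs snapshot tank/data@backup1
    zfs send -D tank/data@backup1 | zfs receive backup/data

    # or to another host
    zfs send -D tank/data@backup1 | ssh otherhost zfs receive backup/data

As the man page excerpt above notes, the receiving system has to support the stream deduplication feature.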
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss