On May 4, 2011, at 7:56 PM, Edward Ned Harvey wrote:

> This is a summary of a much longer discussion, "Dedup and L2ARC memory
> requirements (again)". Sorry, even this summary is long. But the results
> vary enormously based on individual usage, so any "rule of thumb" metric
> that has been bouncing around on the internet is simply not sufficient.
> You need to go into this level of detail to get an estimate that's worth
> the napkin or bathroom tissue it's scribbled on.
>
> This is how to (reasonably) accurately estimate the hypothetical RAM
> requirements to hold the complete data deduplication table (DDT) and
> L2ARC references in RAM. Please note that both the DDT and L2ARC
> references can be evicted from memory according to system policy,
> whenever the system decides some other data is more valuable to keep.
> So following this guide does not guarantee that the whole DDT will
> remain in ARC or L2ARC. But it's a good start.
As the size of the data grows, the need to have the whole DDT in RAM or
L2ARC decreases, with one notable exception: destroying a dataset or
snapshot requires the DDT entries for the destroyed blocks to be updated.
This is why people can go for months or years and not see a problem, until
they try to destroy a dataset.

> I am using a Solaris 11 Express x86 test system for my example numbers
> below.
>
> ----------- To calculate size of DDT -----------
>
> Each entry in the DDT is a fixed size, which varies by platform. You can
> find it with the command:
>     echo ::sizeof ddt_entry_t | mdb -k
> This will return a hex value, which you probably want to convert to
> decimal. On my test system, it is 0x178, which is 376 bytes.
>
> There is one DDT entry per non-dedup'd (unique) block in the zpool.

The workloads which are nicely dedupable tend not to have unique blocks.
So this is another way of saying, "if your workload isn't dedupable, don't
bother with deduplication." For years now we have been trying to convey
this message. One way to help convey the message is...

> Be aware that you cannot reliably estimate #blocks by counting #files.
> You can find the number of total blocks, including dedup'd blocks, in
> your pool with this command:
>     zdb -bb poolname | grep 'bp count'

Ugh. A better method is to simulate dedup on existing data:
    zdb -S poolname
or measure dedup efficacy:
    zdb -DD poolname
which offer similar tabular analysis.

> Note: This command will run a long time and is I/O intensive. On my
> systems, where a scrub runs for 8-9 hours, this zdb command ran for
> about 90 minutes. On my test system, the result is 44145049 (44.1M)
> total blocks.
>
> To estimate the number of non-dedup'd (unique) blocks (assuming the
> average size of dedup'd blocks = the average size of blocks in the whole
> pool), use:
>     zpool list
> Find the dedup ratio. In my test system, it is 2.24x. Divide the total
> blocks by the dedup ratio to find the number of non-dedup'd (unique)
> blocks.

Or just count the unique and non-unique blocks with:
    zdb -D poolname

> In my test system:
>     44145049 total blocks / 2.24 dedup ratio = 19707611 (19.7M) approx
>     non-dedup'd (unique) blocks
>
> Then multiply by the size of a DDT entry:
>     19707611 * 376 = 7410061796 bytes = 7G total DDT size

A minor gripe about zdb -D output is that it doesn't do the math.
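Something like this minimal shell sketch will do the math for you. It just
plugs in the example values gathered above (DDT entry size, 'bp count',
and the dedup ratio from zpool list); substitute your own pool's numbers:

    # Rough DDT-size estimate -- a sketch using the example values above.
    DDT_ENTRY=376        # bytes, from: echo ::sizeof ddt_entry_t | mdb -k
    BP_COUNT=44145049    # total blocks, from: zdb -bb poolname | grep 'bp count'
    DEDUP_RATIO=2.24     # DEDUP column of: zpool list
    echo "$BP_COUNT / $DEDUP_RATIO * $DDT_ENTRY / 1024^3" | bc -l
    # prints the estimated DDT size in GB (about 6.9, i.e. roughly 7G here)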
> ----------- To calculate size of ARC/L2ARC references -----------
>
> Each reference to an L2ARC entry requires an entry in ARC (RAM). This is
> another fixed size, which varies by platform. You can find it with the
> command:
>     echo ::sizeof arc_buf_hdr_t | mdb -k
> On my test system, it is 0xb0, which is 176 bytes.

Better yet, without the need for mdb privilege, measure the current L2ARC
header size in use. Normal user accounts can run:
    kstat -p zfs::arcstats:hdr_size
    kstat -p zfs::arcstats:l2_hdr_size
arcstat will allow you to easily track this over time.

> We need to know the average block size in the pool, to estimate the
> number of blocks that will fit into L2ARC. Find the amount of space
> ALLOC in the pool:
>     zpool list
> Divide by the number of non-dedup'd (unique) blocks in the pool to find
> the average block size. In my test system:
>     790G / 19707611 = 42K average block size
>
> Remember: If your L2ARC were only caching average-size blocks, then the
> payload ratio of L2ARC vs ARC would be excellent. In my test system,
> every 42K of L2ARC would require 176 bytes of ARC (a ratio of 244x).
> This would result in negligible ARC memory consumption. But since your
> DDT can be pushed out of ARC into L2ARC, you get a really bad ratio of
> L2ARC vs ARC memory consumption. In my test system, every 376-byte DDT
> entry in L2ARC consumes 176 bytes of ARC (a ratio of 2.1x). Yes, it is
> approximately possible to have the complete DDT present in ARC and
> L2ARC, thus consuming tons of RAM.

This is a good thing for those cases when you need to quickly reference
large numbers of DDT entries.

> Remember disk mfgrs use base-10. So my 32G SSD is only 30G base-2.
> (32,000,000,000 / 1024 / 1024 / 1024)
>
> So I have 30G L2ARC, and the first 7G may be consumed by DDT. This
> leaves 23G remaining to be used for average-sized blocks.
> The ARC consumed to reference the DDT in L2ARC is 176/376 * DDT size.
> In my test system this is 176/376 * 7G = 3.3G
>
> Take the remaining size of your L2ARC and divide by the average block
> size to get the number of average-size blocks the L2ARC can hold. In my
> test system:
>     23G / 42K = 574220 average-size blocks in L2ARC
> Multiply by the ARC size of an L2ARC reference. On my test system:
>     574220 * 176 = 101062753 bytes = 96MB ARC consumed to reference the
>     average-size blocks in L2ARC
>
> So the total ARC consumption to hold L2ARC references in my test system
> is 3.3G + 96M ~= 3.4G
> ----------- To calculate total RAM needed -----------
>
> And finally: the max size the ARC is allowed to grow is a constant that
> varies by platform. On my system, it is 80% of system RAM.

It is surely not 80% of RAM, unless you have 5GB of RAM or get lucky. The
algorithm for c_max is well documented as starting with the larger of:
    7/8 of physmem
or
    physmem - 1GB
This value adjusts as memory demands from other processes are satisfied.

> You can find this value using the command:
>     kstat -p zfs::arcstats:c_max
> Divide by your total system memory to find the ratio.
> Assuming the ratio is 4/5, it means you need to buy 5/4 the amount of
> calculated RAM to satisfy all your requirements.
>
> So the end result is:
> On my test system I guess the OS and processes consume 1G. (I'm making
> that up without any reason.)

This is a little bit trickier to understand, which is why we have:
    echo ::memstat | mdb -k

> On my test system I guess I need 8G in the system to get reasonable
> performance without dedup or L2ARC. (Again, I'm just making that up.)
> We calculated that I need 7G for DDT and 3.4G for L2ARC. That is 10.4G.
> Multiply by 5/4, and it means I need 13G.
> My system needs to be built with at least 8G + 13G = 21G.
> Of this, 20% (4.2G) is more than enough to run the OS and processes,
> while 80% (16.8G) is available for ARC. Of the 16.8G ARC, the DDT and
> L2ARC references will consume 10.4G, which leaves 6.4G for "normal" ARC
> caching.
> These numbers are all fuzzy. Anything from 16G to 24G might be
> reasonable.
>
> That's it. I'm done.
>
> P.S. I'll just throw this out there: it is my personal opinion that you
> probably won't have the whole DDT in ARC and L2ARC at the same time.

Yep, that is a safe bet.

> Because the L2ARC is populated from the soon-to-expire list of the ARC,
> it seems unlikely that all the DDT entries will get into ARC, and then
> onto the soon-to-expire list, and then be pulled back into ARC and stay
> there. The above calculation is a sort of worst case. I think the
> following is likely to be a more realistic actual case:
>
> Personally, I would model the ARC memory consumption of the L2ARC
> entries using the average block size of the data pool, and just neglect
> the DDT entries in the L2ARC. Well ... inflate some. Say 10% of the DDT
> is in the L2ARC and the ARC at the same time. I'm making up this number
> from thin air.

Much better to just measure it. However, measurements are not likely to be
appropriate for capacity-planning purposes :-(

> My revised end result is:
> On my test system I guess the OS and processes consume 1G. (I'm making
> that up without any reason.)
> On my test system I guess I need 8G in the system to get reasonable
> performance without dedup or L2ARC. (Again, I'm just making that up.)
> We calculated that I need 7G for DDT and (96M + 10% of 3.3G = 430M) for
> L2ARC. Multiply by 5/4, and it means I need 7.5G * 1.25 = 9.4G.
> My system needs to be built with at least 8G + 9.4G = 17.4G.
> Of this, 20% (3.5G) is more than enough to run the OS and processes,
> while 80% (13.9G) is available for ARC. Of the 13.9G ARC, the DDT and
> L2ARC references will consume 7.5G, which leaves 6.4G for "normal" ARC
> caching.
> I personally think that's likely to be more accurate in the observable
> world.
> My revised end result is still basically the same: these numbers are all
> fuzzy. Anything from 16G to 24G might be reasonable.

I think these RAM numbers are reasonable first guesses. Many of the
systems I've seen deployed this year are 48 to 96 GB of RAM. L2ARC devices
are 250 to 600 GB.
 -- richard