On Dec 29, 2011, at 10:31 PM, Ray Van Dolson wrote:

> Hi all;
>
> We have a dev box running NexentaStor Community Edition 3.1.1 w/ 24GB
> RAM (we don't run dedupe on production boxes -- and we do pay for
> Nexenta licenses on prd as well) and an 8.5TB pool with deduplication
> enabled (1.9TB or so in use). Dedupe ratio is only 1.26x.
Yes, this workload is a poor fit for dedup.

> The box has an SLC-based SSD as ZIL and a 300GB MLC SSD as L2ARC.
>
> The box has been performing fairly poorly lately, and we're thinking
> it's due to deduplication:
>
> # echo "::arc" | mdb -k | grep arc_meta
> arc_meta_used   =  5884 MB
> arc_meta_limit  =  5885 MB

This can be tuned. Since you are on the community edition and thus have
no expectation of support, you can increase this limit yourself. In the
future, the limit will be increased OOB. For now, add something like
the following to the /etc/system file and reboot.

*** Parameter: zfs:zfs_arc_meta_limit
** Description: sets the maximum size of metadata stored in the ARC.
**   Metadata competes with real data for ARC space.
** Release affected: NexentaStor 3.0, 3.1, not needed for 4.0
** Validation: none
** When to change: for metadata-intensive or deduplication workloads,
**   having more metadata in the ARC can improve performance.
** Stability: NexentaStor issue #7151 seeks to change the default
**   value to be larger than 1/4 of arc_max.
** Data type: integer
** Default: 1/4 of arc_max (bytes)
** Range: 10000 to arc_max
** Changed by: YOUR_NAME_HERE
** Change date: TODAYS_DATE
**
*set zfs:zfs_arc_meta_limit = 10000000

> arc_meta_max    =  5888 MB
>
> # zpool status -D
> ...
> DDT entries 24529444, size 331 on disk, 185 in core
>
> So, not only are we using up all of our metadata cache, but the DDT
> table is taking up a pretty significant chunk of that (over 70%).
>
> ARC sizing is as follows:
>
> p     = 15331 MB
> c     = 16354 MB
> c_min =  2942 MB
> c_max = 23542 MB
> size  = 16353 MB
>
> I'm not really sure how to determine how many blocks are on this zpool
> (is it the same as the # of DDT entries? -- deduplication has been on
> since pool creation). If I use a 64KB block size average, I get about
> 31 million blocks, but DDT entries are 24 million ....

The zpool status -D output shows the number of blocks.

> zdb -DD and zdb -bb | grep 'bp count' both do not complete (zdb says
> I/O error).

Probably because the pool is in use and is quite busy. Yes, zdb is not
expected to produce correct output for imported pools.

> Without the block count I'm having a hard time determining how much
> memory we _should_ have. I can only speculate that it's "more" at this
> point. :)
>
> If I assume 24 million blocks is about accurate (from zpool status -D
> output above), then at 320 bytes per block we're looking at about
> 7.1GB for DDT table size.

That is the on-disk calculation. Use the in-core number for memory
consumption. RAM needed if the DDT is completely in ARC:
24,529,444 entries * 185 bytes in core = 4,537,947,140 bytes (+)

> We do have L2ARC, though I'm not sure how ZFS decides what portion of
> the DDT stays in memory and what can go to L2ARC -- if all of it went
> to L2ARC, then the references to this information in arc_meta would be
> (at 176 bytes * 24 million blocks) around 4GB -- which again is a good
> chunk of arc_meta_max.

Some of the data might already be in L2ARC. But L2ARC access is always
slower than RAM access by a few orders of magnitude.

> Given that our dedupe ratio on this pool is fairly low anyways, am
> looking for strategies to back out. Should we just disable
> deduplication and then maybe bump up the size of the arc_meta_max?
> Maybe also increase the size of arc.size as well (8GB left for the
> system seems higher than we need)?

The arc_size is dynamic, but limited by another bug in Solaris to
effectively 7/8 of RAM (fixed in illumos).
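To put rough numbers on that for this particular box (an illustrative
sketch based on the figures above, not a tested recommendation): the
in-core DDT alone needs about 4.5 GB, which already fills most of the
current 5885 MB arc_meta_limit, so a limit somewhere in the 10-12 GB
range out of the 23.5 GB arc_max leaves headroom for the DDT plus the
rest of the metadata. Note that lines beginning with '*' in /etc/system
are comments, so the template entry above is inert as written; an
active entry would look something like:

  set zfs:zfs_arc_meta_limit = 12884901888

(that is 12 GiB; pick a value that suits your workload). Also, because
swapfs_minfree defaults to physmem/8 -- roughly 3 GB of the 24 GB here
-- the ARC itself tops out near 21 GB even though c_max is 23.5 GB; the
tunable below raises that ceiling. After the reboot, the same check you
ran above will show whether the new limit took effect:

  # echo "::arc" | mdb -k | grep arc_meta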
Since you are unsupported, you can try to add the following to
/etc/system along with the tunable above.

*** Parameter: swapfs_minfree
** Description: sets the minimum space reserved for the rest of the
**   system as swapfs grows. This value is also used to calculate the
**   dynamic upper limit of the ARC size.
** Release affected: NexentaStor 3.0, 3.1, not needed for 4.0
** Validation: none
** When to change: the default setting of physmem/8 caps the ARC to
**   approximately 7/8 of physmem, a value usually much smaller than
**   arc_max. Choosing a lower limit for swapfs_minfree can allow the
**   ARC to grow above 7/8 of physmem.
** Data type: unsigned integer (pages)
** Default: 1/8 of physmem
** Range: clamped at 256MB (65,536 4KB pages) for NexentaStor 4.0
** Changed by: YOUR_NAME_HERE
** Change date: TODAYS_DATE
**
*set swapfs_minfree=65536

> Is there a non-disruptive way to undeduplicate everything and expunge
> the DDT?

define "disruptive"

> zfs send/recv and then back perhaps (we have the extra space)?

send/receive is the most cost-effective way.
 -- richard

--
ZFS and performance consulting
http://www.RichardElling.com
illumos meetup, Jan 10, 2012, Menlo Park, CA
http://www.meetup.com/illumos-User-Group/events/41665962/

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss