Thanks for your response, Richard.

On Fri, Dec 30, 2011 at 09:52:17AM -0800, Richard Elling wrote:
> On Dec 29, 2011, at 10:31 PM, Ray Van Dolson wrote:
> 
> > Hi all;
> > 
> > We have a dev box running NexentaStor Community Edition 3.1.1 w/ 24GB
> > (we don't run dedupe on production boxes -- and we do pay for Nexenta
> > licenses on prd as well) RAM and an 8.5TB pool with deduplication
> > enabled (1.9TB or so in use). Dedupe ratio is only 1.26x.
> 
> Yes, this workload is a poor fit for dedup.
> 
> > The box has an SLC-based SSD as ZIL and a 300GB MLC SSD as L2ARC.
> > 
> > The box has been performing fairly poorly lately, and we're thinking
> > it's due to deduplication:
> > 
> > # echo "::arc" | mdb -k | grep arc_meta
> > arc_meta_used  = 5884 MB
> > arc_meta_limit = 5885 MB
> 
> This can be tuned. Since you are on the community edition and thus have no
> expectation of support, you can increase this limit yourself. In the future,
> the limit will be increased OOB. For now, add something like the following
> to the /etc/system file and reboot.
> 
> *** Parameter: zfs:zfs_arc_meta_limit
> ** Description: sets the maximum size of metadata stored in the ARC.
> **   Metadata competes with real data for ARC space.
> ** Release affected: NexentaStor 3.0, 3.1, not needed for 4.0
> ** Validation: none
> ** When to change: for metadata-intensive or deduplication workloads
> **   having more metadata in the ARC can improve performance.
> ** Stability: NexentaStor issue #7151 seeks to change the default
> **   value to be larger than 1/4 of arc_max.
> ** Data type: integer
> ** Default: 1/4 of arc_max (bytes)
> ** Range: 10000 to arc_max
> ** Changed by: YOUR_NAME_HERE
> ** Change date: TODAYS_DATE
> **
> *set zfs:zfs_arc_meta_limit = 10000000
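For reference, here's roughly what I'm thinking of putting in /etc/system on
this box -- the value below is just a ballpark ~10GB pick for our 24GB system,
not a recommendation:

  set zfs:zfs_arc_meta_limit = 10737418240

And after the reboot I'd confirm it took effect with the same check as above:

  # echo "::arc" | mdb -k | grep arc_meta_limit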
If we wanted to do this on a running system, would the following work?

# echo "arc_meta_limit/Z 0x271000000" | mdb -kw

(To up arc_meta_limit to 10GB)

> > arc_meta_max   = 5888 MB
> > 
> > # zpool status -D
> > ...
> > DDT entries 24529444, size 331 on disk, 185 in core
> > 
> > So, not only are we using up all of our metadata cache, but the DDT
> > table is taking up a pretty significant chunk of that (over 70%).
> > 
> > ARC sizing is as follows:
> > 
> > p     = 15331 MB
> > c     = 16354 MB
> > c_min = 2942 MB
> > c_max = 23542 MB
> > size  = 16353 MB
> > 
> > I'm not really sure how to determine how many blocks are on this zpool
> > (is it the same as the # of DDT entries? -- deduplication has been on
> > since pool creation). If I use a 64KB block size average, I get about
> > 31 million blocks, but DDT entries are 24 million ...
> 
> The zpool status -D output shows the number of blocks.
> 
> > zdb -DD and zdb -bb | grep 'bp count' both do not complete (zdb says
> > I/O error). Probably because the pool is in use and is quite busy.
> 
> Yes, zdb is not expected to produce correct output for imported pools.
> 
> > Without the block count I'm having a hard time determining how much
> > memory we _should_ have. I can only speculate that it's "more" at this
> > point. :)
> > 
> > If I assume 24 million blocks is about accurate (from zpool status -D
> > output above), then at 320 bytes per block we're looking at about 7.1GB
> > for DDT table size.
> 
> That is the on-disk calculation. Use the in-core number for memory
> consumption.
> RAM needed if DDT is completely in ARC = 4,537,947,140 bytes (+)
> 
> > We do have L2ARC, though I'm not sure how ZFS
> > decides what portion of the DDT stays in memory and what can go to
> > L2ARC -- if all of it went to L2ARC, then the references to this
> > information in arc_meta would be (at 176 bytes * 24million blocks)
> > around 4GB -- which again is a good chunk of arc_meta_max.
> 
> Some of the data might already be in L2ARC. But L2ARC access is always
> slower than RAM access by a few orders of magnitude.
> 
> > Given that our dedupe ratio on this pool is fairly low anyways, am
> > looking for strategies to back out. Should we just disable
> > deduplication and then maybe bump up the size of the arc_meta_max?
> > Maybe also increase the size of arc.size as well (8GB left for the
> > system seems higher than we need)?
> 
> The arc_size is dynamic, but limited by another bug in Solaris to
> effectively 7/8 of RAM (fixed in illumos). Since you are unsupported, you
> can try to add the following to /etc/system along with the tunable above.
> 
> *** Parameter: swapfs_minfree
> ** Description: sets the minimum space reserved for the rest of the
> **   system as swapfs grows. This value is also used to calculate the
> **   dynamic upper limit of the ARC size.
> ** Release affected: NexentaStor 3.0, 3.1, not needed for 4.0
> ** Validation: none
> ** When to change: the default setting of physmem/8 caps the ARC to
> **   approximately 7/8 of physmem, a value usually much smaller than
> **   arc_max. Choosing a lower limit for swapfs_minfree can allow the
> **   ARC to grow above 7/8 of physmem.
> ** Data type: unsigned integer (pages)
> ** Default: 1/8 of physmem
> ** Range: clamped at 256MB (65,536 4KB pages) for NexentaStor 4.0
> ** Changed by: YOUR_NAME_HERE
> ** Change date: TODAYS_DATE
> **
> *set swapfs_minfree=65536
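In case it's useful to anyone else following the thread, the sanity check I'm
planning around the swapfs_minfree change (as I understand it -- corrections
welcome) is just to note physical memory, the current swapfs_minfree value,
and the ARC numbers before the reboot and compare afterwards:

  # prtconf | grep -i memory
  # echo "swapfs_minfree/E" | mdb -k
  # echo "::arc" | mdb -k | egrep "size|c_max"

If the tunable does what's described above, the ARC "size" should be able to
climb closer to c_max instead of topping out around 7/8 of RAM.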
> > define "disruptive" > > > zfs send/recv and then back perhaps (we have the extra > > space)? > > send/receive is the most cost-effective way. > -- richard I think we will give this method a shot. Thanks, Ray _______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss