On Wed, May 4, 2011 at 6:51 PM, Erik Trimble <erik.trim...@oracle.com> wrote:

>  On 5/4/2011 4:44 PM, Tim Cook wrote:
>
>
>
> On Wed, May 4, 2011 at 6:36 PM, Erik Trimble <erik.trim...@oracle.com> wrote:
>
>> On 5/4/2011 4:14 PM, Ray Van Dolson wrote:
>>
>>> On Wed, May 04, 2011 at 02:55:55PM -0700, Brandon High wrote:
>>>
>>>> On Wed, May 4, 2011 at 12:29 PM, Erik Trimble <erik.trim...@oracle.com>
>>>> wrote:
>>>>
>>>>>        I suspect that NetApp does the following to limit their resource
>>>>> usage:   they presume the presence of some sort of cache that can be
>>>>> dedicated to the DDT (and, since they also control the hardware, they
>>>>> can
>>>>> make sure there is always one present).  Thus, they can make their code
>>>>>
>>>> AFAIK, NetApp has more restrictive requirements about how much data
>>>> can be dedup'd on each type of hardware.
>>>>
>>>> See page 29 of http://media.netapp.com/documents/tr-3505.pdf - Smaller
>>>> pieces of hardware can only dedup 1TB volumes, and even the big-daddy
>>>> filers will only dedup up to 16TB per volume, even if the volume size
>>>> is 32TB (the largest volume available for dedup).
>>>>
>>>> NetApp solves the problem by putting rigid constraints around the
>>>> problem, whereas ZFS lets you enable dedup for any size dataset. Both
>>>> approaches have limitations, and it sucks when you hit them.
>>>>
>>>> -B
>>>>
>>> That is very true, although it's worth mentioning that you can have quite
>>> a few dedupe/SIS-enabled FlexVols even on the lower-end filers (our
>>> FAS2050 has a bunch of 2TB SIS-enabled FlexVols).
>>>
>> Stupid question - can you hit all the various SIS volumes at once, and
>> not get horrid performance penalties?
>>
>> If so, I'm almost certain NetApp is doing post-write dedup.  That way, the
>> strictly controlled max FlexVol size helps keep the resource requirements
>> down, since it can round-robin the post-write dedup over each FlexVol in
>> turn.
>>
>> ZFS's problem is that it needs ALL the resources for EACH pool ALL the
>> time, and can't really share them well if it expects to keep performance
>> from tanking... (no pun intended)
>>
>>
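To put a rough number on "ALL the resources": the usual back-of-the-envelope
for the ZFS DDT is on the order of 320 bytes of core per unique block.  Both
that figure and the 64K average block size below are ballpark community
numbers, not official ones, but they show why the per-pool DDT gets big fast:

    # Rough DDT sizing sketch; ~320 bytes/entry and a 64 KiB average block
    # size are ballpark assumptions, not official figures.
    def ddt_ram_bytes(unique_data_bytes, avg_block=64 * 1024, per_entry=320):
        return (unique_data_bytes // avg_block) * per_entry

    TIB, GIB = 1024 ** 4, 1024 ** 3
    for tib in (1, 4, 16):
        print("%2d TiB of unique data -> ~%d GiB of DDT"
              % (tib, ddt_ram_bytes(tib * TIB) // GIB))

So a 16TB pool of 64K blocks that doesn't actually dedup well is looking at
something like 80GB of DDT, which is exactly why a dedicated cache device for
it matters.
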
>  On a 2050?  Probably not.  It's got a single-core mobile Celeron CPU and
> 2GB of RAM.  You couldn't even run ZFS on that box, much less ZFS+dedup.
> Can you do it on a model that isn't 4 years old without tanking
> performance?  Absolutely.
>
>  Outside of those two 2000-series models, the reason there are dedup
> limits isn't performance.
>
>  --Tim
>
>  Indirectly, yes, it's performance, since NetApp has plainly chosen
> post-write dedup as a method to restrict the required hardware
> capabilities.  The dedup limits on volume size are almost certainly driven
> by the local RAM requirements for post-write dedup.
>
> It also looks like NetApp isn't providing for a dedicated DDT cache, which
> means that when the NetApp is doing dedup, it's consuming the normal
> filesystem cache (i.e. chewing through RAM).  Frankly, I'd be very surprised
> if you didn't see a noticeable performance hit during the period that the
> NetApp appliance is performing the dedup scans.
>
>

Again, it depends on the model, the load, etc.  The smallest models will
certainly see performance hits.  But if the volume-size limits were strictly
a matter of RAM, why exactly would they jump from 4TB to 16TB on a 3140
simply by upgrading ONTAP?  If the limits haven't gone up on, at the very
least, every one of the x2xx systems twelve months from now, feel free to
dig up this thread and give me an I-told-you-so; I'm quite confident that
won't happen.  To me, the 16TB limit SCREAMS that it's a holdover from the
same 32-bit addressing that caps 32-bit volumes at 16TB.  My bet is they're
simply taking a cautious approach to moving to 64-bit dedup code.
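
For what it's worth, the arithmetic fits.  A quick sanity check in Python
(WAFL's 4KB block size is documented; the 32-bit block addressing is my
inference, not something NetApp has confirmed):

    # 2**32 block addresses times a 4 KiB block lands exactly on 16 TiB.
    block_size = 4 * 1024        # WAFL block size, in bytes
    max_blocks = 2 ** 32         # what a 32-bit block address can reach
    print((block_size * max_blocks) / 1024.0 ** 4)   # -> 16.0 (TiB)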

--Tim
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
