On Wed, May 04, 2011 at 04:51:36PM -0700, Erik Trimble wrote:
> On 5/4/2011 4:44 PM, Tim Cook wrote:
> > On Wed, May 4, 2011 at 6:36 PM, Erik Trimble <erik.trim...@oracle.com> wrote:
> > > On 5/4/2011 4:14 PM, Ray Van Dolson wrote:
> > > > On Wed, May 04, 2011 at 02:55:55PM -0700, Brandon High wrote:
> > > > > On Wed, May 4, 2011 at 12:29 PM, Erik Trimble <erik.trim...@oracle.com> wrote:
> > > > > > I suspect that NetApp does the following to limit their resource
> > > > > > usage: they presume the presence of some sort of cache that can be
> > > > > > dedicated to the DDT (and, since they also control the hardware,
> > > > > > they can make sure there is always one present). Thus, they can
> > > > > > make their code
> > > > >
> > > > > AFAIK, NetApp has more restrictive requirements about how much data
> > > > > can be dedup'd on each type of hardware.
> > > > >
> > > > > See page 29 of http://media.netapp.com/documents/tr-3505.pdf - Smaller
> > > > > pieces of hardware can only dedup 1TB volumes, and even the big-daddy
> > > > > filers will only dedup up to 16TB per volume, even if the volume size
> > > > > is 32TB (the largest volume available for dedup).
> > > > >
> > > > > NetApp solves the problem by putting rigid constraints around the
> > > > > problem, whereas ZFS lets you enable dedup for any size dataset. Both
> > > > > approaches have limitations, and it sucks when you hit them.
> > > > >
> > > > > -B
> > > >
> > > > That is very true, although it's worth mentioning that you can have
> > > > quite a few dedupe/SIS-enabled FlexVols even on the lower-end filers
> > > > (our FAS2050 has a bunch of 2TB SIS-enabled FlexVols).
> > >
> > > Stupid question - can you hit all the various SIS volumes at once and
> > > not get horrid performance penalties?
> > >
> > > If so, I'm almost certain NetApp is doing post-write dedup. That way,
> > > the strictly controlled max FlexVol size helps with keeping the
> > > resource limits down, as it will be able to round-robin the post-write
> > > dedup to each FlexVol in turn.
> > >
> > > ZFS's problem is that it needs ALL the resources for EACH pool ALL the
> > > time, and can't really share them well if it expects to keep
> > > performance from tanking... (no pun intended)
> >
> > On a 2050? Probably not. It's got a single-core mobile Celeron CPU and
> > 2GB of RAM. You couldn't even run ZFS on that box, much less ZFS+dedup.
> > Can you do it on a model that isn't 4 years old without tanking
> > performance? Absolutely.
> >
> > Outside of those two 2000 series, the reason there are dedup limits isn't
> > performance.
> >
> > --Tim
>
> Indirectly, yes, it's performance, since NetApp has plainly chosen
> post-write dedup as a method to restrict the required hardware
> capabilities. The dedup limits on volume size are almost certainly
> driven by the local RAM requirements for post-write dedup.
>
> It also looks like NetApp isn't providing for a dedicated DDT cache,
> which means that when the NetApp is doing dedup, it's consuming the
> normal filesystem cache (i.e. chewing through RAM). Frankly, I'd be
> very surprised if you didn't see a noticeable performance hit during
> the period that the NetApp appliance is performing the dedup scans.
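Before I answer Erik's question, a quick aside on the ZFS side of the
comparison: a rough, back-of-the-envelope Python sketch of what the DDT can
cost in RAM, assuming the commonly quoted ~320 bytes of core per unique block
and a guessed average block size. Both numbers are illustrative assumptions,
not measurements from any real pool:

    # Rough DDT core-footprint estimate for ZFS dedup.
    # All inputs are assumptions -- adjust them for your own pool.
    BYTES_PER_DDT_ENTRY = 320        # oft-quoted rule of thumb, not exact
    pool_data_bytes = 2 * 1024**4    # e.g. a 2TB dataset, like one FlexVol above
    avg_block_bytes = 64 * 1024      # assumed average block size

    unique_blocks = pool_data_bytes // avg_block_bytes   # worst case: no dup blocks
    ddt_ram_bytes = unique_blocks * BYTES_PER_DDT_ENTRY

    print("unique blocks : %d" % unique_blocks)
    print("DDT footprint : %.1f GiB" % (ddt_ram_bytes / 1024.0**3))

That works out to roughly 10 GiB of DDT for 2TB of 64K blocks with no dedup
hits, which is why the point above about ZFS needing those resources for
every pool, all the time, bites so hard. Running 'zdb -S <pool>' against an
existing pool should give you much better inputs than my guesses.

Anyway, to Erik's actual question: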
Yep, when the dedupe process runs there is a drop in performance (hence we
usually schedule it to run during off-peak hours). Obviously that's a luxury
that wouldn't be an option in every environment...

During normal operations, outside of the dedupe window, we haven't noticed a
performance hit. We don't hit the filer too hard, though -- it's acting as a
VMware datastore, and only a few of the VMs have heavier I/O footprints. It
is a 2050C, however, so we spread the load across the two filer heads
(although we occasionally run everything on one head while performing
maintenance on the other).

Ray
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss