On Fri, 7 Jul 2006, Darren J Moffat wrote:
Eric Schrock wrote:
On Thu, Jul 06, 2006 at 09:53:32PM +0530, Pramod Batni wrote:
Off-topic query:
How can ZFS require more VM address space but not more VM?
The real problem is VA fragmentation, not consumption. Over time, ZFS's
heavy use of the VM system causes the address space to become
fragmented. Eventually we need to grab a 128k block of contiguous VA
but can't find one, despite having plenty of memory (physical or
virtual).
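As a tiny user-space illustration (not ZFS or kernel code; the 64 MB range,
8 KB unit size and alternating layout are invented for the demo), the sketch
below shows how a range can have tens of megabytes free in total while no
single free run reaches 128 KB:

    /*
     * Toy illustration only: a 64 MB range where every other 8 KB chunk is
     * live, so half the range is free yet no free run reaches 128 KB.
     */
    #include <stdio.h>

    #define UNIT_KB   8
    #define UNITS     (64 * 1024 / UNIT_KB)   /* 64 MB range in 8 KB units */
    #define WANT_KB   128                     /* the contiguous run ZFS needs */

    int
    main(void)
    {
        int i, free_kb = 0, run_kb = 0, longest_kb = 0;

        for (i = 0; i < UNITS; i++) {
            if (i & 1) {                  /* odd units are "allocated" */
                run_kb = 0;
                continue;
            }
            free_kb += UNIT_KB;           /* even units are free */
            run_kb += UNIT_KB;
            if (run_kb > longest_kb)
                longest_kb = run_kb;
        }

        /* prints: 32768 KB free in total, longest free run only 8 KB */
        printf("free: %d KB, longest run: %d KB, need %d KB: %s\n",
            free_kb, longest_kb, WANT_KB,
            longest_kb >= WANT_KB ? "ok" : "no contiguous region");
        return (0);
    }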
Interesting; I saw and helped debug a very similar-sounding problem years ago
with VxVM and VxFS on an E10k with 15 TB of EMC storage and 10,000 NFS shares.
This was on Solaris 2.6, so even though the CPUs were UltraSPARC there was
still only a 32-bit address space.
Jeff Bonwick supplied the fixes for this; I don't remember the details, but
they did help reduce the memory fragmentation. It does make me wonder, though,
whether those fixes, which applied to 32-bit SPARC, also work for 32-bit x86.
Not quite comparable. The work Jeff did then was the conversion of
the old rmalloc-based heap management to vmem. The problem with the old
allocator was that _any_ oversize allocation activity, even if it were a
growth request from a kmem cache, led to heavy heap fragmentation, and
the number of fragments in an rmalloc-based mechanism (see rmalloc(9F)) is
limited. Vmem scales here, and the quantum caches (the part that got
backported to 2.6), as an intermediate "band aid", also significantly
reduce the number of calls into the heap allocator backend.
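For reference, here is a schematic, kernel-style contrast of the two
interfaces; these are fragments only, not a complete module, and HEAP_BASE,
HEAP_SIZE, MAP_SLOTS and the arena name are invented for the sketch. The
fixed slot count passed to rmallocmap() is the limit being discussed, while
the qcache_max argument to vmem_create() is the quantum-cache threshold that
keeps small requests away from the heap backend:

    /* Schematic contrast only -- not a complete kernel module. */
    #include <sys/param.h>
    #include <sys/map.h>
    #include <sys/ddi.h>
    #include <sys/sunddi.h>
    #include <sys/vmem.h>

    #define HEAP_BASE   0x1000              /* abstract start of the range */
    #define HEAP_SIZE   (16 * 1024 * 1024)
    #define MAP_SLOTS   100                 /* fixed: max # of free fragments */

    /* Old style: rmalloc(9F) resource map with a fixed number of slots. */
    static struct map *old_heap_map;

    static void
    old_style(void)
    {
        ulong_t off;

        old_heap_map = rmallocmap(MAP_SLOTS);         /* slot count fixed here */
        rmfree(old_heap_map, HEAP_SIZE, HEAP_BASE);   /* seed the free range */

        off = rmalloc(old_heap_map, 128 * 1024);      /* 0 == failure */
        /*
         * Once the free list would need more than MAP_SLOTS fragments,
         * requests fail even though free space exists.
         */
        if (off != 0)
            rmfree(old_heap_map, 128 * 1024, off);
    }

    /* New style: a vmem arena; qcache_max is the quantum-cache threshold. */
    static vmem_t *heap_arena;

    static void
    new_style(void)
    {
        void *va;

        heap_arena = vmem_create("example_heap", (void *)HEAP_BASE, HEAP_SIZE,
            PAGESIZE,                 /* quantum */
            NULL, NULL, NULL,         /* no backing source in this sketch */
            16 * PAGESIZE,            /* qcache_max: small requests use per-size caches */
            VM_SLEEP);

        va = vmem_alloc(heap_arena, 128 * 1024, VM_NOSLEEP);  /* NULL == failure */
        if (va != NULL)
            vmem_free(heap_arena, va, 128 * 1024);
    }

In the rmalloc case the maximum number of free fragments is decided once, at
map creation; in the vmem case the arena simply grows its segment bookkeeping
as the heap fragments.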
vmem allows the heap to fragment - and still to function - which is a
striking difference from rmalloc. Once the number of slots in a resource map
(fixed at map creation time) is used up, it doesn't matter whether there is
free memory in the heap; you can't get at it unless you happen to request
_exactly_ the size of an existing fragment. Otherwise you'd need to split a
fragment, creating two or three new ones, which you can't do because there is
no free slot. Fragmentation with the pre-Solaris 8 rmalloc heap is
pathological. It isn't with vmem; vmem allows the heap to keep working even
when heavily fragmented. But if you have a heavy "oversize consumer", the
long-term effect is that all vmem arenas larger than the most frequently used
"big" size become empty. Under high load, ZFS makes all free spans accumulate
in the 128kB one.
Ok, all that babbling in short: in Solaris 2.6, heap fragmentation was a
pathological scaling problem that sooner or later led to a system hang
because of kernelmap exhaustion. The vmem/quantum-cache heap keeps
functioning even when the heap gets very fragmented - it scales. It doesn't
remove the possibility of the heap fragmenting, but it deals with that
gracefully.
What is still there, though, is the ability of a kernel memory consumer to
cause heap fragmentation - vmem can't solve the problem that if you allocate
and free a huge number of N-sized slabs in random ways over time, the heap
will in the end contain mostly N-sized fragments. That's what happens with
ZFS.
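A toy user-space churn simulation of that last point (again illustrative
only: the 256-slot region, 16 KB block size and ~90% target occupancy are
made up) ends up with free space scattered across block-sized holes, so a
later 128 KB contiguous request fails even though several hundred KB are
free in total:

    /*
     * Toy long-run churn simulation: equal-sized blocks allocated and
     * freed at random for a long time.  User-space illustration only.
     */
    #include <stdio.h>
    #include <stdlib.h>

    #define SLOTS    256        /* block-sized slots in the region         */
    #define BLK_KB   16         /* every request is one 16 KB block        */
    #define TARGET   230        /* live blocks at steady state (~90% full) */
    #define BIG_KB   128        /* a later, larger contiguous request      */
    #define ITERS    100000

    static unsigned char live[SLOTS];   /* 1 = slot holds a live block */
    static int           nlive;

    int
    main(void)
    {
        int i, s, freekb = 0, runkb = 0, longest = 0;

        srand(1);
        for (i = 0; i < ITERS; i++) {
            if (nlive < TARGET) {
                /* allocate: take a random free slot */
                do {
                    s = rand() % SLOTS;
                } while (live[s]);
                live[s] = 1;
                nlive++;
            } else {
                /* free: drop a random live block */
                do {
                    s = rand() % SLOTS;
                } while (!live[s]);
                live[s] = 0;
                nlive--;
            }
        }

        /* measure total free space and the longest contiguous free run */
        for (i = 0; i < SLOTS; i++) {
            if (live[i]) {
                runkb = 0;
                continue;
            }
            freekb += BLK_KB;
            runkb += BLK_KB;
            if (runkb > longest)
                longest = runkb;
        }

        printf("free: %d KB in block-sized holes, longest free run: %d KB, "
            "%d KB contiguous request: %s\n",
            freekb, longest, BIG_KB,
            longest >= BIG_KB ? "would succeed" : "would fail");
        return (0);
    }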
FrankH.
==========================================================================
No good can come from selling your freedom, not for all gold of the world,
for the value of this heavenly gift exceeds that of any fortune on earth.
==========================================================================