On Fri, 7 Jul 2006, Darren J Moffat wrote:

Eric Schrock wrote:
On Thu, Jul 06, 2006 at 09:53:32PM +0530, Pramod Batni wrote:
   off-topic query:
   How can ZFS require more VM address space but not more VM?


The real problem is VA fragmentation, not consumption.  Over time, ZFS's
heavy use of the VM system causes the address space to become
fragmented.  Eventually, we will need to grab a 128k block of contiguous
VA, but can't find a contiguous region, despite having plenty of memory
(physical or virtual).

Interesting. Years ago I saw and helped debug a very similar-sounding problem with VxVM and VxFS on an E10k with 15 TB of EMC storage and 10,000 NFS shares. That was on Solaris 2.6, so even though the CPUs were UltraSPARC, the kernel still had only a 32-bit address space.

Jeff Bonwick supplied the fixes for this; I don't remember the details, but they did help reduce the memory fragmentation. It does make me wonder, though, whether those fixes, which applied to 32-bit SPARC, also work for 32-bit x86.

Not quite comparable. The work Jeff did back then was the conversion of the old rmalloc-based heap management to vmem. The problem with the old allocator was that _any_ oversize allocation activity, even if it was just a slab-growth request from a kmem cache, led to heavy heap fragmentation, and the number of fragments an rmalloc-based mechanism can track (see rmalloc(9F)) is limited. Vmem scales here, and the quantum caches (the part that got backported to 2.6) also served as an intermediate "band-aid" by significantly reducing the number of calls into the heap allocator backend.
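To make the quantum-cache idea concrete, here is a rough user-space sketch. It is emphatically not the Solaris vmem code; the quantum, the qcache_max value, the function names and the malloc() "backend" are all invented for the illustration. Requests up to qcache_max are rounded up to a multiple of the quantum and served from per-size free lists, so only cache refills and genuinely oversize requests ever reach the backend heap allocator:

#include <stdio.h>
#include <stdlib.h>

#define QUANTUM     4096                    /* arena quantum */
#define QCACHE_MAX  (16 * QUANTUM)          /* largest cached request size */
#define NCACHES     (QCACHE_MAX / QUANTUM)  /* one cache per size class */

struct qc_buf { struct qc_buf *next; };

static struct qc_buf *qcache[NCACHES];      /* per-size free lists */
static unsigned long backend_calls;         /* how often we hit the "heap" */

/* stand-in for the heap allocator backend (the real thing is the kernel
 * heap arena, of course, not malloc()) */
static void *backend_alloc(size_t size)
{
    backend_calls++;
    return malloc(size);
}

static void *qc_alloc(size_t size)
{
    if (size > QCACHE_MAX)                  /* oversize: straight to the heap */
        return backend_alloc(size);

    size_t idx = (size + QUANTUM - 1) / QUANTUM - 1;
    if (qcache[idx] != NULL) {              /* cache hit: no backend call */
        struct qc_buf *b = qcache[idx];
        qcache[idx] = b->next;
        return b;
    }
    return backend_alloc((idx + 1) * QUANTUM);  /* miss: refill from the heap */
}

static void qc_free(void *p, size_t size)
{
    if (size > QCACHE_MAX) {                /* oversize goes straight back */
        free(p);
        return;
    }
    size_t idx = (size + QUANTUM - 1) / QUANTUM - 1;
    struct qc_buf *b = p;
    b->next = qcache[idx];                  /* keep it cached for next time */
    qcache[idx] = b;
}

int main(void)
{
    /* a small-allocation-heavy workload: 100000 alloc/free pairs */
    for (int i = 0; i < 100000; i++) {
        size_t size = (size_t)((i % NCACHES) + 1) * QUANTUM;
        void *p = qc_alloc(size);
        qc_free(p, size);
    }
    printf("backend (heap) calls: %lu for 100000 requests\n", backend_calls);
    return 0;
}

With that small-allocation-heavy workload the backend gets called 16 times (once per size class) instead of 100000 times.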

vmem allows the heap to fragment - and still to function - which is a striking difference from rmalloc. Once a resource map's slots (their number is fixed at map creation time) are all in use, it doesn't matter whether there is free memory in the heap; you can't get at it unless you happen to request _exactly_ the size of an existing fragment. Otherwise you'd have to split a fragment, creating two or three new ones, which you can't do because there is no free slot. Fragmentation with the pre-Solaris-8 rmalloc heap is pathological. It isn't with vmem; vmem lets the heap keep working even when it is heavily fragmented. But if you have a heavy "oversize" consumer, the long-term effect is that all vmem arenas larger than the most frequently used "big" size become empty. Under high load, ZFS makes all free spans accumulate in the 128 kB one.
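Here is a minimal user-space sketch of why the fixed slot count is so crippling. This is not the actual rmalloc(9F)/rmfree(9F) code; the slot count, block sizes and names are made up, and the real rmfree also tries to merge a freed region with a neighbouring fragment first (which wouldn't help below, since none of the freed blocks border an existing free fragment). Once every slot is occupied, freed space can no longer be recorded and is simply dropped:

#include <stdio.h>

#define MAP_SLOTS 4     /* fixed number of fragment slots, set at "map creation" */

struct frag { unsigned long addr, size; };

static struct frag map[MAP_SLOTS];   /* the free-fragment map */
static int nfrag;

/* first-fit allocation; an exact fit frees a slot, a split reuses one */
static long rm_alloc(unsigned long size)
{
    for (int i = 0; i < nfrag; i++) {
        if (map[i].size == size) {
            unsigned long a = map[i].addr;
            map[i] = map[--nfrag];
            return (long)a;
        }
        if (map[i].size > size) {
            unsigned long a = map[i].addr;
            map[i].addr += size;
            map[i].size -= size;
            return (long)a;
        }
    }
    return -1;          /* no suitable fragment */
}

/* return a fragment to the map; with no free slot left, the space is lost */
static void rm_free(unsigned long addr, unsigned long size)
{
    if (nfrag == MAP_SLOTS) {
        printf("map overflow: dropping %luk at %#lx\n", size / 1024, addr);
        return;
    }
    map[nfrag].addr = addr;
    map[nfrag].size = size;
    nfrag++;
}

int main(void)
{
    rm_free(0, 1024 * 1024);            /* start with one free 1 MB span */

    long a[8];
    for (int i = 0; i < 8; i++)         /* carve out eight 64k blocks */
        a[i] = rm_alloc(64 * 1024);

    for (int i = 0; i < 8; i += 2)      /* free every other one */
        rm_free((unsigned long)a[i], 64 * 1024);

    unsigned long tracked = 0;
    for (int i = 0; i < nfrag; i++)
        tracked += map[i].size;
    printf("map tracks %lu kB free, but 768 kB are actually free\n",
        tracked / 1024);
    return 0;
}

The map ends up tracking 704 kB although 768 kB are really free; repeat that long enough inside a kernel and you get the kernelmap exhaustion mentioned below.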

Ok, all that babbling in short: in Solaris 2.6, heap fragmentation was a pathological scaling problem that sooner or later led to a system hang because of kernelmap exhaustion. The vmem/quantum-cache heap keeps functioning even when the heap gets very fragmented - it scales. It doesn't remove the possibility of the heap fragmenting, but it deals with fragmentation gracefully. What is still there, though, is the ability of a kernel memory consumer to cause heap fragmentation - vmem can't solve the problem that if you allocate and free a huge number of N-sized slabs in random order over time, the heap will in the end consist mostly of N-sized fragments. That's what happens with ZFS.
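A toy model of that long-term effect (not any real kernel code; the arena size, the 4k page granularity and the allocation mix are invented for the illustration): fill an address range with 128k blocks interleaved with small, long-lived allocations from other consumers, then let the 128k consumer release everything it held. Most of the space is free afterwards, yet no contiguous span larger than 128k remains:

#include <stdio.h>

#define PAGE        4096UL
#define ARENA_PAGES (256UL * 1024 * 1024 / PAGE)   /* 256 MB of VA, in pages */
#define BIG_PAGES   (128UL * 1024 / PAGE)          /* one 128k block = 32 pages */

static unsigned char used[ARENA_PAGES];            /* 1 = page is allocated */

int main(void)
{
    static unsigned long big_start[ARENA_PAGES / BIG_PAGES];
    unsigned long p = 0, nbig = 0;

    /* fill the arena: a 128k block, then one small long-lived page from
     * some other consumer, over and over */
    while (p + BIG_PAGES <= ARENA_PAGES) {
        big_start[nbig++] = p;
        for (unsigned long i = 0; i < BIG_PAGES; i++)
            used[p + i] = 1;
        p += BIG_PAGES;
        if (p < ARENA_PAGES)
            used[p++] = 1;
    }

    /* the 128k consumer frees everything it held */
    for (unsigned long i = 0; i < nbig; i++)
        for (unsigned long j = 0; j < BIG_PAGES; j++)
            used[big_start[i] + j] = 0;

    /* what does the address space look like now? */
    unsigned long free_pages = 0, run = 0, longest = 0;
    for (unsigned long i = 0; i < ARENA_PAGES; i++) {
        if (!used[i]) {
            free_pages++;
            if (++run > longest)
                longest = run;
        } else {
            run = 0;
        }
    }
    printf("free: %lu MB of 256 MB, largest contiguous span: %lu kB\n",
        free_pages * PAGE / (1024 * 1024), longest * PAGE / 1024);
    return 0;
}

Roughly 248 of the 256 MB come back free, but the largest contiguous span is still only 128 kB - which is exactly the situation Eric describes.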

FrankH.

==========================================================================
No good can come from selling your freedom, not for all gold of the world,
for the value of this heavenly gift exceeds that of any fortune on earth.
==========================================================================
