On Fri, 7 Jul 2006, Darren J Moffat wrote:
Eric Schrock wrote:
On Thu, Jul 06, 2006 at 09:53:32PM +0530, Pramod Batni wrote:
Off-topic query:
How can ZFS require more VM address space but not more VM?
The real problem is VA fragmentation, not consumption. Over time, ZFS's
heavy use of the VM system causes the address space to become
fragmented. Eventually we need to grab a 128k block of contiguous VA
but can't find one, despite having plenty of memory (physical or
virtual).
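As a tiny user-space illustration (not ZFS or kernel code; the 64 MB range,
8 KB unit size and alternating layout are invented for the demo), the sketch
below shows how a range can have tens of megabytes free in total while no
single free run reaches 128 KB:

    /*
     * Toy illustration only: a 64 MB range where every other 8 KB chunk is
     * live, so half the range is free yet no free run reaches 128 KB.
     */
    #include <stdio.h>

    #define UNIT_KB   8
    #define UNITS     (64 * 1024 / UNIT_KB)   /* 64 MB range in 8 KB units */
    #define WANT_KB   128                     /* the contiguous run ZFS needs */

    int
    main(void)
    {
        int i, free_kb = 0, run_kb = 0, longest_kb = 0;

        for (i = 0; i < UNITS; i++) {
            if (i & 1) {                  /* odd units are "allocated" */
                run_kb = 0;
                continue;
            }
            free_kb += UNIT_KB;           /* even units are free */
            run_kb += UNIT_KB;
            if (run_kb > longest_kb)
                longest_kb = run_kb;
        }

        /* prints: 32768 KB free in total, longest free run only 8 KB */
        printf("free: %d KB, longest run: %d KB, need %d KB: %s\n",
            free_kb, longest_kb, WANT_KB,
            longest_kb >= WANT_KB ? "ok" : "no contiguous region");
        return (0);
    }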
Interesting; I saw and helped debug a very similar-sounding problem years ago
with VxVM and VxFS on an E10k with 15 TB of EMC storage and 10,000 NFS shares.
This was on Solaris 2.6, so even though the CPUs were UltraSPARC there was
still only a 32-bit address space.
Jeff Bonwick supplied the fixes for this; I don't remember the details, but
they did help reduce the memory fragmentation. It does make me wonder, though,
whether those fixes, which applied to 32-bit SPARC, also work for 32-bit x86.
Not quite comparable. The work Jeff did then was the conversion of
the old rmalloc-based heap management to vmem. The problem with the old
allocator was that _any_ oversize allocation activity, even if it were a
growth request from a kmem cache, led to heavy heap fragmentation, and
the number of fragments in an rmalloc-based mechanism (see rmalloc(9F)) is
limited. Vmem scales here, and the quantum caches (the part that got
backported to 2.6), as an intermediate "band aid", also significantly
reduce the number of calls into the heap allocator backend.
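For reference, here is a schematic, kernel-style contrast of the two
interfaces; these are fragments only, not a complete module, and HEAP_BASE,
HEAP_SIZE, MAP_SLOTS and the arena name are invented for the sketch. The
fixed slot count passed to rmallocmap() is the limit being discussed, while
the qcache_max argument to vmem_create() is the quantum-cache threshold that
keeps small requests away from the heap backend:

    /* Schematic contrast only -- not a complete kernel module. */
    #include <sys/param.h>
    #include <sys/map.h>
    #include <sys/ddi.h>
    #include <sys/sunddi.h>
    #include <sys/vmem.h>

    #define HEAP_BASE   0x1000              /* abstract start of the range */
    #define HEAP_SIZE   (16 * 1024 * 1024)
    #define MAP_SLOTS   100                 /* fixed: max # of free fragments */

    /* Old style: rmalloc(9F) resource map with a fixed number of slots. */
    static struct map *old_heap_map;

    static void
    old_style(void)
    {
        ulong_t off;

        old_heap_map = rmallocmap(MAP_SLOTS);         /* slot count fixed here */
        rmfree(old_heap_map, HEAP_SIZE, HEAP_BASE);   /* seed the free range */

        off = rmalloc(old_heap_map, 128 * 1024);      /* 0 == failure */
        /*
         * Once the free list would need more than MAP_SLOTS fragments,
         * requests fail even though free space exists.
         */
        if (off != 0)
            rmfree(old_heap_map, 128 * 1024, off);
    }

    /* New style: a vmem arena; qcache_max is the quantum-cache threshold. */
    static vmem_t *heap_arena;

    static void
    new_style(void)
    {
        void *va;

        heap_arena = vmem_create("example_heap", (void *)HEAP_BASE, HEAP_SIZE,
            PAGESIZE,                 /* quantum */
            NULL, NULL, NULL,         /* no backing source in this sketch */
            16 * PAGESIZE,            /* qcache_max: small requests use per-size caches */
            VM_SLEEP);

        va = vmem_alloc(heap_arena, 128 * 1024, VM_NOSLEEP);  /* NULL == failure */
        if (va != NULL)
            vmem_free(heap_arena, va, 128 * 1024);
    }

In the rmalloc case the maximum number of free fragments is decided once, at
map creation; in the vmem case the arena simply grows its segment bookkeeping
as the heap fragments.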
vmem allows the heap to fragment - and still to function - which is a
striking difference from rmalloc. Once the number of slots in a resource map
(fixed at map creation time) is used up, it doesn't matter whether there is
free memory in the heap; you can't get at it unless you happen to request
_exactly_ the size of an existing fragment. Otherwise you'd need to split a
fragment, creating two or three new ones, which you can't do because there is
no free slot. Fragmentation with the pre-Solaris 8 rmalloc heap is
pathological. It isn't with vmem; vmem allows the heap to keep working even
when heavily fragmented. But if you have a heavy "oversize consumer", the
long-term effect is that all vmem arenas larger than the most frequently used
"big" size become empty. Under high load, ZFS makes all free spans accumulate
in the 128kB one.
Ok, all that babbling in short: in Solaris 2.6, heap fragmentation was a
pathological scaling problem that sooner or later led to a system hang
because of kernelmap exhaustion. The vmem/quantum-cache heap keeps
functioning even when the heap gets very fragmented - it scales. It doesn't
remove the possibility of the heap fragmenting, but it deals with that
gracefully.
What is still there, though, is the ability of a kernel memory consumer to
cause heap fragmentation - vmem can't solve the problem that if you allocate
and free a huge number of N-sized slabs in random ways over time, the heap
will in the end contain mostly N-sized fragments. That's what happens with
ZFS.
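A toy user-space churn simulation of that last point (again illustrative
only: the 256-slot region, 16 KB block size and ~90% target occupancy are
made up) ends up with free space scattered across block-sized holes, so a
later 128 KB contiguous request fails even though several hundred KB are
free in total:

    /*
     * Toy long-run churn simulation: equal-sized blocks allocated and
     * freed at random for a long time.  User-space illustration only.
     */
    #include <stdio.h>
    #include <stdlib.h>

    #define SLOTS    256        /* block-sized slots in the region         */
    #define BLK_KB   16         /* every request is one 16 KB block        */
    #define TARGET   230        /* live blocks at steady state (~90% full) */
    #define BIG_KB   128        /* a later, larger contiguous request      */
    #define ITERS    100000

    static unsigned char live[SLOTS];   /* 1 = slot holds a live block */
    static int           nlive;

    int
    main(void)
    {
        int i, s, freekb = 0, runkb = 0, longest = 0;

        srand(1);
        for (i = 0; i < ITERS; i++) {
            if (nlive < TARGET) {
                /* allocate: take a random free slot */
                do {
                    s = rand() % SLOTS;
                } while (live[s]);
                live[s] = 1;
                nlive++;
            } else {
                /* free: drop a random live block */
                do {
                    s = rand() % SLOTS;
                } while (!live[s]);
                live[s] = 0;
                nlive--;
            }
        }

        /* measure total free space and the longest contiguous free run */
        for (i = 0; i < SLOTS; i++) {
            if (live[i]) {
                runkb = 0;
                continue;
            }
            freekb += BLK_KB;
            runkb += BLK_KB;
            if (runkb > longest)
                longest = runkb;
        }

        printf("free: %d KB in block-sized holes, longest free run: %d KB, "
            "%d KB contiguous request: %s\n",
            freekb, longest, BIG_KB,
            longest >= BIG_KB ? "would succeed" : "would fail");
        return (0);
    }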
FrankH.
==========================================================================
No good can come from selling your freedom, not for all gold of the world,
for the value of this heavenly gift exceeds that of any fortune on earth.
==========================================================================