Jason J. W. Williams wrote:
> Hi Robert,
> Thank you! Holy mackerel! That's a lot of memory. With that type of a
> calculation my 4GB arc_max setting is still in the danger zone on a
> Thumper. I wonder if any of the ZFS developers could shed some light
> on the calculation?
In a worst-case scenario, Robert's calculations are accurate to a
certain degree: if you have 1GB of dnode_phys data in your ARC cache
(that would be about 1,200,000 files referenced), then this will result
in another 3GB of "related" data held in memory: the vnodes, znodes,
dnodes, etc. This related data is the in-core state associated with
an accessed file. It's not quite true that this data is not evictable;
it *is* evictable, but the space is returned from these kmem caches
only after the ARC has cleared its blocks and triggered the "free" of
the related data structures (and even then, the kernel will need to
do a kmem_reap to reclaim the memory from the caches). The
fragmentation that Robert mentions is an issue because, if we don't
free everything, the kmem_reap may not be able to reclaim all the
memory from these caches, as they are allocated in "slabs".
We are in the process of trying to improve this situation.
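
(For a rough sense of the per-file cost implied by those numbers, here is a
back-of-the-envelope sketch in Python; the 1GB/3GB split and the ~1,200,000-file
figure come from the paragraph above, and the per-file result is only an
illustration, not actual kernel accounting.)

    # Back-of-the-envelope estimate of the worst-case per-file memory
    # overhead, using the figures quoted above (illustrative only).
    GIB = 1024 ** 3

    dnode_phys_in_arc = 1 * GIB    # dnode_phys data held in the ARC
    related_incore = 3 * GIB       # vnodes/znodes/dnodes etc. pulled in with it
    files_referenced = 1200000     # roughly what 1GB of dnode_phys covers

    total_overhead = dnode_phys_in_arc + related_incore
    per_file = total_overhead / float(files_referenced)

    print("total in-core overhead: %.1f GiB" % (total_overhead / float(GIB)))
    print("approx. per-file cost : %.1f KiB" % (per_file / 1024.0))
    # -> about 3.5 KiB of kernel memory per cached file in this worst case
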
> That kind of memory loss makes ZFS almost unusable for a database system.
Note that you are not going to experience these sorts of overheads
unless you are accessing *many* files. In a database system, there are
only going to be a few files => no significant overhead.
> I agree that a page cache similar to UFS would be much better. Linux
> works similarly, freeing pages as needed, and it has been effective
> enough in the past. Though I'm equally unhappy about Linux's tendency
> to grab every bit of free RAM available for filesystem caching, and
> then cause massive memory thrashing as it frees it for applications.
The page cache is "much better" in the respect that it is more tightly
integrated with the VM system, so you get more efficient response to
memory pressure. It is *much worse* than the ARC at caching data for
a file system. In the long-term we plan to integrate the ARC into the
Solaris VM system.
> Best Regards,
> Jason
> On 1/10/07, Robert Milkowski <[EMAIL PROTECTED]> wrote:
> Hello Jason,
> Wednesday, January 10, 2007, 9:45:05 PM, you wrote:
> JJWW> Sanjeev & Robert,
> JJWW> Thanks guys. We put that in place last night and it seems to be doing
> JJWW> a lot better job of consuming less RAM. We set it to 4GB and each of
> JJWW> our 2 MySQL instances on the box to a max of 4GB. So hopefully a slush
> JJWW> of 4GB on the Thumper is enough. I would be interested in what the
> JJWW> other ZFS modules' memory behaviors are. I'll take a perusal through
> JJWW> the archives. In general it seems to me that a max cap for ZFS, whether
> JJWW> set through a series of individual tunables or a single root tunable,
> JJWW> would be very helpful.
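
(A quick sketch of the memory budget Jason describes above. The 16GB of total
RAM is an assumption on my part, since the box's memory size isn't stated in
the thread; the other figures are from the message itself.)

    # Rough memory budget for the configuration described above.
    # ASSUMPTION: 16GB of RAM in the Thumper (not stated in the thread).
    total_ram = 16   # GB, assumed
    arc_max = 4      # GB, the ZFS ARC cap that was set
    mysql_instances = 2
    mysql_max = 4    # GB cap per MySQL instance

    committed = arc_max + mysql_instances * mysql_max
    slush = total_ram - committed
    print("committed: %d GB, slush: %d GB" % (committed, slush))
    # -> committed: 12 GB, slush: 4 GB
    # Per the worst-case estimate discussed in this thread, a 4GB ARC could
    # drag along up to ~3x that again in related kernel data, which is why
    # the 4GB cap was described earlier as "still in the danger zone".
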
> Yes, it would. Better yet would be if the memory consumed by ZFS for
> caching (dnodes, vnodes, data, ...) behaved similarly to the page cache
> with UFS, so that applications would be able to get back almost all of
> the memory used for ZFS caches if needed.
> I guess (and it's really only a guess, based on some emails here) that
> in the worst-case scenario ZFS caches would consume about:
>
> arc_max + 3*arc_max + memory lost to fragmentation
>
> So I guess with arc_max set to 1GB you can lose even 5GB (or more), and
> currently only that first 1GB can be reclaimed automatically.
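
(A minimal sketch of that worst-case arithmetic; fragmentation is left as an
unknown parameter since the thread gives no figure for it.)

    # Worst-case ZFS cache consumption, per the estimate above:
    #   arc_max + 3*arc_max + memory lost to fragmentation
    def worst_case_gb(arc_max_gb, fragmentation_gb=0):
        """Estimated worst-case ZFS cache memory use in GB.

        fragmentation_gb is unknown in practice; the thread only notes
        that it pushes the total higher still.
        """
        return arc_max_gb + 3 * arc_max_gb + fragmentation_gb

    print(worst_case_gb(1))   # -> 4 GB; ~5GB or more once fragmentation
                              #    is added on top
    print(worst_case_gb(4))   # -> 16 GB for a 4GB arc_max, before
                              #    fragmentation
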
> --
> Best regards,
> Robert mailto:[EMAIL PROTECTED]
> http://milek.blogspot.com
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss