Hi Mark,

Thank you. That makes a lot of sense. In our case we're talking about
around 10 multi-gigabyte files. The arc_max + 3*arc_max + fragmentation
figure was a bit worrisome. It sounds, then, like this is mostly an issue
on something like an NFS server with a ton of small files, where
minimum_file_node_overhead * files is what ends up consuming the
3*arc_max?
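Just to sanity-check my own worry, here is the back-of-the-envelope I was
doing for our box. The fragmentation term below is purely an assumed
placeholder -- per Robert it depends entirely on how the kmem slabs end
up packed:

# Worst case per the formula in this thread:
#   arc_max + 3*arc_max + memory lost to fragmentation
GB = 1024 ** 3
arc_max = 4 * GB            # our current zfs arc_max cap on the Thumper
related = 3 * arc_max       # vnodes/znodes/dnodes pulled in alongside the ARC
fragmentation = 1 * GB      # illustrative only; depends on slab packing
worst_case = arc_max + related + fragmentation
print("worst case: %.0f GB" % (worst_case / float(GB)))   # -> 17 GB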
On a side note, it appears that most of our zio caches stay pretty static.
99% of the ::memstat kernel memory increases are in zio_buf_65536, which
seems to grow by 5-50MB/hr depending on the database update load. Is
integrating the ARC into the Solaris VM system a Solaris Nevada goal, or
would that land in the next major release after Nevada?
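In case it's useful, this is roughly how we've been sampling that growth.
It is only a sketch, and it assumes the kmem caches export buf_inuse and
buf_size kstats under the "unix" module -- worth double-checking with
"kstat -m unix -n zio_buf_65536" on your build first:

#!/usr/bin/env python
# Rough sketch: estimate zio_buf_65536 growth from the kmem cache kstats.
import subprocess
import time

CACHE = "zio_buf_65536"

def cache_bytes():
    out = subprocess.check_output(
        ["kstat", "-p",
         "unix:*:%s:buf_inuse" % CACHE,
         "unix:*:%s:buf_size" % CACHE],
        universal_newlines=True)
    stats = {}
    for line in out.splitlines():
        key, val = line.split()
        stats[key.split(":")[-1]] = int(val)
    # bytes currently held by the cache = buffers in use * buffer size
    return stats["buf_inuse"] * stats["buf_size"]

before = cache_bytes()
time.sleep(3600)                    # sample an hour apart
grown = cache_bytes() - before
print("%s grew %.1f MB over the last hour" % (CACHE, grown / 1024.0 / 1024.0))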
Best Regards,
Jason

On 1/10/07, Mark Maybee <[EMAIL PROTECTED]> wrote:
Jason J. W. Williams wrote:
> Hi Robert,
>
> Thank you! Holy mackerel! That's a lot of memory. With that type of a
> calculation my 4GB arc_max setting is still in the danger zone on a
> Thumper. I wonder if any of the ZFS developers could shed some light
> on the calculation?
>
In a worst-case scenario, Robert's calculations are accurate to a certain
degree: if you have 1GB of dnode_phys data in your arc cache (that would
be about 1,200,000 files referenced), then this will result in another
3GB of "related" data held in memory: vnodes/znodes/dnodes/etc. This
related data is the in-core data associated with an accessed file. It's
not quite true that this data is not evictable; it *is* evictable, but
the space is returned from these kmem caches only after the arc has
cleared its blocks and triggered the "free" of the related data
structures (and even then, the kernel will need to do a kmem_reap to
reclaim the memory from the caches). The fragmentation that Robert
mentions is an issue because, if we don't free everything, the kmem_reap
may not be able to reclaim all the memory from these caches, as they are
allocated in "slabs". We are in the process of trying to improve this
situation.

> That kind of memory loss makes ZFS almost unusable for a database system.
>
Note that you are not going to experience these sorts of overheads unless
you are accessing *many* files. In a database system, there are only
going to be a few files => no significant overhead.

> I agree that a page cache similar to UFS would be much better. Linux
> works similarly to free pages, and it has been effective enough in the
> past. Though I'm equally unhappy about Linux's tendency to grab every
> bit of free RAM available for filesystem caching, and then cause
> massive memory thrashing as it frees it for applications.
>
The page cache is "much better" in the respect that it is more tightly
integrated with the VM system, so you get a more efficient response to
memory pressure. It is *much worse* than the ARC at caching data for a
file system. In the long term we plan to integrate the ARC into the
Solaris VM system.

> Best Regards,
> Jason
>
> On 1/10/07, Robert Milkowski <[EMAIL PROTECTED]> wrote:
>> Hello Jason,
>>
>> Wednesday, January 10, 2007, 9:45:05 PM, you wrote:
>>
>> JJWW> Sanjeev & Robert,
>>
>> JJWW> Thanks guys. We put that in place last night and it seems to be
>> JJWW> doing a lot better job of consuming less RAM. We set it to 4GB
>> JJWW> and each of our 2 MySQL instances on the box to a max of 4GB. So
>> JJWW> hopefully a slush of 4GB on the Thumper is enough. I would be
>> JJWW> interested in what the other ZFS modules' memory behaviors are.
>> JJWW> I'll take a perusal through the archives. In general it seems to
>> JJWW> me that a max cap for ZFS, whether set through a series of
>> JJWW> individual tunables or a single root tunable, would be very
>> JJWW> helpful.
>>
>> Yes it would. Better yet would be if the memory consumed by ZFS for
>> caching (dnodes, vnodes, data, ...) behaved similarly to the page cache
>> with UFS, so applications would be able to get back almost all of the
>> memory used for ZFS caches if needed.
>>
>> I guess (and it's really a guess, only based on some emails here) that
>> in the worst-case scenario the ZFS caches would consume about:
>>
>>    arc_max + 3*arc_max + memory lost to fragmentation
>>
>> So I guess with arc_max set to 1GB you can lose even 5GB (or more), and
>> currently only that first 1GB can be given back automatically.
>>
>>
>> --
>> Best regards,
>>  Robert                          mailto:[EMAIL PROTECTED]
>>                                  http://milek.blogspot.com
>>
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
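If I'm reading Mark's numbers right, the per-file cost works out to
roughly the following (a rough sketch derived only from the figures
above, not from the actual in-core structure sizes):

# Mark's figures: ~1GB of dnode_phys data in the ARC ~= 1,200,000
# referenced files, dragging in ~3GB of related vnode/znode/dnode data.
GB = 1024.0 ** 3
files = 1.2e6
total_bytes = 1 * GB + 3 * GB
per_file_kb = total_bytes / files / 1024
print("approx %.1f KB per referenced file" % per_file_kb)   # ~3.5 KB
# At ~10 multi-gigabyte database files that term is noise; it only bites
# with millions of files (e.g. an NFS server full of small files).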
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss