Hi Mark,

Thank you, that makes a lot of sense. In our case we're talking about
around 10 multi-gigabyte files. The arc_max + 3*arc_max + fragmentation
estimate was a bit worrisome. It sounds, then, like this is mostly an
issue on something like an NFS server with a ton of small files, where
the per-file in-core overhead (minimum_file_node_overhead * files) is
what consumes the 3*arc_max?
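
To put numbers on it (back-of-the-envelope only, plugging our settings
into Robert's worst-case formula):

   arc_max        =  4 GB   (data/metadata held by the ARC)
   3 * arc_max    = 12 GB   (in-core dnodes/znodes/vnodes, worst case)
   fragmentation  =  ?      (whatever kmem_reap cannot give back)
   ------------------------------------------------------------
   worst case     > 16 GB

which is why I was nervous; but if the 3*arc_max term scales with the
number of files rather than their size, we should be nowhere near that.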

On a side note, it appears that most of our zio caches stay pretty
static; 99% of the kernel memory growth we see in ::memstat is in
zio_buf_65536, which grows by 5-50 MB/hr depending on the database
update load.
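
In case it helps anyone else, this is roughly how we are tracking it
(just the stock mdb dcmds; ::kmastat gives the per-cache breakdown
behind the ::memstat summary):

   # echo ::memstat | mdb -k
   # echo ::kmastat | mdb -k | grep zio_buf_65536

The first gives the overall kernel memory summary; the second shows the
buf-in-use / buf-total / memory-in-use columns for the 64K zio buffer
cache, which is where our growth shows up.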

Is integrating the ARC into the Solaris VM system a goal for Solaris
Nevada, or would that land in the next major release after Nevada?

Best Regards,
Jason

On 1/10/07, Mark Maybee <[EMAIL PROTECTED]> wrote:
Jason J. W. Williams wrote:
> Hi Robert,
>
> Thank you! Holy mackerel! That's a lot of memory. With that type of a
> calculation my 4GB arc_max setting is still in the danger zone on a
> Thumper. I wonder if any of the ZFS developers could shed some light
> on the calculation?
>
In a worst-case scenario, Robert's calculations are accurate to a
certain degree:  If you have 1GB of dnode_phys data in your arc cache
(that would be about 1,200,000 files referenced), then this will result
in another 3GB of "related" data held in memory: vnodes/znodes/
dnodes/etc.  This related data is the in-core data associated with
an accessed file.  It's not quite true that this data is not evictable;
it *is* evictable, but the space is returned from these kmem caches
only after the arc has cleared its blocks and triggered the "free" of
the related data structures (and even then, the kernel will need to
do a kmem_reap to reclaim the memory from the caches).  The
fragmentation that Robert mentions is an issue because, if we don't
free everything, the kmem_reap may not be able to reclaim all the
memory from these caches, as they are allocated in "slabs".
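
If you want to see this effect on a live system, something like the
following will show it (the names here are the kmem cache names I'd
expect on recent Nevada builds; check what ::kmastat actually reports
on your box):

   # echo ::kmastat | mdb -k | egrep 'dnode_t|zfs_znode_cache|vn_cache'

A large gap between the "buf in use" and "buf total" columns (with
"memory in use" staying high) after the arc has shrunk is exactly this
fragmentation: a few live buffers keep whole slabs pinned, so kmem_reap
cannot return them.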

We are in the process of trying to improve this situation.

> That kind of memory loss makes ZFS almost unusable for a database system.
>
Note that you are not going to experience these sorts of overheads
unless you are accessing *many* files.  In a database system, there are
only going to be a few files => no significant overhead.

> I agree that a page cache similar to UFS would be much better.  Linux
> works similarly, freeing pages under memory pressure, and it has been
> effective enough in the past, though I'm equally unhappy about Linux's
> tendency to grab every
> bit of free RAM available for filesystem caching, and then cause
> massive memory thrashing as it frees it for applications.
>
The page cache is "much better" in the respect that it is more tightly
integrated with the VM system, so you get more efficient response to
memory pressure.  It is *much worse* than the ARC at caching data for
a file system.  In the long-term we plan to integrate the ARC into the
Solaris VM system.

> Best Regards,
> Jason
>
> On 1/10/07, Robert Milkowski <[EMAIL PROTECTED]> wrote:
>> Hello Jason,
>>
>> Wednesday, January 10, 2007, 9:45:05 PM, you wrote:
>>
>> JJWW> Sanjeev & Robert,
>>
>> JJWW> Thanks guys. We put that in place last night and it seems to be
>> JJWW> doing a lot better job of consuming less RAM. We set it to 4GB
>> JJWW> and each of our 2 MySQL instances on the box to a max of 4GB, so
>> JJWW> hopefully a slush of 4GB on the Thumper is enough. I would be
>> JJWW> interested in what the other ZFS modules' memory behaviors are;
>> JJWW> I'll take a perusal through the archives. In general it seems to
>> JJWW> me that a max cap for ZFS, whether set through a series of
>> JJWW> individual tunables or a single root tunable, would be very
>> JJWW> helpful.
>>
>> Yes, it would. Better yet would be if the memory consumed by ZFS for
>> caching (dnodes, vnodes, data, ...) behaved like the page cache does
>> with UFS, so that applications could get back almost all of the memory
>> used for ZFS caches when needed.
>>
>> I guess (and it's really only a guess, based on some emails here) that
>> in a worst-case scenario the ZFS caches would consume about:
>>
>>   arc_max + 3*arc_max + memory lost for fragmentation
>>
>> So I guess with arc_max set to 1GB you can lose even 5GB (or more), and
>> currently only that first 1GB can be reclaimed automatically.
>>
>>
>> --
>> Best regards,
>>  Robert                            mailto:[EMAIL PROTECTED]
>>                                        http://milek.blogspot.com
>>
>>

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
