On Friday 29 August 2014 11:54:42 Alan Cox wrote:
snip...

> > Others have also confirmed that even with r265945 they can still trigger
> > performance issues.
> >
> > In addition, without it we still have loads of RAM sat there unused; in my
> > particular experience we have 40GB of 192GB sitting there unused, and that
> > was with a stable build from last weekend.
>
> The Solaris code only imposed this limit on 32-bit machines where the
> available kernel virtual address space may be much less than the
> available physical memory. Previously, FreeBSD imposed this limit on
> both 32-bit and 64-bit machines. Now, it imposes it on neither. Why
> continue to do this differently from Solaris?

My understanding is these limits were totally different on Solaris; see the
#ifdef sun block in arc_reclaim_needed() for details. I actually started out
matching the Solaris flow, but that had already been tested and proved not
to work as well as the current design.

Since the question was asked below: we don't have ZFS machines in the cluster
running i386. We can barely get them to boot as it is due to KVA pressure; we
have to reduce/cap physical memory and change the user/kernel virtual split
from 3:1 to 2.5:1.5.
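For anyone unfamiliar with that kind of tuning, here is a rough sketch of what
it involves. The values below are hypothetical, not our actual cluster
settings: the physical memory cap is a loader tunable, and the kernel's share
of the 4GB i386 address space comes from the KVA_PAGES kernel option.

  # /boot/loader.conf (illustrative values only)
  hw.physmem="8G"        # cap usable physical RAM to ease KVA pressure
  vm.kmem_size="1536M"   # bound the kernel memory arena accordingly

  # i386 kernel config: enlarge the kernel's share of the address space
  # beyond the default ~1GB (3:1 split); check i386 NOTES for the correct
  # KVA_PAGES unit and value for your kernel (PAE or not).
  options KVA_PAGES=384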

We do run zfs on small amd64 machines with 2G of ram, but I can't imagine it
working on the 10G i386 PAE machines that we have.


> > With the patch we confirmed that both RAM usage and performance for those
> > seeing that issue are resolved, with no reported regressions.
> >
> >> (I should know better than to fire a reply off before full fact
> >> checking, but
> >> this commit worries me..)
> >
> > Not a problem, it's great to know people pay attention to changes, and
> > raise their concerns. Always better to have a discussion about potential
> > issues than to wait for a problem to occur.
> >
> > Hopefully the above gives you some peace of mind, but if you still have
> > any concerns I'm all ears.
>
> You didn't really address Peter's initial technical issue.  Peter
> correctly observed that cache pages are just another flavor of free
> pages.  Whenever the VM system is checking the number of free pages
> against any of the thresholds, it always uses the sum of v_cache_count
> and v_free_count.  So, to anyone familiar with the VM system, like
> Peter, what you've done, which is to derive a threshold from
> v_free_target but only compare v_free_count to that threshold, looks
> highly suspect.

I think I'd like to see something like this:

Index: cddl/compat/opensolaris/kern/opensolaris_kmem.c
===================================================================
--- cddl/compat/opensolaris/kern/opensolaris_kmem.c (revision 270824)
+++ cddl/compat/opensolaris/kern/opensolaris_kmem.c (working copy)
@@ -152,7 +152,8 @@
 kmem_free_count(void)
 {

- return (vm_cnt.v_free_count);
+ /* "cache" is just a flavor of free pages in FreeBSD */
+ return (vm_cnt.v_free_count + vm_cnt.v_cache_count);
 }

 u_int

This has apparently already been tried and the response from Karl was:

- No, because memory in "cache" is subject to being either reallocated or freed.
- When I was developing this patch that was my first impression as well and how
- I originally coded it, and it turned out to be wrong.
-
- The issue here is that you have two parts of the system contending for RAM --
- the VM system generally, and the ARC cache. If the ARC cache frees space before
- the VM system activates and starts pruning then you wind up with the ARC pinned
- at the minimum after some period of time, because it releases "early."

I've asked him if he would retest just to be sure.

The rest of the system looks at the "big picture": it would be happy to let
the "free" pool run quite a way down so long as there are "cache" pages
available to satisfy the free space requirements. This would lead ZFS to
mistakenly sacrifice ARC for no reason. I'm not sure how big a deal this is,
but I can't imagine many scenarios where I want ARC to be discarded in order
to save some effectively free pages.
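To make the distinction concrete, here is a minimal sketch of the two views
being argued about. The struct, names and numbers are illustrative stand-ins
for the vm_cnt fields, not the actual arc.c or vm_pageout code: the VM system
counts cache pages as effectively free, while the committed ARC check does not.

#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the relevant vm_cnt fields; illustrative only. */
struct vm_counters {
	uint32_t v_free_count;   /* pages on the free queues */
	uint32_t v_cache_count;  /* clean, immediately reusable "cache" pages */
	uint32_t v_free_target;  /* pagedaemon's free-page target */
};

/* How the VM system itself judges a shortage: free + cache vs. target. */
static bool
vm_style_shortage(const struct vm_counters *c)
{
	return (c->v_free_count + c->v_cache_count < c->v_free_target);
}

/*
 * How the committed ARC check behaves: the threshold is derived from
 * v_free_target, but only v_free_count is compared against it, so the
 * ARC can start shrinking while plenty of cache pages remain reusable.
 */
static bool
arc_style_shortage(const struct vm_counters *c)
{
	return (c->v_free_count < c->v_free_target);
}

int
main(void)
{
	/* Hypothetical numbers: free pool is low, but cache is plentiful. */
	struct vm_counters c = { .v_free_count = 10000,
	    .v_cache_count = 200000, .v_free_target = 50000 };

	/* VM sees no shortage; the ARC-side check sees one and would trim. */
	return (vm_style_shortage(&c) == arc_style_shortage(&c) ? 1 : 0);
}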

From Karl's response to the original PR (above), it seems like this causes
unexpected behaviour due to the two systems being separate.

> That said, I can easily believe that your patch works better than the
> existing code, because it is closer in spirit to my interpretation of
> what the Solaris code does. Specifically, I believe that the Solaris
> code starts trimming the ARC before the Solaris page daemon starts
> writing dirty pages to secondary storage. Now, you've made FreeBSD do
> the same.  However, you've expressed it in a way that looks broken.
>
> To wrap up, I think that you can easily write this in a way that
> simultaneously behaves like Solaris and doesn't look wrong to a VM
> expert.
>
> > Out of interest, would it be possible to update machines in the cluster
> > to see how their workload reacts to the change?
> >

I'd like to see the free vs cache thing resolved first but it's going to be
tricky to get a comparison.

Does Karl's explanation above as to why this doesn't work change your mind?

For the first few months of the year, things were really troublesome. It was
quite easy to overtax the machines and run them into the ground.

This is not the case now - things are working pretty well under pressure
(prior to the commit).  It's got to the point that we feel comfortable
thrashing the machines really hard again. Getting a comparison when it
already works well is going to be tricky.

We don't have large-memory machines that aren't already tuned with
vfs.zfs.arc_max caps to leave room for tmpfs use.
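For readers not familiar with that tuning, it is just a loader tunable; a
sketch with hypothetical numbers (not our actual settings):

  # /boot/loader.conf on a hypothetical 192GB build box
  vfs.zfs.arc_max="96G"   # cap ARC so tmpfs and build jobs keep headroom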

For context to the wider audience: we do not run -release or -pN in the
FreeBSD cluster. We mostly run -current, and some -stable. I am well aware
that there is significant discomfort in 10.0-R with ZFS, but we already have
the fixes for that.
