On Friday 29 August 2014 11:54:42 Alan Cox wrote:
snip...
> > Others have also confirmed that even with r265945 they can still
> > trigger the performance issue.
> >
> > In addition, without it we still have loads of RAM sat there unused;
> > in my particular experience we have 40GB of 192GB sitting there
> > unused, and that was with a stable build from last weekend.
>
> The Solaris code only imposed this limit on 32-bit machines where the
> available kernel virtual address space may be much less than the
> available physical memory. Previously, FreeBSD imposed this limit on
> both 32-bit and 64-bit machines. Now, it imposes it on neither. Why
> continue to do this differently from Solaris?
My understanding is that these limits were totally different on Solaris;
see the #ifdef sun block in arc_reclaim_needed() for details. I actually
started out matching the Solaris flow, but that had already been tested
and proved not to work as well as the current design.
Since the question was asked below: we don't have zfs machines in the
cluster running i386. We can barely get them to boot as it is due to
KVA pressure; we have to reduce/cap physical memory and change the
user/kernel virtual split from 3:1 to 2.5:1.5.

We do run zfs on small amd64 machines with 2G of RAM, but I can't
imagine it working on the 10G i386 PAE machines that we have.
> > With the patch we confirmed that both RAM usage and performance for
> > those seeing that issue are resolved, with no reported regressions.
> >
> >> (I should know better than to fire a reply off before full fact
> >> checking, but this commit worries me..)
> >
> > Not a problem, it's great to know people pay attention to changes and
> > raise their concerns. Always better to have a discussion about
> > potential issues than to wait for a problem to occur.
> >
> > Hopefully the above gives you some peace of mind, but if you still
> > have any concerns I'm all ears.
>
> You didn't really address Peter's initial technical issue. Peter
> correctly observed that cache pages are just another flavor of free
> pages. Whenever the VM system is checking the number of free pages
> against any of the thresholds, it always uses the sum of v_cache_count
> and v_free_count. So, to anyone familiar with the VM system, like
> Peter, what you've done, which is to derive a threshold from
> v_free_target but only compare v_free_count to that threshold, looks
> highly suspect.
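
For reference, the pattern being described can be sketched as follows.
This is a self-contained illustration, not the kernel code itself: the
struct, the helper, and the numbers are invented for the example, and
only the field names mirror the vm_cnt counters named above.

#include <stdio.h>

/*
 * Illustrative stand-in for the relevant vm_cnt fields; the real
 * counters live in the kernel's struct vmmeter.
 */
struct vm_counters {
        unsigned int v_free_count;      /* pages on the free queues */
        unsigned int v_cache_count;     /* clean pages, immediately reusable */
        unsigned int v_free_target;     /* pagedaemon's free-page target */
};

/*
 * Sketch of the convention Alan describes: a "short of free pages?"
 * test compares the threshold against the sum of free and cache pages,
 * never against v_free_count alone.
 */
static int
below_free_target(const struct vm_counters *vc)
{
        return (vc->v_cache_count + vc->v_free_count < vc->v_free_target);
}

int
main(void)
{
        struct vm_counters vc = {
                .v_free_count = 20000,
                .v_cache_count = 60000,
                .v_free_target = 40000,
        };

        /* Plenty of cache pages, so this test reports no shortage. */
        printf("below free target: %s\n",
            below_free_target(&vc) ? "yes" : "no");
        return (0);
}

The kmem_free_count() change proposed below follows the same convention.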
I think I'd like to see something like this:
Index: cddl/compat/opensolaris/kern/opensolaris_kmem.c
===================================================================
--- cddl/compat/opensolaris/kern/opensolaris_kmem.c	(revision 270824)
+++ cddl/compat/opensolaris/kern/opensolaris_kmem.c	(working copy)
@@ -152,7 +152,8 @@
 u_int
 kmem_free_count(void)
 {
-	return (vm_cnt.v_free_count);
+	/* "cache" is just a flavor of free pages in FreeBSD */
+	return (vm_cnt.v_free_count + vm_cnt.v_cache_count);
 }
 
 u_int
This has apparently already been tried and the response from Karl was:
- No, because memory in "cache" is subject to being either reallocated
- or freed.
-
- When I was developing this patch that was my first impression as well,
- and how I originally coded it, and it turned out to be wrong.
-
- The issue here is that you have two parts of the system contending for
- RAM -- the VM system generally, and the ARC cache. If the ARC cache
- frees space before the VM system activates and starts pruning then you
- wind up with the ARC pinned at the minimum after some period of time,
- because it releases "early."
I've asked him if he would retest just to be sure.
The rest of the system looks at the "big picture": it would be happy to
let the "free" pool run quite a way down so long as there are "cache"
pages available to satisfy the free space requirements. This would lead
ZFS to mistakenly sacrifice ARC for no reason. I'm not sure how big a
deal this is, but I can't imagine many scenarios where I want ARC to be
discarded in order to save some effectively free pages.
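
To make the disagreement concrete, here is a small standalone sketch
(not kernel code; the page counts are made-up values, and
arc_free_target is just a local name standing in for "a threshold
derived from v_free_target") of the two accounting choices being argued
over:

#include <stdio.h>

int
main(void)
{
        /* Made-up page counts for illustration only. */
        unsigned int v_free_count = 20000;      /* truly free pages */
        unsigned int v_cache_count = 60000;     /* clean "cache" pages */
        unsigned int arc_free_target = 40000;   /* derived from v_free_target */

        /* The change being discussed: only v_free_count is compared. */
        int trim_free_only = (v_free_count < arc_free_target);

        /* The suggested alternative: treat cache pages as free as well. */
        int trim_free_plus_cache =
            (v_free_count + v_cache_count < arc_free_target);

        printf("free-only check:  %s the ARC\n",
            trim_free_only ? "trim" : "keep");
        printf("free+cache check: %s the ARC\n",
            trim_free_plus_cache ? "trim" : "keep");
        return (0);
}

With these numbers the free-only check trims the ARC even though 60000
clean cache pages could be reclaimed almost for free, which is the
"mistakenly sacrifice ARC" case described above.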
From Karl's response in the original PR (above) it seems like this
causes unexpected behaviour due to the two systems being separate.
> That said, I can easily believe that your patch works better than the
> existing code, because it is closer in spirit to my interpretation of
> what the Solaris code does. Specifically, I believe that the Solaris
> code starts trimming the ARC before the Solaris page daemon starts
> writing dirty pages to secondary storage. Now, you've made FreeBSD do
> the same. However, you've expressed it in a way that looks broken.
>
> To wrap up, I think that you can easily write this in a way that
> simultaneously behaves like Solaris and doesn't look wrong to a VM
> expert.
>
> > Out of interest would it be possible to update machines in the
> > cluster to see how their workload reacts to the change?
> >
I'd like to see the free vs cache thing resolved first, but it's going
to be tricky to get a comparison.

Does Karl's explanation above as to why this doesn't work change your
mind?
For the first few months of the year, things were really troublesome;
it was quite easy to overtax the machines and run them into the ground.
This is not the case now: things are working pretty well under pressure
(prior to the commit). It's got to the point that we feel comfortable
thrashing the machines really hard again. Getting a comparison when it
already works well is going to be tricky.

We don't have large-memory machines that aren't already tuned with
vfs.zfs.arc_max caps for tmpfs use.
For context to the wider audience: we do not run -release or -pN in the
FreeBSD cluster. We mostly run -current, and some -stable. I am well
aware that there is significant discomfort in 10.0-R with zfs, but we
already have the fixes for that.