> On 15. Apr 2018, at 01:18, Rick Macklem <rmack...@uoguelph.ca> wrote:
> 
> Niels Kobschätzki wrote:
>>> On 04/14/2018 03:49 AM, Rick Macklem wrote:
>>> Niels Kobschätzki wrote:
>>>> Sorry for the cross-posting, but so far I have had no real luck on the forum or on question, so I want to try my luck here as well.
>>> I read email lists but don't do the other stuff, so I only saw this yesterday.
>>> Short answer: I haven't a clue why the cache hit rate would have changed.
>>>
>>> The code that decides whether there is a hit or a miss for the attribute cache is in ncl_getattrcache(), and that code hasn't changed between 10.3 and 11.1, except that the old code did a mtx_lock(&Giant). I can't imagine how that would affect it.
>>>
>>> You might want to run:
>>> # sysctl -a | fgrep vfs.nfs
>>> on both the 10.3 and 11.1 systems, to check whether any defaults have somehow been changed. (I don't recall any being changed, but??)
>>
>> I did that and nothing had changed.
>>
>>> You could go into ncl_getattrcache() {it's in sys/fs/nfsclient/nfs_clsubs.c} and add a printf() for "time_second" and "np->n_mtime.tv_sec" near the top, where it calculates "timeo" from them.
>>> Running this hacked kernel might show you whether either of these fields is bogus.
>>> (You could then printf() "timeo" and "np->n_attrtimeo" just before the "if" clause that increments "attrcache_misses", which is where the cache misses happen, to see why it is missing the cache.)
>>> If you could do this for the 10.3 kernel as well, this might indicate why the miss rate has increased.
>>
>> I will do this next week. On Monday we are switching to other NFS servers for unrelated reasons, and once we see that they run stably, I will do this next.
> With a miss rate of 2.7%, I doubt printing the above will help. I thought you were seeing a high miss rate.
It is low, but it has increased roughly 675-fold compared to before (0.004% → 2.7%). I hope the printf output will help; right now it is just a lot of grepping through wherever I can get this data.

>> Btw, I have now calculated the percentages. The old servers had an attribute cache miss rate of about 0.004%, while the upgraded one is at about 2.7%. This is still low from what I've read (I remember that you should start adjusting acreg* when you hit more than 40% misses), but far higher than before.
> You could try increasing acregmin, acregmax and see if the misses are reduced.
> (The only risk with increasing the cache timeout is that, if another client changes the attributes, then the client will use stale ones for longer. Usually, this doesn't cause serious problems.)

I tried that and it had no effect at all.

> To be honest, a Getattr RPC is pretty low overhead, so I doubt the increase to 2.7% will affect your application's performance, but it is interesting that it increased.

It is a website with quite a lot of traffic, handled by three web servers behind a pair of load balancers. We see a 20% loss in speed (TTFB worsened by 100 ms; that doesn't sound like a lot, but Google et al. don't like it at all) after upgrading to 11.1 together with an upgrade to PHP 7.1. On another server without NFS, the same upgrade improved performance considerably (about 30%, according to the front-end developer).

> You might also try increasing acdirmin, acdirmax in case it is the directory attributes that are having cache misses.

I did that, too.

> Oh, and check that your time of day clocks are in sync with the server, since the caches are time based, because there is no cache coherency protocol in NFS.

I checked that. All three front ends use the same NTP server.

Thanks so far,
Niels

_______________________________________________
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"