On Fri, May 2, 2014 at 4:39 AM, Donovan Watteau <tso...@gmail.com> wrote:
> On Tue, 29 Apr 2014, Philip Guenther wrote:
> > On Tue, Apr 29, 2014 at 8:17 AM, Donovan Watteau <tso...@gmail.com> wrote:
> > > I have various mountpoints from a NetApp NFS server which I use on
> > > OpenBSD/amd64 5.5.
> > >
> > > $ grep nfs /etc/fstab
> > > server:/vol/foobar /vol/foobar nfs noauto,rw,nodev,nosuid,noatime,noexec,nfsv3,tcp,soft,intr,noac,-x=300,-t=1000,acregmin=3,acregmax=5,-r=65536,-w=65536 0 0
> > > (and some other mountpoints with the same options)
> >
> > That's a lot of knob turning.  What documentation or testing led to
> > you adding the tcp, noac, ac*, -x, -t, -r, and -w options?
>
> Indeed, I don't like turning knobs either, but this problem still
> appears with a much simpler fstab (see below).
>
> My documentation is mount_nfs(8), and "Managing NFS and NIS"
> (recommended by books.html).
>
> Basically:
> * tcp: better suited to our use case, with a noticeable speed
>   improvement and better reliability for the files we need to access
>   over NFS.
> * noac: a leftover, but removing it doesn't fix the problem.
> * ac*: required for our use case.

How is that possible when you also set "noac" to *COMPLETELY DISABLE*
attribute caching?  Is it not obvious that at least *one* of those
settings is completely bogus?  This casts doubt on your claim that the
ac* options are "required" for your use case.


> * -x/-t: we needed a faster timeout/retry rate, but it may be too high.

...so you just jammed in some bigger values?  You *raised* the timeouts
and retry counts, meaning a failed server will be *slower* to time out
and/or retry, not faster!  Heck, your -t value is above the value that
the kernel will clamp it to!

What Problem Are You Trying To Solve? (tm)

Using a soft,intr mount (so it's completely unreliable already) and then
jacking up the retry and timeout counts suggests that maybe NFS is the
Wrong Solution for your problem, and an rsync'ed mirror makes more
sense.


> > > However, when I do a simple "ls /vol/foobar" after an hour without
> > > anything else using this mountpoint, this appears in the logs:
> > >
> > > Apr 29 13:53:46 puffy /bsd: receive error 54 from nfs server server:/vol/foobar
> > > Apr 29 13:53:48 puffy last message repeated 833 times
> > >
> > > $ grep 54 /usr/include/sys/errno.h
> > > #define ECONNRESET      54      /* Connection reset by peer */
> >
> > Is there an idle timeout on the server or flaky network (NAT?) between
> > this client and the server?
...
> > TCP connection is being dropped for some reason and then it takes a
> > moment to be reopened when you try to use it again.
>
> Yes, I was wondering whether there is something left to be configured
> on the OpenBSD side to prevent that (since the problem doesn't show up
> on Debian running on the same machine), or should I look for a problem
> on the NFS server or Cisco side?

A TCP connection, which should be able to stay open and idle
indefinitely, is being reset (as in, a packet with the RST flag is being
received!).  Linux apparently decides to silently spend CPU and packets
to hide the fact that something is breaking connections.

If you're fine with that behavior, then just ignore the error messages
from the OpenBSD kernel: it already retries the connection for you and
is 'just' letting you know that your NFS server or network hates you
(and maybe you should fix that).  If you don't like that it does so,
well, you have the source.


Philip Guenther
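
(A minimal sketch of how that reset could be confirmed on the wire from
the OpenBSD client, assuming the client's interface is em0 and the
server carries NFS over the standard port 2049; only the placeholder
hostname "server" comes from the report above, the rest is an
assumption:

  # tcpdump -n -i em0 'host server and port 2049 and tcp[13] & 4 != 0'

tcp[13] is the TCP flags byte and 0x04 is RST, so only the segments that
tear the connection down are shown.  The source address of any RST seen
while the mount sits idle should help narrow down whether the filer
itself or something in between is resetting the connection, keeping in
mind that a middlebox can forge the server's address.)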