I'm going crazy here. There are two 2.0.38 slink machines serving NFS at our site, and every few weeks they stop allowing new mounts. I've searched exhaustively on the net without finding anything, and I was hoping maybe someone here had encountered a similar problem.
Incidentally, we had the same problem with a 2.0.36 kernel, and it always coincided with syslog messages flagging possible SYN attacks. We thought it might be a kernel bug, so we upgraded and disabled the SYN cookie option. That seemed to fix the problem at the time, but apparently it hasn't. We can go for weeks without a single blip, and then spend days with NFS more down than up. Our users are understandably upset, and so are we. I'm having trouble keeping the FreeBSD bigots at bay.

Problem description:

- the NFS daemons (nfsd and mountd) don't die or freeze, but don't service requests; thus in debug mode mountd gives

    Feb 15 15:36:05 square mountd[432]: mnt [1 100/2/15 15:36:05 yoda.cnd.mcgill.ca 0.0+0,10]
    Feb 15 15:36:05 square mountd[432]: ^I/exports/u0
    Feb 15 15:36:05 square mountd[432]: NFS mount of /exports/u0 attempted from 132.206.114.131

  and sometimes even prints a line claiming success, but on the client the mount always returns "RPC: Timed out" while the problem is ongoing

- usually it'll be just mountd that screws up, and already mounted filesystems are fine; it's unclear to me whether nfsd also gets hosed by itself, or if it happens as a result of trying to get mountd running again; I suspect the latter

- sometimes restarting nfs-server helps; often we actually have to reboot to get a quick fix, but even that doesn't last long on some days, and sometimes it doesn't even seem to help at all; sometimes multiple kill/restart sequences seem to work, though why that should be I don't know

I've tried leaving long straces running on mountd, and I've tried sniffing the network to see if we're being hurt by something else out there, but I can't see anything noteworthy.
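One way to pin down exactly when mountd stops answering would be a small probe run from a client at regular intervals. The following is only a minimal sketch, assuming the server's portmapper listens on UDP port 111 and that mountd registers mount protocol version 1 over UDP. The function names (build_call, parse_reply, rpc_udp, mountd_alive) are made up for illustration. It sends an RPC NULL call, which may or may not fail in the same way as a real mount request, so a "yes" from it proves less than a "no".

```python
import socket
import struct

PMAP_PROG, PMAP_VERS, PMAP_GETPORT = 100000, 2, 3  # portmapper program, version, GETPORT
MOUNT_PROG, MOUNT_VERS = 100005, 1                 # mount protocol (version 1 is an assumption)
IPPROTO_UDP = 17

def build_call(xid, prog, vers, proc, args=b""):
    """Encode an ONC RPC version 2 call with AUTH_NULL credentials."""
    # xid, msg_type=CALL(0), rpcvers=2, prog, vers, proc,
    # credential (flavor 0, length 0), verifier (flavor 0, length 0)
    return struct.pack(">10I", xid, 0, 2, prog, vers, proc, 0, 0, 0, 0) + args

def parse_reply(data, xid):
    """Return the result bytes of an accepted, successful reply, else None."""
    if len(data) < 24:
        return None
    rxid, mtype, rstat, _flavor, vlen = struct.unpack(">5I", data[:20])
    off = 20 + ((vlen + 3) & ~3)  # skip the padded verifier body
    if len(data) < off + 4:
        return None
    (astat,) = struct.unpack(">I", data[off:off + 4])
    # msg_type 1 = REPLY, reply_stat 0 = MSG_ACCEPTED, accept_stat 0 = SUCCESS
    if rxid != xid or mtype != 1 or rstat != 0 or astat != 0:
        return None
    return data[off + 4:]

def rpc_udp(host, port, xid, prog, vers, proc, args=b"", timeout=5.0):
    """Send one call over UDP; return result bytes, or None on error or timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        try:
            s.sendto(build_call(xid, prog, vers, proc, args), (host, port))
            data, _ = s.recvfrom(8192)
        except OSError:  # includes socket timeouts
            return None
    return parse_reply(data, xid)

def mountd_alive(host, timeout=5.0):
    """Ask the portmapper for mountd's UDP port, then send mountd a NULL call."""
    query = struct.pack(">4I", MOUNT_PROG, MOUNT_VERS, IPPROTO_UDP, 0)
    res = rpc_udp(host, 111, 1, PMAP_PROG, PMAP_VERS, PMAP_GETPORT, query, timeout)
    if res is None or len(res) < 4:
        return False
    (port,) = struct.unpack(">I", res[:4])
    if port == 0:  # portmapper reports the program as not registered
        return False
    return rpc_udp(host, port, 2, MOUNT_PROG, MOUNT_VERS, 0, b"", timeout) is not None
```

Calling mountd_alive("square") once a minute from cron and logging the result with a timestamp would give something to line up against the syslog entries on the server.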
The only oddity I've noticed recently is that an rpcinfo -p of the server lists two unnamed services I can't track down (any theories are welcome - I'm pretty ignorant of rpc stuff generally):

    600100069    1   udp    773
    600100069    1   tcp    775

Also, this may be incidental, but the behaviour occurs much less commonly on the server we have named gloom, which is on a different network and subject to different traffic. Fixing it on gloom is usually more pressing and more lasting, so I haven't had a chance to investigate as thoroughly there.

---

p.s. We wanted to try running nfsd/mountd from inetd to see if that helped matters in the short term, but inetd seemed to only be willing to register the udp or the tcp servers, not both. Is this a known problem? Lines used were from the manpages, i.e.

    mount/1-2 dgram  rpc/udp wait root /usr/sbin/rpc.mountd rpc.mountd
    mount/1-2 stream rpc/tcp wait root /usr/sbin/rpc.mountd rpc.mountd
    nfs/2     dgram  rpc/udp wait root /usr/sbin/rpc.nfsd   rpc.nfsd
    nfs/2     stream rpc/tcp wait root /usr/sbin/rpc.nfsd   rpc.nfsd
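For keeping an eye on the unnamed rpcinfo entries mentioned above, a rough sketch of a filter over saved rpcinfo -p output follows. It assumes the usual column layout (program, version, protocol, port, then an optional service name), and the function name unnamed_services is made up for illustration. It only lists the registrations that have no name. It does not say which process owns them.

```python
def unnamed_services(rpcinfo_output):
    """Return (program, version, proto, port) for entries with no service name."""
    found = []
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Skip the header line and anything that doesn't start with a program number.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        # Named entries carry a fifth column; unnamed ones stop after the port.
        if len(fields) == 4:
            prog, vers, proto, port = fields
            found.append((int(prog), int(vers), proto, int(port)))
    return found
```

Run against the output captured before and after a failure, this would show whether those two registrations come and go with the problem or are always present.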