Re: NFS server down, again, and again, and again...

Ryan Freeman Wed, 18 Apr 2018 11:05:40 -0700

On Wed, Apr 18, 2018 at 01:08:01PM -0400, Rupert Gallagher wrote:
> This is all I managed to retrieve from the logs (/var/log/daemons, 
> /var/log/messages):
> 
> Mar 12 09:27:20 server mountd[50607]: Socket disconnected
> Mar 29 18:05:30 server mountd[52162]: Socket disconnected
> Apr 16 12:04:07 server mountd[66430]: Socket disconnected
> Apr 17 17:55:26 server mountd[14081]: Socket disconnected
> 
> No messages from nfsd and portmap.
> 
> If the logs are true, then mountd is the daemon that is causing problems.
> 
> The manual says
> 
> >     -d      Enable debugging mode.  mountd will not detach from the
> >            controlling terminal and will print debugging messages to stderr.
> 
> The above option does not work, because it detaches from the terminal:
> 
> > > doas /sbin/mountd -d
> > Here we go.
>

This is how it works when your system is normal:
$ doas touch /etc/exports
$ doas mountd -d
Here we go.
Getting export list.
unexporting / /
unexporting /home /home
unexporting /tmp /tmp
unexporting /usr /usr
unexporting /usr/X11R6 /usr/X11R6
unexporting /usr/local /usr/local
unexporting /usr/obj /usr/obj
unexporting /usr/ports /usr/ports
unexporting /usr/src /usr/src
unexporting /var /var
unexporting /tmpfs /tmpfs
Getting mount list.
* waiting here in foreground *

> I tried "mountd_flags=-d" in rc.conf.local, and rebooted the whole OS, 
> because mountd refuses soft restart. As a result, the OS refuses to boot. 
> System crashed.

On this point, ``rcctl restart mountd'' works fine.  Restarting mountd
will not harm things already mounted; they are already being handled by
one of the running nfsd processes.

Also, ``pkill -1 mountd'' tends to work fine as well.  You can verify this
the next time you adjust /etc/exports: run ``showmount -e'', add or remove
an exports entry, send SIGHUP to the mountd process, and check showmount
again.
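
Roughly, that check could be scripted like the sketch below.  The function
names export_list and check_reload are made up for illustration, and it
assumes you have already edited /etc/exports, are running as root, and
that ``showmount -e'' prints a single header line before the exports.

```shell
#!/bin/sh
# Sketch only: confirm that mountd re-reads /etc/exports on SIGHUP.
# export_list and check_reload are hypothetical helper names.

# Print the current export list without the header line.
export_list() {
    showmount -e | sed 1d | sort
}

# Snapshot the exports, send SIGHUP to mountd, snapshot again, and
# report whether the export list changed.
check_reload() {
    before=$(export_list)
    pkill -HUP mountd || { echo "mountd is not running"; return 1; }
    sleep 1    # give mountd a moment to re-read /etc/exports
    after=$(export_list)
    if [ "$before" = "$after" ]; then
        echo "export list unchanged"
    else
        echo "export list changed"
    fi
}
```

Source the file and call check_reload after editing /etc/exports.  If it
reports "export list unchanged" after a real change, mountd either did not
get the signal or rejected the new exports line.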

I have never needed to reboot just to reload/restart mountd.

You may want to revisit how you set these machines up, as it is likely
that cognitive bias from your 30+ years of experience is making you miss
something simple.

> 
> On 18 April 2018 2:47 AM, IL Ka <kazakevichi...@gmail.com> wrote:
> 
> > You could use ktrace(1) to trace all calls and then use kdump(1) to read 
> > them, which may help you find what causes it to die, but it may be tricky 
> > for anyone except an nfsd developer.
> > You can also try to find the person who maintains it by looking at the 
> > last commits to:
> > https://github.com/openbsd/src/blame/master/sbin/nfsd/nfsd.c
> > and email this person, though I do not know if it will help, or talk to 
> > people on the bugs@ list.
> >
> > Or you can move to samba/smbd: SMB must have good support in Windows.
> >
> > On Wed, Apr 18, 2018 at 2:53 AM, Rupert Gallagher <r...@protonmail.com> 
> > wrote:
> >
> >>> Do you mean nfsd server dies?
> >>
> >> I mean the NFS service as delivered by nfsd, portmap and mountd.
> >>
> >>> Does it provide core dump?
> >>
> >> No!
> >>
> >>> You do not need to restart it manually: just create script that checks
> >>> for server existence (like ``/etc/rc.d/nfsd check``) and run it if it
> >>> is dead.
> >>
> >> I usually prepare my servers from source with custom patches and settings. 
> >> When a server dies on me, it makes a lot of noise in the logs, and it 
> >> happens rarely. In 30+ years of activity, I have never restarted a 
> >> production server because of clients using it!
> >>
> >> NFS is an exception. I am using the obsd default, and it dies on me under 
> >> load and without logs. It is unreliable.
