Re: NFS high availability

Joachim Schipper Wed, 09 Aug 2006 10:07:07 -0700

On Wed, Aug 09, 2006 at 08:51:59AM -0700, Spruell, Darren-Perot wrote:
> For diskless clients that bootstrap from and mount filesystems from an NFS
> server, is it feasible to provide highly-available NFS service using 2
> servers in a CARP cluster? A friend reports having tested this out and
> having everything work properly on the master, but as soon as CARP failover
> occurred and I/O requests were sent to the backup node, the client started
> throwing "stale nfs file handle" errors. My assumption is that these are the
> result of ESTALE being returned by the server and that the system doesn't
> understand how to handle this gracefully and reopen the files.
> 
> I believe I understand why this occurs, but can't get my head around a good
> way to provide fault tolerance in this architecture. How is HA of the NFS
> server typically handled, and is CARP an appropriate solution? What other
> options are typically used to ensure ongoing client operation if the NFS
> server fails?


IIRC, the OpenBSD NFS server uses the inode number as a filehandle. If
this is true, dd'ing the partition in question over to the backup server
would work, since a block-level copy preserves inode numbers. Of course,
this is a dirty hack that can stop working any second, and might or might
not work in the first place. And having servers on different architectures
or somesuch would be interesting, I suppose.
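
To check whether a copy actually kept the inode numbers, something like
the Python sketch below could be run against the exported tree on each
server. It rests on the same unverified assumption as above, namely that
the filehandle depends only on the inode number; a real handle may also
carry a filesystem ID or generation number, which this does not check.
The function names are made up for illustration.

```python
import os


def inode_map(root):
    """Map every path under root (relative to root) to its inode number."""
    result = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            # lstat so that symlinks report their own inode, not the target's
            result[rel] = os.lstat(full).st_ino
    return result


def inode_mismatches(primary_root, replica_root):
    """Return relative paths whose inode differs, or that exist on one side only."""
    primary = inode_map(primary_root)
    replica = inode_map(replica_root)
    all_paths = primary.keys() | replica.keys()
    return sorted(p for p in all_paths if primary.get(p) != replica.get(p))
```

An empty list from inode_mismatches() only says the inode numbers line up;
it does not prove the clients will survive a failover. A file-level copy
(cp, rsync, tar) will normally show mismatches, which is why only a
block-level copy is even a candidate here.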

NFSv4 should handle this kind of problem more gracefully; there is an
experimental patch around for 3.8. Of course, it probably has a whole
slew of interesting, undocumented features and might miss exactly this
functionality. Google suggests the aforementioned patch might live at
ftp://ftp.cis.uoguelph.ca/pub/nfsv4; I haven't looked at the code there,
but at least OpenBSD looks to be as well supported as any other
platform, and the patch should apply against 3.9.

                Joachim
