On Wed, Aug 09, 2006 at 08:51:59AM -0700, Spruell, Darren-Perot wrote:
> For diskless clients that bootstrap from and mount filesystems from an NFS
> server, is it feasible to provide highly-available NFS service using 2
> servers in a CARP cluster? A friend reports having tested this out and
> having everything work properly on the master, but as soon as CARP failover
> occurred and I/O requests were sent to the backup node, the client started
> throwing "stale nfs file handle" errors. My assumption is that these are the
> result of ESTALE being returned by the server and that the system doesn't
> understand how to handle this gracefully and reopen the files.
> 
> I believe I understand why this occurs, but can't get my head around a good
> way to provide fault tolerance in this architecture. How is HA of the NFS
> server typically handled, and is CARP an appropriate solution? What other
> options are typically used to ensure ongoing client operation if the NFS
> server fails?

IIRC, the OpenBSD NFS server uses the inode number as a filehandle. If
this is true, dd'ing the partition in question over to the backup would
work, since a block-level copy preserves the inode numbers. Of course,
this is a dirty hack that can stop working at any moment, and might or
might not work in the first place. And having the servers on different
architectures or somesuch would be interesting, I suppose.
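
To illustrate why a block-level copy might help where a file-level copy
does not, here is a toy model in Python. It is only a sketch, not NFS
code: ToyServer and FileHandle are hypothetical names, and the handle
layout (filesystem id, inode number, generation counter) is an
assumption for illustration; the real OpenBSD layout may differ.

```python
import errno
from typing import NamedTuple


class FileHandle(NamedTuple):
    """Assumed handle layout, for illustration only."""
    fsid: int        # identifies the exported filesystem
    inode: int       # inode number of the file
    generation: int  # changes when an inode number is reused


class ToyServer:
    """Hypothetical stand-in for an NFS server; not real NFS code."""

    def __init__(self, fsid, inodes):
        self.fsid = fsid
        # maps inode number -> (generation, path)
        self.inodes = dict(inodes)

    def lookup(self, path):
        """Hand out a handle for a path, as a server would on LOOKUP."""
        for inode, (generation, known_path) in self.inodes.items():
            if known_path == path:
                return FileHandle(self.fsid, inode, generation)
        raise OSError(errno.ENOENT, "no such file")

    def read(self, fh):
        """Resolve a handle; fail with ESTALE if it does not match."""
        entry = self.inodes.get(fh.inode)
        if fh.fsid != self.fsid or entry is None or entry[0] != fh.generation:
            raise OSError(errno.ESTALE, "stale NFS file handle")
        return entry[1]


master = ToyServer(fsid=1, inodes={12: (3, "/bsd"), 40: (1, "/etc/rc")})

# File-level copy: same paths, but inodes were allocated independently.
file_copy = ToyServer(fsid=1, inodes={77: (1, "/bsd"), 78: (1, "/etc/rc")})

# Block-level copy (the dd approach): identical on-disk layout.
block_copy = ToyServer(fsid=1, inodes=master.inodes)

# The client obtains its handle from the master before failover.
fh = master.lookup("/bsd")
```

After a simulated failover, block_copy.read(fh) still resolves the
handle, while file_copy.read(fh) raises OSError with errno ESTALE,
which is the error the client in the original report ran into.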

NFSv4 should handle this kind of problem more gracefully; there is an
experimental patch around for 3.8. Of course, it probably has a whole
slew of interesting, undocumented features and might miss exactly this
functionality. Google suggests the aforementioned patch might live at
ftp://ftp.cis.uoguelph.ca/pub/nfsv4; I haven't looked at the code there,
but at least OpenBSD looks to be as well supported as any other
platform, and the patch should apply against 3.9.

                Joachim
