On Wed, Aug 09, 2006 at 08:51:59AM -0700, Spruell, Darren-Perot wrote:
> For diskless clients that bootstrap from and mount filesystems from an NFS
> server, is it feasible to provide highly-available NFS service using 2
> servers in a CARP cluster? A friend reports having tested this out and
> having everything work properly on the master, but as soon as CARP failover
> occurred and I/O requests were sent to the backup node, the client started
> throwing "stale nfs file handle" errors. My assumption is that these are the
> result of ESTALE being returned by the server and that the system doesn't
> understand how to handle this gracefully and reopen the files.
>
> I believe I understand why this occurs, but can't get my head around a good
> way to provide fault tolerance in this architecture. How is HA of the NFS
> server typically handled, and is CARP an appropriate solution? What other
> options are typically used to ensure ongoing client operation if the NFS
> server fails?

IIRC, the OpenBSD NFS server uses the inode number as a filehandle. If this
is true, dd'ing the partition in question over to the backup server would
work, since a block-level copy leaves the inode numbers identical on both
machines. Of course, this is a dirty hack that can stop working any second,
and might or might not work in the first place. And having servers on
different architectures or somesuch would be interesting, I suppose.

NFSv4 should handle this kind of problem more gracefully; there is an
experimental patch around for 3.8. Of course, it probably has a whole slew
of interesting, undocumented features and might miss exactly this
functionality.

Google suggests the aforementioned patch might live at
ftp://ftp.cis.uoguelph.ca/pub/nfsv4; I haven't looked at the code there, but
at least OpenBSD looks to be as well supported as any other platform, and
the patch should apply against 3.9.

		Joachim
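
To illustrate the inode-number point, here is a rough sketch of why a
block-level copy might keep handles valid where a file-level copy would not.
It is a toy model, not the OpenBSD code: the handle layout (fsid, inode
number, generation), the function names, and the inode numbers are all
assumptions made up for the example.

```python
import errno
import struct

FSID = 1  # hypothetical filesystem id, assumed identical on both servers


def make_handle(fsid: int, inode: int, generation: int) -> bytes:
    # Assumed layout for illustration only; not OpenBSD's real handle format.
    return struct.pack("!IQI", fsid, inode, generation)


def build_table(fsid: int, files: dict) -> dict:
    # Map handle -> path for every file a (toy) server exports.
    return {make_handle(fsid, ino, gen): path for path, (ino, gen) in files.items()}


def resolve(table: dict, handle: bytes) -> str:
    # A server that cannot map the handle back to a file answers ESTALE.
    try:
        return table[handle]
    except KeyError:
        raise OSError(errno.ESTALE, "stale NFS file handle") from None


# path -> (inode number, generation); all values are invented.
master = {"/bsd": (2, 1), "/etc/fstab": (117, 1)}
block_copy = dict(master)                            # dd-style copy: inodes preserved
file_copy = {"/bsd": (5, 1), "/etc/fstab": (42, 1)}  # file-level copy: inodes reassigned

# Handle the client obtained from the master before failover.
handle = make_handle(FSID, *master["/bsd"])
```

In this model, resolve(build_table(FSID, block_copy), handle) still finds
"/bsd" after failover, while the same call against file_copy raises OSError
with errno set to ESTALE, which is the error the client in the original
report was seeing.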