----- Original Message -----
> From: Andreas Kurz <andr...@hastexo.com>
> Date: Sun, 21 Oct 2012 01:38:46 +0200
> Subject: Re: [Pacemaker] "Simple" LVM/drbd backed Primary/Secondary NFS cluster doesn't always failover cleanly
> To: pacemaker@oss.clusterlabs.org
>
>
> On 10/18/2012 08:02 PM, Justin Pasher wrote:
>> I have a pretty basic setup by most people's standards, but there must
>> be something that is not quite right about it. Sometimes when I force a
>> resource failover from one server to the other, the clients with the NFS
>> mounts don't cleanly migrate to the new server. I configured this using
>> a few different "Pacemaker-DRBD-NFS" guides out there for reference (I
>> believe they were the Linbit guides).
>
> Are you using the latest "exportfs" resource agent from the GitHub repo?
> There have been bugfixes and improvements... and try moving the VIP for
> each export to the end of its group, so that the IP the clients connect
> to is started last and stopped first.
>
> Regards,
> Andreas
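
If I understand right, that means ordering each group so the VIP is the last member, something like this in crm shell (resource names are made up for illustration):

    primitive p_ip_home ocf:heartbeat:IPaddr2 \
        params ip="192.168.1.100" cidr_netmask="24"
    # group members start left-to-right and stop right-to-left,
    # so the VIP comes up last and goes down first
    group g_nfs_home p_fs_home p_exportfs_home p_ip_home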

I'm currently running the version that comes with the Debian squeeze-backports resource-agents package (1:3.9.2-5~bpo60+1). I went ahead and grabbed a copy of exportfs from the git repository. It's a little risky for me to update the file right now, though: the two resources I'm most worried about are the NFS shares backing the XenServer VDIs, and any hiccup in the connection to the NFS server makes things explode (e.g. guest VMs start getting disk errors and go read-only).

I scanned through the changes quickly, and the biggest one I noticed was how the .rmtab backup is restored: it now sorts and filters for unique entries instead of just concatenating the backup onto the end of /var/lib/nfs/rmtab. I had actually tweaked that a little myself earlier while trying to track down the problem.
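
If I'm reading the new agent right, the restore step went from roughly this (paraphrasing the shell in the agent):

    # old behaviour: duplicate entries pile up on every restore
    cat "$backup_rmtab" >> /var/lib/nfs/rmtab

to something like:

    # new behaviour: merge the backup with the live file and drop duplicates
    sort -u "$backup_rmtab" /var/lib/nfs/rmtab -o /var/lib/nfs/rmtab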

Ultimately I think my problem is more related to the NFS server itself and how it handles "unknown" client connections after a failover. I've seen people here and there mention that /var/lib/nfs should live on the replicated device to stay consistent across failovers, but the exportfs resource agent doesn't do anything like that. Is that no longer needed? In any case, it wouldn't fit my situation: I maintain four independent NFS shares, each of which can fail over separately (and run on either server at any time), so a simple copy of the directory won't work, since there is no single "master" server at any given time.
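
(The single-share setups I've seen just mount the NFS state directory from the replicated volume inside the one failover group, e.g. something like this hypothetical snippet, which only works when everything moves together:)

    # put the NFS server state on the DRBD-backed filesystem
    primitive p_fs_nfsstate ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/var/lib/nfs" fstype="ext4"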

Also, I did find a bug in the way backup_rmtab() filters the export list for its backup. Since it looks for both a leading AND a trailing colon (:), it doesn't properly copy entries for clients that mounted subdirectories under the export (e.g. instead of mounting /home, a client might mount /home/username, as with autofs), so those entries never make it into the .rmtab backup; a quick sketch of the problem is below. I'll file a bug report about that.
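
To illustrate what I mean (paraphrasing the agent; the exact line may differ):

    # /var/lib/nfs/rmtab entries look like:
    #   client.example.com:/home/username:0x00000001
    # the backup filter requires a colon on BOTH sides of the export path:
    grep ":${OCF_RESKEY_directory}:" /var/lib/nfs/rmtab > "$backup_rmtab"
    # that matches a client that mounted /home but misses one that mounted
    # /home/username; allowing an optional subdirectory catches both
    # (ignoring regex-escaping of the path for brevity):
    grep -E ":${OCF_RESKEY_directory}(/[^:]*)?:" /var/lib/nfs/rmtab > "$backup_rmtab"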

Thanks.

--
Justin Pasher
