----- Original Message -----
> From: Andreas Kurz <andr...@hastexo.com>
> Date: Sun, 21 Oct 2012 01:38:46 +0200
> Subject: Re: [Pacemaker] "Simple" LVM/drbd backed Primary/Secondary NFS cluster doesn't always failover cleanly
> To: pacemaker@oss.clusterlabs.org
>
>
> On 10/18/2012 08:02 PM, Justin Pasher wrote:
>> I have a pretty basic setup by most people's standards, but there must
>> be something that is not quite right about it. Sometimes when I force a
>> resource failover from one server to the other, the clients with the NFS
>> mounts don't cleanly migrate to the new server. I configured this using
>> a few different "Pacemaker-DRBD-NFS" guides out there for reference (I
>> believe they were the Linbit guides).
>
> Are you using the latest "exportfs" resource agent from the GitHub repo?
> There have been bugfixes and improvements... and try moving the VIP for
> each export to the end of its group, so that the IP the clients connect
> to is started last and stopped first.
>
> Regards,
> Andreas
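
If I understand right, that means ordering each group so the VIP is the last member, something like this in crm shell (resource names are made up for illustration):

    primitive p_ip_home ocf:heartbeat:IPaddr2 \
        params ip="192.168.1.100" cidr_netmask="24"
    # group members start left-to-right and stop right-to-left,
    # so the VIP comes up last and goes down first
    group g_nfs_home p_fs_home p_exportfs_home p_ip_home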

I'm currently running the version that comes with the Debian squeeze-backports resource-agents package (1:3.9.2-5~bpo60+1). I went ahead and grabbed a copy of exportfs from the git repository. It's a little risky for me to update the file right now, though: the two resources I'm most worried about are the NFS shares backing the XenServer VDIs, and any hiccup in the connection to the NFS server makes things explode (e.g. guest VMs start getting disk errors and go read-only).

I scanned through the changes quickly, and the biggest one I noticed was how the .rmtab backup is restored: it now sorts and filters for unique entries instead of just concatenating the backup onto the end of /var/lib/nfs/rmtab. I had actually tweaked that a little myself earlier while trying to track down the problem.
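
If I'm reading the new agent right, the restore step went from roughly this (paraphrasing the shell in the agent):

    # old behaviour: duplicate entries pile up on every restore
    cat "$backup_rmtab" >> /var/lib/nfs/rmtab

to something like:

    # new behaviour: merge the backup with the live file and drop duplicates
    sort -u "$backup_rmtab" /var/lib/nfs/rmtab -o /var/lib/nfs/rmtab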

Ultimately I think my problem is more related to the NFS server itself and how it handles "unknown" client connections after a failover. I've seen people here and there mention that /var/lib/nfs should live on the replicated device to stay consistent across failovers, but the exportfs resource agent doesn't do anything like that. Is that no longer needed? In any case, it wouldn't fit my situation: I maintain four independent NFS shares, each of which can fail over separately (and run on either server at any time), so a simple copy of the directory won't work, since there is no single "master" server at any given time.
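
(The single-share setups I've seen just mount the NFS state directory from the replicated volume inside the one failover group, e.g. something like this hypothetical snippet, which only works when everything moves together:)

    # put the NFS server state on the DRBD-backed filesystem
    primitive p_fs_nfsstate ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/var/lib/nfs" fstype="ext4"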

Also, I did find a bug in the way backup_rmtab() filters the export list for its backup. Since it looks for both a leading AND a trailing colon (:), it doesn't properly copy entries for clients that mounted subdirectories under the export (e.g. instead of mounting /home, a client might mount /home/username, as with autofs), so those entries never make it into the .rmtab backup; a quick sketch of the problem is below. I'll file a bug report about that.
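
To illustrate what I mean (paraphrasing the agent; the exact line may differ):

    # /var/lib/nfs/rmtab entries look like:
    #   client.example.com:/home/username:0x00000001
    # the backup filter requires a colon on BOTH sides of the export path:
    grep ":${OCF_RESKEY_directory}:" /var/lib/nfs/rmtab > "$backup_rmtab"
    # that matches a client that mounted /home but misses one that mounted
    # /home/username; allowing an optional subdirectory catches both
    # (ignoring regex-escaping of the path for brevity):
    grep -E ":${OCF_RESKEY_directory}(/[^:]*)?:" /var/lib/nfs/rmtab > "$backup_rmtab"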

Thanks.

--
Justin Pasher
