I've been poking at this more over the weekend and this morning. While your tip about rmtab was useful, it still didn't resolve the problem. I also made sure that my exports are defined only by Pacemaker and not in /etc/exports. (Though for the cloned nfsserver resource to work, it seems an /etc/exports file has to exist on the server, even if it's empty.)
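That empty-file requirement is easy to satisfy on each node. A minimal sketch (the EXPORTS variable is my addition, parameterized only so the snippet can be tried against a scratch path instead of the real /etc/exports):

```shell
# Stand-in path for illustration; on the real nodes this would be /etc/exports.
EXPORTS="${EXPORTS:-$(mktemp -d)/exports}"
# Create the file empty if it's missing, without touching an existing one.
[ -e "$EXPORTS" ] || : > "$EXPORTS"
ls -l "$EXPORTS"
```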
It seems the clue as to what's going on is in this line from the log:

    coronado exportfs[20325]: INFO: Sleeping 92 seconds to accommodate for NFSv4 lease expiry

If I bump up the timeout for the exportfs resource to 95 sec, then after the very long timeout, it switches over correctly. So while this is a working solution to the problem, a 95 sec timeout is a little long for my personal comfort on a live and active fileserver. Any idea what is instigating this timeout? Is it exportfs (it looks that way from the log entry), nfsd, or pacemaker? If pacemaker, then where can I reduce or remove this?

I've been looking at disabling NFSv4 entirely on this server, as I don't really need it, but haven't found a solution that works yet. I tried the suggestion in this thread, but it seems to be for mounts, not nfsd, and it still doesn't help:

http://lists.debian.org/debian-user/2011/11/msg01585.html

Though I have found that v4 is being loaded on one host but not the other. So if I can find what's different, I may be able to make that work.

    coronado:~# rpcinfo -u localhost nfs
    program 100003 version 2 ready and waiting
    program 100003 version 3 ready and waiting
    program 100003 version 4 ready and waiting

    cascadia:~# rpcinfo -u localhost nfs
    program 100003 version 2 ready and waiting
    program 100003 version 3 ready and waiting

Any further suggestions are welcome. I'll keep poking until I find a solution. Thanks.
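[Editor's note: that 92-second sleep appears to come from the exportfs resource agent itself: on stop it waits out the kernel's NFSv4 lease (90 s by default, plus a small grace period) so v4 clients can't hold the export across the failover. Two knobs may help, sketched below as an untested command fragment, to be run as root on the server: the kernel's lease time can be shortened before nfsd starts, or rpc.nfsd can be told not to serve v4 at all, which would also match why cascadia, where v4 isn't registered, shows no such wait. Newer versions of the exportfs agent also expose a wait_for_leasetime_on_stop parameter that makes this sleep optional; it's worth checking whether the Debian package ships it.]

```
# Check the current NFSv4 lease time (seconds); the exportfs agent's
# stop-time sleep tracks this value:
cat /proc/fs/nfsd/nfsv4leasetime        # typically 90

# Shorten it before nfsd starts, e.g. from an init script:
echo 10 > /proc/fs/nfsd/nfsv4leasetime

# Or skip v4 entirely: rpc.nfsd's -N flag disables a protocol version
# (the trailing 8 is the usual server thread count):
rpc.nfsd -N 4 8
```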
Seth

On 04/16/2012 11:49 AM, William Seligman wrote:
> On 4/14/12 5:55 AM, emmanuel segura wrote:
>> Maybe the problem is the primitive nfsserver (lsb:nfs-kernel-server); I
>> think this primitive was stopped before exportfs-admin
>> (ocf:heartbeat:exportfs).
>>
>> And if I remember, lsb:nfs-kernel-server and the exportfs agent do the
>> same thing: the first uses the OS scripts and the second the cluster
>> agents.
>
> Now that Emmanuel has reminded me, I'll offer two more tips based on
> advice he's given me in the past:
>
> - You can deal with the issue he raises directly by putting additional
>   constraints in your setup, something like:
>
>     colocation fs-homes-nfsserver inf: group-homes clone-nfsserver
>     order nfsserver-before-homes inf: clone-nfsserver group-homes
>
>   That will make sure that all the group-homes resources (including
>   exportfs-admin) will not be run unless an instance of nfsserver is
>   already running on that node.
>
> - There's a more fundamental question: why are you placing the start/stop
>   of your NFS server on both nodes under Pacemaker control? Why not have
>   the NFS server start at system startup on each node?
>
>   The only reason I see for putting NFS under Pacemaker control is if
>   there are entries in your /etc/exports file (or the Debian equivalent)
>   that won't work unless other Pacemaker-controlled resources are
>   running, such as DRBD. If that's the case, you're better off
>   controlling them with Pacemaker exportfs resources, the same as you're
>   doing with exportfs-admin, instead of /etc/exports entries.
>
>> On 14 April 2012 at 01:50, William Seligman <
>> [email protected]> wrote:
>>
>>> On 4/13/12 7:18 PM, William Seligman wrote:
>>>> On 4/13/12 6:42 PM, Seth Galitzer wrote:
>>>>> In attempting to build a nice clean config, I'm now in a state where
>>>>> exportfs never starts. It always times out and errors.
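[Editor's note: spelled out, the constraint suggestion quoted above amounts to something like the following in crm configure syntax. The resource names follow the quoted snippet; the clone definition itself is a guess at its shape, not taken from the actual config:]

```
clone clone-nfsserver nfsserver \
    meta interleave="true"
colocation fs-homes-nfsserver inf: group-homes clone-nfsserver
order nfsserver-before-homes inf: clone-nfsserver group-homes
```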
>>>>>
>>>>> crm config show is pasted here: http://pastebin.com/cKFFL0Xf
>>>>> syslog after an attempted restart here: http://pastebin.com/CHdF21M4
>>>>>
>>>>> Only IPs have been edited.
>>>>
>>>> It's clear that your exportfs resource is timing out for the admin
>>>> resource.
>>>>
>>>> I'm no expert, but here are some "stupid exportfs tricks" to try:
>>>>
>>>> - Check your /etc/exports file (or whatever the equivalent is in
>>>>   Debian; "man exportfs" will tell you) on both nodes. Make sure
>>>>   you're not already exporting the directory when the NFS server
>>>>   starts.
>>>>
>>>> - Take out the exportfs-admin resource. Then try doing things
>>>>   manually:
>>>>
>>>>     # exportfs x.x.x.0/24:/exports/admin
>>>>
>>>>   Assuming that works, then look at the output of just
>>>>
>>>>     # exportfs
>>>>
>>>>   The clientspec reported by exportfs has to match the clientspec you
>>>>   put into the resource exactly. If exportfs is canonicalizing or
>>>>   reporting the clientspec differently, the exportfs monitor won't
>>>>   work. If this is the case, change the clientspec parameter in
>>>>   exportfs-admin to match.
>>>>
>>>>   If the output of exportfs has any results that span more than one
>>>>   line, then you've got the problem that the patch I referred you to
>>>>   (quoted below) is supposed to fix. You'll have to apply the patch to
>>>>   your exportfs resource.
>>>
>>> Wait a second; I completely forgot about this thread that I started:
>>>
>>> <http://www.gossamer-threads.com/lists/linuxha/users/78585>
>>>
>>> The solution turned out to be to remove the .rmtab files from the
>>> directories I was exporting, deleting & touching /var/lib/nfs/rmtab
>>> (you'll have to look up the Debian location), and adding
>>> rmtab_backup="none" to all my exportfs resources.
>>>
>>> Hopefully there's a solution for you in there somewhere!
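[Editor's note: to make the rmtab advice above concrete, here is roughly what an exportfs primitive with rmtab_backup disabled looks like in crm syntax. The clientspec, directory, fsid, and timeouts are placeholders, not values from Seth's actual config:]

```
primitive exportfs-admin ocf:heartbeat:exportfs \
    params clientspec="x.x.x.0/24" directory="/exports/admin" \
           fsid="1" rmtab_backup="none" \
    op monitor interval="30s" \
    op stop timeout="100s"
```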
>>>
>>>>> On 04/13/2012 01:51 PM, William Seligman wrote:
>>>>>> On 4/13/12 12:38 PM, Seth Galitzer wrote:
>>>>>>> I'm working through this howto doc:
>>>>>>> http://www.linbit.com/fileadmin/tech-guides/ha-nfs.pdf
>>>>>>> and am stuck at section 4.4. When I put the primary node in
>>>>>>> standby, it seems that NFS never releases the export, so it can't
>>>>>>> shut down, and thus can't get started on the secondary node.
>>>>>>> Everything up to that point in the doc works fine and fails over
>>>>>>> correctly. But once I add the exportfs resource, it fails. I'm
>>>>>>> running this on Debian wheezy with the included standard packages,
>>>>>>> not custom.
>>>>>>>
>>>>>>> Any suggestions? I'd be happy to post configs and logs if
>>>>>>> requested.
>>>>>>
>>>>>> Yes, please post the output of "crm configure show", the output of
>>>>>> "exportfs" while the resource is running properly, and the relevant
>>>>>> sections of your log file. I suggest using pastebin.com, to keep
>>>>>> mailboxes from filling up with walls of text.
>>>>>>
>>>>>> In case you haven't seen this thread already, you might want to take
>>>>>> a look:
>>>>>>
>>>>>> <http://www.gossamer-threads.com/lists/linuxha/dev/77166>
>>>>>>
>>>>>> And the resulting commit:
>>>>>>
>>>>>> <https://github.com/ClusterLabs/resource-agents/commit/5b0bf96e77ed3c4e179c8b4c6a5ffd4709f8fdae>
>>>>>>
>>>>>> (Links courtesy of Lars Ellenberg.)
>>>>>>
>>>>>> The problem and patch discussed in those links don't quite match
>>>>>> what you describe. I mention it because I had to patch my exportfs
>>>>>> resource (in /usr/lib/ocf/resource.d/heartbeat/exportfs on my RHEL
>>>>>> systems) to get it to work properly in my setup.
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems

-- 
Seth Galitzer
Systems Coordinator
Computing and Information Sciences
Kansas State University
http://www.cis.ksu.edu/~sgsax
[email protected]
785-532-7790
