I've been poking at this more over the weekend and this morning. While your tip about rmtab was useful, it still didn't resolve the problem. I also made sure that my exports are defined only by Pacemaker and not in /etc/exports. (Though for the cloned nfsserver resource to work, it seems an /etc/exports file has to exist on the server, even if it's empty.)
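That empty-file requirement is easy to satisfy on each node. A minimal sketch (the EXPORTS variable is my addition, parameterized only so the snippet can be tried against a scratch path instead of the real /etc/exports):

```shell
# Stand-in path for illustration; on the real nodes this would be /etc/exports.
EXPORTS="${EXPORTS:-$(mktemp -d)/exports}"
# Create the file empty if it's missing, without touching an existing one.
[ -e "$EXPORTS" ] || : > "$EXPORTS"
ls -l "$EXPORTS"
```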
It seems the clue as to what's going on is in this line from the log:

    coronado exportfs[20325]: INFO: Sleeping 92 seconds to accommodate for NFSv4 lease expiry

If I bump up the timeout for the exportfs resource to 95 sec, then after the very long timeout, it switches over correctly. So while this is a working solution to the problem, a 95 sec timeout is a little long for my personal comfort on a live and active fileserver. Any idea what is instigating this timeout? Is it exportfs (it looks that way from the log entry), nfsd, or pacemaker? If pacemaker, then where can I reduce or remove this?

I've been looking at disabling NFSv4 entirely on this server, as I don't really need it, but haven't found a solution that works yet. I tried the suggestion in this thread, but it seems to be for mounts, not nfsd, and it still doesn't help:

http://lists.debian.org/debian-user/2011/11/msg01585.html

Though I have found that v4 is being loaded on one host but not the other. So if I can find what's different, I may be able to make that work.

    coronado:~# rpcinfo -u localhost nfs
    program 100003 version 2 ready and waiting
    program 100003 version 3 ready and waiting
    program 100003 version 4 ready and waiting

    cascadia:~# rpcinfo -u localhost nfs
    program 100003 version 2 ready and waiting
    program 100003 version 3 ready and waiting

Any further suggestions are welcome. I'll keep poking until I find a solution. Thanks.
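[Editor's note: that 92-second sleep appears to come from the exportfs resource agent itself: on stop it waits out the kernel's NFSv4 lease (90 s by default, plus a small grace period) so v4 clients can't hold the export across the failover. Two knobs may help, sketched below as an untested command fragment, to be run as root on the server: the kernel's lease time can be shortened before nfsd starts, or rpc.nfsd can be told not to serve v4 at all, which would also match why cascadia, where v4 isn't registered, shows no such wait. Newer versions of the exportfs agent also expose a wait_for_leasetime_on_stop parameter that makes this sleep optional; it's worth checking whether the Debian package ships it.]

```
# Check the current NFSv4 lease time (seconds); the exportfs agent's
# stop-time sleep tracks this value:
cat /proc/fs/nfsd/nfsv4leasetime        # typically 90

# Shorten it before nfsd starts, e.g. from an init script:
echo 10 > /proc/fs/nfsd/nfsv4leasetime

# Or skip v4 entirely: rpc.nfsd's -N flag disables a protocol version
# (the trailing 8 is the usual server thread count):
rpc.nfsd -N 4 8
```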
Seth

On 04/16/2012 11:49 AM, William Seligman wrote:
> On 4/14/12 5:55 AM, emmanuel segura wrote:
>> Maybe the problem is the primitive nfsserver (lsb:nfs-kernel-server); I
>> think this primitive was stopped before exportfs-admin
>> (ocf:heartbeat:exportfs).
>>
>> And if I remember, lsb:nfs-kernel-server and the exportfs agent do the
>> same thing: the first uses the OS scripts and the second the cluster
>> agents.
>
> Now that Emmanuel has reminded me, I'll offer two more tips based on
> advice he's given me in the past:
>
> - You can deal with the issue he raises directly by putting additional
>   constraints in your setup, something like:
>
>     colocation fs-homes-nfsserver inf: group-homes clone-nfsserver
>     order nfsserver-before-homes inf: clone-nfsserver group-homes
>
>   That will make sure that all the group-homes resources (including
>   exportfs-admin) will not be run unless an instance of nfsserver is
>   already running on that node.
>
> - There's a more fundamental question: why are you placing the start/stop
>   of your NFS server on both nodes under Pacemaker control? Why not have
>   the NFS server start at system startup on each node?
>
>   The only reason I see for putting NFS under Pacemaker control is if
>   there are entries in your /etc/exports file (or the Debian equivalent)
>   that won't work unless other Pacemaker-controlled resources are
>   running, such as DRBD. If that's the case, you're better off
>   controlling them with Pacemaker exportfs resources, the same as you're
>   doing with exportfs-admin, instead of /etc/exports entries.
>
>> On 14 April 2012 at 01:50, William Seligman <
>> [email protected]> wrote:
>>
>>> On 4/13/12 7:18 PM, William Seligman wrote:
>>>> On 4/13/12 6:42 PM, Seth Galitzer wrote:
>>>>> In attempting to build a nice clean config, I'm now in a state where
>>>>> exportfs never starts. It always times out and errors.
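[Editor's note: spelled out, the constraint suggestion quoted above amounts to something like the following in crm configure syntax. The resource names follow the quoted snippet; the clone definition itself is a guess at its shape, not taken from the actual config:]

```
clone clone-nfsserver nfsserver \
    meta interleave="true"
colocation fs-homes-nfsserver inf: group-homes clone-nfsserver
order nfsserver-before-homes inf: clone-nfsserver group-homes
```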
>>>>>
>>>>> crm config show is pasted here: http://pastebin.com/cKFFL0Xf
>>>>> syslog after an attempted restart here: http://pastebin.com/CHdF21M4
>>>>>
>>>>> Only IPs have been edited.
>>>>
>>>> It's clear that your exportfs resource is timing out for the admin
>>>> resource.
>>>>
>>>> I'm no expert, but here are some "stupid exportfs tricks" to try:
>>>>
>>>> - Check your /etc/exports file (or whatever the equivalent is in
>>>>   Debian; "man exportfs" will tell you) on both nodes. Make sure
>>>>   you're not already exporting the directory when the NFS server
>>>>   starts.
>>>>
>>>> - Take out the exportfs-admin resource. Then try doing things
>>>>   manually:
>>>>
>>>>     # exportfs x.x.x.0/24:/exports/admin
>>>>
>>>>   Assuming that works, then look at the output of just
>>>>
>>>>     # exportfs
>>>>
>>>>   The clientspec reported by exportfs has to match the clientspec you
>>>>   put into the resource exactly. If exportfs is canonicalizing or
>>>>   reporting the clientspec differently, the exportfs monitor won't
>>>>   work. If this is the case, change the clientspec parameter in
>>>>   exportfs-admin to match.
>>>>
>>>>   If the output of exportfs has any results that span more than one
>>>>   line, then you've got the problem that the patch I referred you to
>>>>   (quoted below) is supposed to fix. You'll have to apply the patch to
>>>>   your exportfs resource.
>>>
>>> Wait a second; I completely forgot about this thread that I started:
>>>
>>> <http://www.gossamer-threads.com/lists/linuxha/users/78585>
>>>
>>> The solution turned out to be to remove the .rmtab files from the
>>> directories I was exporting, deleting & touching /var/lib/nfs/rmtab
>>> (you'll have to look up the Debian location), and adding
>>> rmtab_backup="none" to all my exportfs resources.
>>>
>>> Hopefully there's a solution for you in there somewhere!
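[Editor's note: to make the rmtab advice above concrete, here is roughly what an exportfs primitive with rmtab_backup disabled looks like in crm syntax. The clientspec, directory, fsid, and timeouts are placeholders, not values from Seth's actual config:]

```
primitive exportfs-admin ocf:heartbeat:exportfs \
    params clientspec="x.x.x.0/24" directory="/exports/admin" \
           fsid="1" rmtab_backup="none" \
    op monitor interval="30s" \
    op stop timeout="100s"
```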
>>>
>>>>> On 04/13/2012 01:51 PM, William Seligman wrote:
>>>>>> On 4/13/12 12:38 PM, Seth Galitzer wrote:
>>>>>>> I'm working through this howto doc:
>>>>>>> http://www.linbit.com/fileadmin/tech-guides/ha-nfs.pdf
>>>>>>> and am stuck at section 4.4. When I put the primary node in
>>>>>>> standby, it seems that NFS never releases the export, so it can't
>>>>>>> shut down, and thus can't get started on the secondary node.
>>>>>>> Everything up to that point in the doc works fine and fails over
>>>>>>> correctly. But once I add the exportfs resource, it fails. I'm
>>>>>>> running this on Debian wheezy with the included standard packages,
>>>>>>> not custom.
>>>>>>>
>>>>>>> Any suggestions? I'd be happy to post configs and logs if
>>>>>>> requested.
>>>>>>
>>>>>> Yes, please post the output of "crm configure show", the output of
>>>>>> "exportfs" while the resource is running properly, and the relevant
>>>>>> sections of your log file. I suggest using pastebin.com, to keep
>>>>>> mailboxes from filling up with walls of text.
>>>>>>
>>>>>> In case you haven't seen this thread already, you might want to take
>>>>>> a look:
>>>>>>
>>>>>> <http://www.gossamer-threads.com/lists/linuxha/dev/77166>
>>>>>>
>>>>>> And the resulting commit:
>>>>>>
>>>>>> <https://github.com/ClusterLabs/resource-agents/commit/5b0bf96e77ed3c4e179c8b4c6a5ffd4709f8fdae>
>>>>>>
>>>>>> (Links courtesy of Lars Ellenberg.)
>>>>>>
>>>>>> The problem and patch discussed in those links don't quite match
>>>>>> what you describe. I mention it because I had to patch my exportfs
>>>>>> resource (in /usr/lib/ocf/resource.d/heartbeat/exportfs on my RHEL
>>>>>> systems) to get it to work properly in my setup.
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems

-- 
Seth Galitzer
Systems Coordinator
Computing and Information Sciences
Kansas State University
http://www.cis.ksu.edu/~sgsax
[email protected]
785-532-7790
