Just a quick update.  I set the wait_for_leasetime_on_stop parameter on 
the exportfs resource to false, and now it no longer sleeps for 92 
seconds; the switchover is effectively instantaneous.  Now I just need 
to figure out how to disable NFSv4 on the server side and I should be 
home free.
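
For the archives, the change is one parameter on the exportfs primitive.
The resource name, export path, and clientspec below are illustrative,
not copied from my real config:

primitive exportfs-admin ocf:heartbeat:exportfs \
    params directory="/exports/admin" clientspec="x.x.x.0/24" \
        fsid="1" wait_for_leasetime_on_stop="false" \
    op monitor interval="30s"

(fsid and the monitor interval are placeholders; the relevant bit is
wait_for_leasetime_on_stop="false".)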

Thanks.
Seth

On 04/16/2012 12:42 PM, Seth Galitzer wrote:
> I've been poking at this more over the weekend and this morning.  And
> while your tip about rmtab was useful, it still didn't resolve the
> problem.  I also made sure that my exports were only being
> handled/defined by pacemaker and not by /etc/exports.  Though for the
> cloned nfsserver resource to work, it seems you need an /etc/exports
> file to exist on the server, even if it's empty.
>
> It seems the clue as to what's going on is in this line from the log:
>
> coronado exportfs[20325]: INFO: Sleeping 92 seconds to accommodate for
> NFSv4 lease expiry
>
> If I bump up the timeout for the exportfs resource to 95 sec, then after
> the very long timeout, it switches over correctly.  So while this is a
> working solution to the problem, a 95 sec timeout is a little long for
> my personal comfort on a live and active fileserver.  Any idea what is
> instigating this timeout?  Is it exportfs (it looks that way from the log
> entry), nfsd, or pacemaker?  If it's pacemaker, then where can I reduce or
> remove this?
>
> I've been looking at disabling nfsv4 entirely on this server, as I don't
> really need it, but haven't found a solution that works yet.  I tried the
> suggestion in this thread, but it seems to apply to mounts, not nfsd, and
> still doesn't help:
> http://lists.debian.org/debian-user/2011/11/msg01585.html
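>
> (If anyone else goes down this road: on Debian the rpc.nfsd and
> rpc.mountd options live in /etc/default/nfs-kernel-server.  Something
> like the following should keep v4 from being offered; the exact option
> placement is my assumption, so treat it as a sketch, not a tested
> recipe:
>
> # /etc/default/nfs-kernel-server
> RPCNFSDCOUNT="8 --no-nfs-version 4"
> RPCMOUNTDOPTS="--no-nfs-version 4"
>
> followed by a restart of nfs-kernel-server.)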
>
> Though I have found that v4 is being loaded on one host but not the
> other.  So if I can find what's different, I may be able to make that work.
>
> coronado:~# rpcinfo -u localhost nfs
> program 100003 version 2 ready and waiting
> program 100003 version 3 ready and waiting
> program 100003 version 4 ready and waiting
>
> cascadia:~# rpcinfo -u localhost nfs
> program 100003 version 2 ready and waiting
> program 100003 version 3 ready and waiting
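>
> (A more direct way to compare the two hosts is to look at what nfsd
> itself has enabled, assuming the nfsd filesystem is mounted at
> /proc/fs/nfsd:
>
> # cat /proc/fs/nfsd/versions
>
> That prints a line like "+2 +3 +4", with a "-" in front of any disabled
> version, so diffing it between the two hosts should show what's
> different.)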
>
> Any further suggestions are welcome.  I'll keep poking until I find a
> solution.
>
> Thanks.
> Seth
>
> On 04/16/2012 11:49 AM, William Seligman wrote:
>> On 4/14/12 5:55 AM, emmanuel segura wrote:
>>> Maybe the problem is the primitive nfsserver (lsb:nfs-kernel-server); I
>>> think that primitive was stopped before exportfs-admin
>>> (ocf:heartbeat:exportfs).
>>>
>>> And if I remember right, lsb:nfs-kernel-server and the exportfs agent do
>>> the same thing:
>>>
>>> the first uses the OS scripts and the second the cluster agents.
>>
>> Now that Emmanuel has reminded me, I'll offer two more tips based on
>> advice he's given me in the past:
>>
>> - You can deal with the issue he raises directly by putting additional
>> constraints in your setup, something like:
>>
>> colocation fs-homes-nfsserver inf: group-homes clone-nfsserver
>> order nfsserver-before-homes inf: clone-nfsserver group-homes
>>
>> That will make sure that all the group-homes resources (including
>> exportfs-admin) will not be run unless an instance of nfsserver is already
>> running on that node.
>>
>> - There's a more fundamental question: Why are you placing the start/stop of
>> your NFS server on both nodes under pacemaker control? Why not have the NFS
>> server start at system startup on each node?
>>
>> The only reason I see for putting NFS under Pacemaker control is if there
>> are entries in your /etc/exports file (or the Debian equivalent) that
>> won't work unless other Pacemaker-controlled resources are running, such
>> as DRBD. If that's the case, you're better off controlling them with
>> Pacemaker exportfs resources, the same as you're doing with
>> exportfs-admin, instead of /etc/exports entries.
>>
>>> On 14 April 2012 01:50, William Seligman<
>>> [email protected]>   wrote:
>>>
>>>> On 4/13/12 7:18 PM, William Seligman wrote:
>>>>> On 4/13/12 6:42 PM, Seth Galitzer wrote:
>>>>>> In attempting to build a nice clean config, I'm now in a state where
>>>>>> exportfs never starts.  It always times out and errors.
>>>>>>
>>>>>> crm config show is pasted here: http://pastebin.com/cKFFL0Xf
>>>>>> syslog after an attempted restart here: http://pastebin.com/CHdF21M4
>>>>>>
>>>>>> Only IPs have been edited.
>>>>>
>>>>> It's clear that your exportfs resource is timing out for the admin
>>>>> resource.
>>>>>
>>>>> I'm no expert, but here are some "stupid exportfs tricks" to try:
>>>>>
>>>>> - Check your /etc/exports file (or whatever the equivalent is in Debian;
>>>>> "man exportfs" will tell you) on both nodes. Make sure you're not already
>>>>> exporting the directory when the NFS server starts.
>>>>>
>>>>> - Take out the exportfs-admin resource. Then try doing things manually:
>>>>>
>>>>> # exportfs x.x.x.0/24:/exports/admin
>>>>>
>>>>> Assuming that works, then look at the output of just
>>>>>
>>>>> # exportfs
>>>>>
>>>>> The clientspec reported by exportfs has to match the clientspec you put
>>>>> into the resource exactly. If exportfs is canonicalizing or reporting the
>>>>> clientspec differently, the exportfs monitor won't work. If this is the
>>>>> case, change the clientspec parameter in exportfs-admin to match.
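>>>>>
>>>>> As a made-up example: if your resource has clientspec="x.x.x.0/24"
>>>>> but exportfs reports
>>>>>
>>>>> # exportfs
>>>>> /exports/admin        x.x.x.0/255.255.255.0
>>>>>
>>>>> then the monitor's string comparison fails, and you'd change the
>>>>> resource's clientspec to x.x.x.0/255.255.255.0 to match.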
>>>>>
>>>>> If the output of exportfs has any results that span more than one line,
>>>>> then you've got the problem that the patch I referred you to (quoted
>>>>> below) is supposed to fix. You'll have to apply the patch to your
>>>>> exportfs resource.
>>>>
>>>> Wait a second; I completely forgot about this thread that I started:
>>>>
>>>> <http://www.gossamer-threads.com/lists/linuxha/users/78585>
>>>>
>>>> The solution turned out to be removing the .rmtab files from the
>>>> directories I was exporting, deleting & touching /var/lib/nfs/rmtab
>>>> (you'll have to look up the Debian location), and adding
>>>> rmtab_backup="none" to all my exportfs resources.
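>>>>
>>>> Roughly, the manual steps were (the export path here is illustrative,
>>>> and the rmtab location is the one on my RHEL boxes):
>>>>
>>>> # rm /exports/admin/.rmtab
>>>> # rm /var/lib/nfs/rmtab && touch /var/lib/nfs/rmtab
>>>>
>>>> and then rmtab_backup="none" went on each ocf:heartbeat:exportfs
>>>> primitive.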
>>>>
>>>> Hopefully there's a solution for you in there somewhere!
>>>>
>>>>>> On 04/13/2012 01:51 PM, William Seligman wrote:
>>>>>>> On 4/13/12 12:38 PM, Seth Galitzer wrote:
>>>>>>>> I'm working through this howto doc:
>>>>>>>> http://www.linbit.com/fileadmin/tech-guides/ha-nfs.pdf
>>>>>>>> and am stuck at section 4.4.  When I put the primary node in
>>>>>>>> standby, it seems that NFS never releases the export, so it can't
>>>>>>>> shut down, and thus can't get started on the secondary node.
>>>>>>>> Everything up to that point in the doc works fine and fails over
>>>>>>>> correctly.  But once I add the exportfs resource, it fails.  I'm
>>>>>>>> running this on Debian wheezy with the included standard packages,
>>>>>>>> not custom.
>>>>>>>>
>>>>>>>> Any suggestions?  I'd be happy to post configs and logs if requested.
>>>>>>>
>>>>>>> Yes, please post the output of "crm configure show", the output of
>>>>>>> "exportfs" while the resource is running properly, and the relevant
>>>>>>> sections of your log file. I suggest using pastebin.com, to keep
>>>>>>> mailboxes from filling up with walls of text.
>>>>>>>
>>>>>>> In case you haven't seen this thread already, you might want to
>>>>>>> take a look:
>>>>>>>
>>>>>>> <http://www.gossamer-threads.com/lists/linuxha/dev/77166>
>>>>>>>
>>>>>>> And the resulting commit:
>>>>>>> <https://github.com/ClusterLabs/resource-agents/commit/5b0bf96e77ed3c4e179c8b4c6a5ffd4709f8fdae>
>>>>>>>
>>>>>>> (Links courtesy of Lars Ellenberg.)
>>>>>>>
>>>>>>> The problem and patch discussed in those links don't quite match
>>>>>>> what you describe. I mention it because I had to patch my exportfs
>>>>>>> resource (in /usr/lib/ocf/resource.d/heartbeat/exportfs on my RHEL
>>>>>>> systems) to get it to work properly in my setup.
>>
>

-- 
Seth Galitzer
Systems Coordinator
Computing and Information Sciences
Kansas State University
http://www.cis.ksu.edu/~sgsax
[email protected]
785-532-7790
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
