On Wed, Oct 17, 2012 at 8:30 AM, Lonni J Friedman <netll...@gmail.com> wrote:
> Greetings,
> I'm trying to get an NFS server export to be correctly monitored &
> managed by pacemaker, along with pre-existing IP, drbd and filesystem
> mounts (which are working correctly).  While NFS is up on the primary
> node (along with the other services), the monitoring portion keeps
> showing up as a failed action, reported as 'not running'.
>
> Here's my current configuration:
> ################
> node farm-ljf0 \
>         attributes standby="off"
> node farm-ljf1
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>         params ip="10.31.97.100" cidr_netmask="22" nic="eth1" \
>         op monitor interval="10s" \
>         meta target-role="Started"
> primitive FS0 ocf:linbit:drbd \
>         params drbd_resource="r0" \
>         op monitor interval="10s" role="Master" \
>         op monitor interval="30s" role="Slave"
> primitive FS0_drbd ocf:heartbeat:Filesystem \
>         params device="/dev/drbd0" directory="/mnt/sdb1" fstype="xfs" \
>         meta target-role="Started"
> primitive FS0_nfs systemd:nfs-server \
>         op monitor interval="10s" \
>         meta target-role="Started"
> group g_services ClusterIP FS0_drbd FS0_nfs
> ms FS0_Clone FS0 \
>         meta master-max="1" master-node-max="1" clone-max="2"
> clone-node-max="1" notify="true"
> colocation fs0_on_drbd inf: g_services FS0_Clone:Master
> order FS0_drbd-after-FS0 inf: FS0_Clone:promote g_services:start
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.8-2.fc16-394e906" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore"
> ################
>
> Here's the output from 'crm status':
> ################
> Last updated: Tue Oct 16 14:26:22 2012
> Last change: Tue Oct 16 14:23:18 2012 via cibadmin on farm-ljf1
> Stack: openais
> Current DC: farm-ljf1 - partition with quorum
> Version: 1.1.8-2.fc16-394e906
> 2 Nodes configured, 2 expected votes
> 5 Resources configured.
>
> Online: [ farm-ljf0 farm-ljf1 ]
>
>  Master/Slave Set: FS0_Clone [FS0]
>      Masters: [ farm-ljf1 ]
>      Slaves: [ farm-ljf0 ]
>  Resource Group: g_services
>      ClusterIP  (ocf::heartbeat:IPaddr2):       Started farm-ljf1
>      FS0_drbd   (ocf::heartbeat:Filesystem):    Started farm-ljf1
>      FS0_nfs    (systemd:nfs-server):           Started farm-ljf1
>
> Failed actions:
>     FS0_nfs_monitor_10000 (node=farm-ljf1, call=54357, rc=7,
> status=complete): not running
>     FS0_nfs_monitor_10000 (node=farm-ljf0, call=131365, rc=7,
> status=complete): not running
> ################
>
> When I check the cluster log, I'm seeing a bunch of this stuff:
Your logs start too late, I'm afraid. We need the earlier entries that show the job FS0_nfs_monitor_10000 failing. Be sure to also check the system log file, since that will hopefully have some information directly from systemd and/or nfs-server.

> #############
> Oct 16 14:23:17 [924] farm-ljf0       attrd:   notice:
> attrd_trigger_update:    Sending flush op to all hosts for:
> fail-count-FS0_nfs (11939)
> Oct 16 14:23:17 [924] farm-ljf0       attrd:   notice:
> attrd_trigger_update:    Sending flush op to all hosts for:
> probe_complete (true)
> Oct 16 14:23:17 [924] farm-ljf0       attrd:   notice:
> attrd_ais_dispatch:      Update relayed from farm-ljf1
> Oct 16 14:23:17 [924] farm-ljf0       attrd:   notice:
> attrd_trigger_update:    Sending flush op to all hosts for:
> fail-count-FS0_nfs (11940)
> Oct 16 14:23:17 [924] farm-ljf0       attrd:   notice:
> attrd_perform_update:    Sent update 25471: fail-count-FS0_nfs=11940
> Oct 16 14:23:17 [924] farm-ljf0       attrd:   notice:
> attrd_ais_dispatch:      Update relayed from farm-ljf1
> Oct 16 14:23:20 [923] farm-ljf0       lrmd:     info:
> cancel_recurring_action: Cancelling operation FS0_nfs_status_10000
> Oct 16 14:23:20 [926] farm-ljf0       crmd:     info:
> process_lrm_event:       LRM operation FS0_nfs_monitor_10000 (call=131365,
> status=1, cib-update=0, confirmed=false) Cancelled
> Oct 16 14:23:20 [923] farm-ljf0       lrmd:     info:
> systemd_unit_exec_done:  Call to stop passed: type '(o)'
> /org/freedesktop/systemd1/job/1062961
> Oct 16 14:23:20 [926] farm-ljf0       crmd:   notice:
> process_lrm_event:       LRM operation FS0_nfs_stop_0 (call=131369, rc=0,
> cib-update=35842, confirmed=true) ok
> #############
>
> I'm not sure what any of that means. I'd appreciate some guidance.
>
> thanks!
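Since the monitor comes back rc=7 ("not running"), it's worth seeing what systemd itself thinks of the unit on the failing node. A rough sketch of the commands I'd start with (assuming the systemd:nfs-server resource maps to nfs-server.service, and using the timestamps from the status output above; exact paths and journalctl options vary with the systemd version):

```shell
# Run on the node reporting the failed monitor (e.g. farm-ljf1).

# Current unit state as systemd sees it; pacemaker's systemd:nfs-server
# monitor is essentially asking systemd this same question.
systemctl status nfs-server.service

# Unit-specific journal entries around the failure window, if your
# systemd is recent enough to support filtering by unit.
journalctl -u nfs-server.service --since "2012-10-16 14:20:00"

# Pacemaker's own record of the resource in the system log
# (path assumed; adjust for your syslog configuration).
grep -i 'FS0_nfs' /var/log/messages | tail -50
```

If systemctl already shows the unit inactive or failed between pacemaker's start and the first monitor, the problem is on the systemd/nfs side rather than in the pacemaker configuration.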
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org