Hi,

On Fri, May 30, 2014 at 12:17:00PM +0100, Stuart Taylor wrote:
> Hi
>
> I wonder if anyone on the list can help me - I’m new to Pacemaker so
> apologies if I’m posting in the wrong place.
>
> I have a four-node cluster running Pacemaker 1.1.10 with Corosync 1.4.1 on
> CentOS 6.4. Resource-wise I have eight Lustre storage targets on an iSCSI
> SAN - two each colocated with a single heartbeat IP address on each node. I
> have redundant Corosync rings and Stonith is configured, and failover in
> general works very well.
>
> My problem is that three of the storage targets refuse to mount via Pacemaker
> on particular nodes, for no particular reason I can identify. These
> resources won’t start on the nodes they’re configured to in the constraints -
> which is fine if all nodes are up, but not if certain nodes fail.
>
> If I stop the resources I can manually mount the targets on the node without
> any problem - so it seems to be a Pacemaker, rather than filesystem problem.
>
> My resources look like this: http://pastebin.com/qQ1BR1yW and constraints
> like this: http://pastebin.com/4w85MWUV
>
> crm_mon -f gives the following output:
>
> Last updated: Fri May 30 12:02:59 2014
> Last change: Fri May 30 12:02:38 2014 via crm_resource on oss-02
> Stack: classic openais (with plugin)
> Current DC: oss-02 - partition with quorum
> Version: 1.1.10-14.el6_5.3-368c726
> 4 Nodes configured, 4 expected votes
> 16 Resources configured
>
>
> Online: [ oss-01 oss-02 oss-03 oss-04 ]
>
> ost-01 (ocf::heartbeat:Filesystem): Started oss-01
> ost-02 (ocf::heartbeat:Filesystem): Started oss-02
> stonith-oss-01 (stonith:fence_ipmilan): Started oss-03
> stonith-oss-02 (stonith:fence_ipmilan): Started oss-04
> ost-03 (ocf::heartbeat:Filesystem): Started oss-04
> stonith-oss-03 (stonith:fence_ipmilan): Started oss-01
> ost-05 (ocf::heartbeat:Filesystem): Started oss-01
> ost-06 (ocf::heartbeat:Filesystem): Started oss-02
> ost-07 (ocf::heartbeat:Filesystem): Started oss-04
> ost-04 (ocf::heartbeat:Filesystem): Started oss-03
> ost-08 (ocf::heartbeat:Filesystem): Started oss-03
> oss-01-hb (ocf::heartbeat:IPaddr2): Started oss-01
> oss-02-hb (ocf::heartbeat:IPaddr2): Started oss-02
> oss-03-hb (ocf::heartbeat:IPaddr2): Started oss-04
> oss-04-hb (ocf::heartbeat:IPaddr2): Started oss-03
> stonith-oss-04 (stonith:fence_ipmilan): Started oss-02
>
> Migration summary:
> * Node oss-01:
> * Node oss-02:
> * Node oss-04:
>    ost-04: migration-threshold=1000000 fail-count=1000000 last-failure='Fri May 30 11:25:11 2014'
>    ost-08: migration-threshold=1000000 fail-count=1000000 last-failure='Fri May 30 11:25:11 2014'
> * Node oss-03:
>    ost-03: migration-threshold=1000000 fail-count=1000000 last-failure='Fri May 30 10:47:02 2014'
>
> ost-03 is supposed to mount on oss-03, and ost-04 & ost-08 on oss-04, but
> they fail to do so and the colo-ed IP resources are therefore swapped between
> oss-03 and oss-04.
>
> Log entries typically look like this, which doesn’t give me much to go on:
>
> May 30 11:25:11 oss-04 lrmd[2179]: notice: operation_finished:
> ost-08_start_0:2994:stderr [ mount.lustre: mount /dev/sdi at /lustre/ost-08
> failed: Unknown error 524 ]

The mount command evidently failed, so something must differ between
mounting the filesystem by hand and what the Filesystem RA does. I
don't know what that difference is, nor what error 524 means.

> Does anyone know / can anyone suggest how I might debug why Pacemaker can’t
> mount these targets?

Assuming you have recent enough resource-agents and crmsh, you can
trace the Filesystem RA, say:

# crm resource trace ost-08 start

This should make pacemaker try to start ost-08 again:

# crm resource cleanup ost-08

Then look for the trace file in /var/lib/heartbeat/trace_ra.

Alternatively, you can add 'set -x' somewhere in the Filesystem RA,
then look at the logs.

Thanks,

Dejan

>
> Many thanks
> Stuart
>
> Stuart Taylor
> System Administrator
> Edinburgh Genomics
>
> Web: http://genomics.ed.ac.uk/
> Tel: 0131 651 7403
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
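
A note on the tracing steps above: once a trace has been written, something like the following could pull the mount invocation out of the newest trace file. This is only a sketch. The helper names latest_trace and show_mount_calls are made up for illustration, and the layout it assumes under /var/lib/heartbeat/trace_ra (one subdirectory per RA type, files named resource.action.timestamp) should be checked against what is actually on the node.

```shell
#!/bin/sh
# Sketch only: helpers for inspecting RA trace files.
# ASSUMPTION: traces are stored as
#   $TRACE_DIR/<RA type>/<resource>.<action>.<timestamp>
# Verify this layout on your own nodes before relying on it.

TRACE_DIR="${TRACE_DIR:-/var/lib/heartbeat/trace_ra}"

# latest_trace RA_TYPE RESOURCE ACTION
# Prints the most recently modified matching trace file,
# or nothing if no such file exists.
latest_trace() {
    ls -t "$TRACE_DIR/$1/$2.$3."* 2>/dev/null | head -n 1
}

# show_mount_calls FILE
# Prints every traced line that mentions "mount", prefixed
# with its line number in the trace file.
show_mount_calls() {
    grep -n 'mount' "$1"
}
```

For the failing resource here, one would run `latest_trace Filesystem ost-08 start` on oss-04 and pass the resulting path to `show_mount_calls`, then compare the traced mount command and its options with the one that works by hand.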