Hello, I'm having a bit of a problem understanding what's going on with my simple two-node demo cluster. All resources come up correctly after restarting the whole cluster, but the LVM and Filesystem resources fail to start after a single node restart or a standby/unstandby cycle, i.e. once the node comes back online (and why do they stop and restart at all when the second node rejoins?).
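(I assume the policy engine's reasoning for the extra stop/start should be visible somewhere; the two commands below are roughly what I have in mind for digging it out - just a sketch, and the log location in particular depends on how corosync/pacemaker logging is set up:

# crm_simulate -sL
# grep pengine /var/log/messages | tail -n 50

The first should print the allocation scores and planned actions against the live CIB, the second the scheduler's decisions on the current DC.)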
OS: CentOS 6.4 (cman stack)
Pacemaker: pacemaker-1.1.8-7.el6.x86_64
DRBD: drbd84-utils-8.4.3-1.el6.elrepo.x86_64
Everything is configured using: pcs-0.9.26-10.el6_4.1.noarch

Two DRBD resources configured and working: data01 & data02
Two nodes: pgdbsrv01.cl1.local & pgdbsrv02.cl1.local

Configuration:

node pgdbsrv01.cl1.local
node pgdbsrv02.cl1.local
primitive DRBD_data01 ocf:linbit:drbd \
        params drbd_resource="data01" \
        op monitor interval="30s"
primitive DRBD_data02 ocf:linbit:drbd \
        params drbd_resource="data02" \
        op monitor interval="30s"
primitive FS_data01 ocf:heartbeat:Filesystem \
        params device="/dev/mapper/vgdata01-lvdata01" directory="/data01" fstype="ext4" \
        op monitor interval="30s"
primitive FS_data02 ocf:heartbeat:Filesystem \
        params device="/dev/mapper/vgdata02-lvdata02" directory="/data02" fstype="ext4" \
        op monitor interval="30s"
primitive LVM_vgdata01 ocf:heartbeat:LVM \
        params volgrpname="vgdata01" exclusive="true" \
        op monitor interval="30s"
primitive LVM_vgdata02 ocf:heartbeat:LVM \
        params volgrpname="vgdata02" exclusive="true" \
        op monitor interval="30s"
group GRP_data01 LVM_vgdata01 FS_data01
group GRP_data02 LVM_vgdata02 FS_data02
ms DRBD_ms_data01 DRBD_data01 \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
ms DRBD_ms_data02 DRBD_data02 \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation colocation-GRP_data01-DRBD_ms_data01-INFINITY inf: GRP_data01 DRBD_ms_data01:Master
colocation colocation-GRP_data02-DRBD_ms_data02-INFINITY inf: GRP_data02 DRBD_ms_data02:Master
order order-DRBD_data01-GRP_data01-mandatory : DRBD_data01:promote GRP_data01:start
order order-DRBD_data02-GRP_data02-mandatory : DRBD_data02:promote GRP_data02:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.8-7.el6-394e906" \
        cluster-infrastructure="cman" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        migration-threshold="1"
rsc_defaults $id="rsc_defaults-options" \
        resource-stickiness="100"

1) After starting the cluster, everything runs happily:

Last updated: Tue Sep 3 00:11:13 2013
Last change: Tue Sep 3 00:05:15 2013 via cibadmin on pgdbsrv01.cl1.local
Stack: cman
Current DC: pgdbsrv02.cl1.local - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, unknown expected votes
9 Resources configured.

Online: [ pgdbsrv01.cl1.local pgdbsrv02.cl1.local ]

Full list of resources:

 Master/Slave Set: DRBD_ms_data01 [DRBD_data01]
     Masters: [ pgdbsrv01.cl1.local ]
     Slaves: [ pgdbsrv02.cl1.local ]
 Master/Slave Set: DRBD_ms_data02 [DRBD_data02]
     Masters: [ pgdbsrv01.cl1.local ]
     Slaves: [ pgdbsrv02.cl1.local ]
 Resource Group: GRP_data01
     LVM_vgdata01   (ocf::heartbeat:LVM):           Started pgdbsrv01.cl1.local
     FS_data01      (ocf::heartbeat:Filesystem):    Started pgdbsrv01.cl1.local
 Resource Group: GRP_data02
     LVM_vgdata02   (ocf::heartbeat:LVM):           Started pgdbsrv01.cl1.local
     FS_data02      (ocf::heartbeat:Filesystem):    Started pgdbsrv01.cl1.local
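Just to rule out DRBD itself: both resource pairs look Connected and UpToDate as far as I can tell. This is roughly how I check them - nothing exotic, included only for completeness:

# cat /proc/drbd
# drbdadm role data01
# drbdadm role data02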
2) Putting node #1 into standby mode - after which everything runs happily on node pgdbsrv02.cl1.local:

# pcs cluster standby pgdbsrv01.cl1.local
# pcs status

Last updated: Tue Sep 3 00:16:01 2013
Last change: Tue Sep 3 00:15:55 2013 via crm_attribute on pgdbsrv02.cl1.local
Stack: cman
Current DC: pgdbsrv02.cl1.local - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, unknown expected votes
9 Resources configured.

Node pgdbsrv01.cl1.local: standby
Online: [ pgdbsrv02.cl1.local ]

Full list of resources:

 IP_database    (ocf::heartbeat:IPaddr2):       Started pgdbsrv02.cl1.local
 Master/Slave Set: DRBD_ms_data01 [DRBD_data01]
     Masters: [ pgdbsrv02.cl1.local ]
     Stopped: [ DRBD_data01:1 ]
 Master/Slave Set: DRBD_ms_data02 [DRBD_data02]
     Masters: [ pgdbsrv02.cl1.local ]
     Stopped: [ DRBD_data02:1 ]
 Resource Group: GRP_data01
     LVM_vgdata01   (ocf::heartbeat:LVM):           Started pgdbsrv02.cl1.local
     FS_data01      (ocf::heartbeat:Filesystem):    Started pgdbsrv02.cl1.local
 Resource Group: GRP_data02
     LVM_vgdata02   (ocf::heartbeat:LVM):           Started pgdbsrv02.cl1.local
     FS_data02      (ocf::heartbeat:Filesystem):    Started pgdbsrv02.cl1.local

3) Putting node #1 back online - it seems that all the resources stop (?), then DRBD gets promoted successfully on node #2 again, but the LVM and FS resources never start:

# pcs cluster unstandby pgdbsrv01.cl1.local
# pcs status

Last updated: Tue Sep 3 00:17:00 2013
Last change: Tue Sep 3 00:16:56 2013 via crm_attribute on pgdbsrv02.cl1.local
Stack: cman
Current DC: pgdbsrv02.cl1.local - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, unknown expected votes
9 Resources configured.

Online: [ pgdbsrv01.cl1.local pgdbsrv02.cl1.local ]

Full list of resources:

 IP_database    (ocf::heartbeat:IPaddr2):       Started pgdbsrv02.cl1.local
 Master/Slave Set: DRBD_ms_data01 [DRBD_data01]
     Masters: [ pgdbsrv02.cl1.local ]
     Slaves: [ pgdbsrv01.cl1.local ]
 Master/Slave Set: DRBD_ms_data02 [DRBD_data02]
     Masters: [ pgdbsrv02.cl1.local ]
     Slaves: [ pgdbsrv01.cl1.local ]
 Resource Group: GRP_data01
     LVM_vgdata01   (ocf::heartbeat:LVM):           Stopped
     FS_data01      (ocf::heartbeat:Filesystem):    Stopped
 Resource Group: GRP_data02
     LVM_vgdata02   (ocf::heartbeat:LVM):           Stopped
     FS_data02      (ocf::heartbeat:Filesystem):    Stopped

Any ideas why this is happening, or what could be wrong in the resource configuration? The same thing happens when I run the test with the resources initially placed the other way around. Also, if I stop and start one of the nodes, the same thing happens once the node comes back online.

--
Heikki Manninen <h...@iki.fi>
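PS. One thing I'm planning to try next, in case it tells us anything (just a sketch, not something I've verified yet): activate the VG by hand on the current DRBD Primary to rule out an LVM-level problem, deactivate it again, and then have Pacemaker re-probe the stopped resources:

# vgchange -a y vgdata01
# lvs vgdata01
# vgchange -a n vgdata01
# crm_resource --cleanup --resource LVM_vgdata01
# crm_resource --cleanup --resource FS_data01

(and the same for vgdata02 and the data02 resources)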