Hi Heikki, just a few comments to help you track this down yourself.
1) The second crm_mon output shows a resource IP_database that appears neither in the first crm_mon output nor in the configuration you posted. => Reduce your problem/config to the minimum that still reproduces the issue.

2) Enable logging and find out which node is the DC. Its logs contain a great deal of information about what is going on. Hint: keep a terminal session open with 'tail -f <logfile>' running and watch it while you issue commands. You'll get used to it. (See the sketch for this point after the list.)

3) The status crm_mon shows for a DRBD resource does not tell you everything about the DRBD devices. Have a look at drbd-overview on both nodes (e.g. for the sync status).

4) This setup cries out for STONITH, even in a test environment. When a node gets fenced (and you notice that immediately), you know something went wrong, which is a good indicator of errors in the agents or in the config. Believe me, as tedious as STONITH is to set up, it is just as valuable for getting hints about a bad cluster state. On virtual machines fencing is not as painful as on real servers. (See the sketch after the list.)

5) Is the DRBD fencing handler enabled? If yes, under certain circumstances -INFINITY rules are inserted to prevent promotion on the "wrong" node. You should grep for them: 'cibadmin -Q | grep <resname>'. (See the sketch after the list.)

6) crm_simulate -L -v gives you the scores of the resources on each node. I really don't know how to read it in every detail (is that documented anywhere?), but it gives you a hint where to look when resources don't start. The aggregation of stickiness values within groups in particular can be misleading. (See the sketch after the list.)

7) Pacemaker's behaviour changes between releases, and it is possible that you have hit a bug. That is hard to find out, though. One possibility: try a newer version.
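For points 2 and 3, a minimal diagnostic sketch. It assumes the usual log location of a CentOS 6 cman/corosync setup; adjust the path (it may just as well be /var/log/messages) to wherever corosync/pacemaker actually log on your nodes.

  # crm_mon -1 | grep "Current DC"           <- the DC's log is the interesting one
  # tail -f /var/log/cluster/corosync.log    <- leave this running in a separate terminal
  # drbd-overview                            <- run on both nodes, check connection and sync state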
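For point 4, a purely illustrative STONITH sketch. It assumes KVM guests fenced via fence_xvm (which also needs fence_virtd and a shared key on the hypervisor); the agent, its parameters, the guest name in port= and the made-up resource name fence_pgdbsrv01 all depend on your environment, so treat it as a pattern, not a recipe.

  # pcs stonith list                         <- shows which fence agents are installed
  # pcs stonith create fence_pgdbsrv01 fence_xvm port="pgdbsrv01" pcmk_host_list="pgdbsrv01.cl1.local"
  # pcs property set stonith-enabled=true

You would create a second device for pgdbsrv02 accordingly.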
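For point 5, how I would look for those -INFINITY rules. If your drbd.conf uses the crm-fence-peer.sh handler, the constraints it inserts are, as far as I remember, named with a drbd-fence-by-handler prefix; the grep patterns below are only examples built from your resource names.

  # cibadmin -Q | grep -i drbd-fence
  # cibadmin -Q | grep -e 'score="-INFINITY"' -e data01 -e data02

If I remember correctly, the corresponding unfence handler removes these constraints again after a successful resync.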
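For point 6, a sketch of how to pull out the scores. Besides -v, crm_simulate has (as far as I know, at least in recent 1.1.x versions) a -s/--show-scores switch that prints the allocation scores directly; grepping for your resource names keeps the output manageable.

  # crm_simulate -L -s                       <- -L: live CIB, -s: show allocation scores
  # crm_simulate -L -s | grep -e GRP_data01 -e DRBD_ms_data01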
Hope this helps.

Best regards
Andreas Mock

-----Original Message-----
From: Heikki Manninen [mailto:h...@iki.fi]
Sent: Thursday, 5 September 2013 14:08
To: pacemaker@oss.clusterlabs.org
Subject: [Pacemaker] Resource ordering/colocating question (DRBD + LVM + FS)

Hello,

I'm having a bit of a problem understanding what's going on with my simple two-node demo cluster here. My resources come up correctly after restarting the whole cluster, but the LVM and Filesystem resources fail to start after a single node restart or standby/unstandby (after the node comes back online - why do they even stop/start after the second node comes back?).

OS: CentOS 6.4 (cman stack)
Pacemaker: pacemaker-1.1.8-7.el6.x86_64
DRBD: drbd84-utils-8.4.3-1.el6.elrepo.x86_64
Everything is configured using: pcs-0.9.26-10.el6_4.1.noarch

Two DRBD resources configured and working: data01 & data02
Two nodes: pgdbsrv01.cl1.local & pgdbsrv02.cl1.local

Configuration:

node pgdbsrv01.cl1.local
node pgdbsrv02.cl1.local
primitive DRBD_data01 ocf:linbit:drbd \
        params drbd_resource="data01" \
        op monitor interval="30s"
primitive DRBD_data02 ocf:linbit:drbd \
        params drbd_resource="data02" \
        op monitor interval="30s"
primitive FS_data01 ocf:heartbeat:Filesystem \
        params device="/dev/mapper/vgdata01-lvdata01" directory="/data01" fstype="ext4" \
        op monitor interval="30s"
primitive FS_data02 ocf:heartbeat:Filesystem \
        params device="/dev/mapper/vgdata02-lvdata02" directory="/data02" fstype="ext4" \
        op monitor interval="30s"
primitive LVM_vgdata01 ocf:heartbeat:LVM \
        params volgrpname="vgdata01" exclusive="true" \
        op monitor interval="30s"
primitive LVM_vgdata02 ocf:heartbeat:LVM \
        params volgrpname="vgdata02" exclusive="true" \
        op monitor interval="30s"
group GRP_data01 LVM_vgdata01 FS_data01
group GRP_data02 LVM_vgdata02 FS_data02
ms DRBD_ms_data01 DRBD_data01 \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
ms DRBD_ms_data02 DRBD_data02 \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation colocation-GRP_data01-DRBD_ms_data01-INFINITY inf: GRP_data01 DRBD_ms_data01:Master
colocation colocation-GRP_data02-DRBD_ms_data02-INFINITY inf: GRP_data02 DRBD_ms_data02:Master
order order-DRBD_data01-GRP_data01-mandatory : DRBD_data01:promote GRP_data01:start
order order-DRBD_data02-GRP_data02-mandatory : DRBD_data02:promote GRP_data02:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.8-7.el6-394e906" \
        cluster-infrastructure="cman" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        migration-threshold="1"
rsc_defaults $id="rsc_defaults-options" \
        resource-stickiness="100"

1) After starting the cluster, everything runs happily:

Last updated: Tue Sep 3 00:11:13 2013
Last change: Tue Sep 3 00:05:15 2013 via cibadmin on pgdbsrv01.cl1.local
Stack: cman
Current DC: pgdbsrv02.cl1.local - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, unknown expected votes
9 Resources configured.

Online: [ pgdbsrv01.cl1.local pgdbsrv02.cl1.local ]

Full list of resources:

 Master/Slave Set: DRBD_ms_data01 [DRBD_data01]
     Masters: [ pgdbsrv01.cl1.local ]
     Slaves: [ pgdbsrv02.cl1.local ]
 Master/Slave Set: DRBD_ms_data02 [DRBD_data02]
     Masters: [ pgdbsrv01.cl1.local ]
     Slaves: [ pgdbsrv02.cl1.local ]
 Resource Group: GRP_data01
     LVM_vgdata01  (ocf::heartbeat:LVM):         Started pgdbsrv01.cl1.local
     FS_data01     (ocf::heartbeat:Filesystem):  Started pgdbsrv01.cl1.local
 Resource Group: GRP_data02
     LVM_vgdata02  (ocf::heartbeat:LVM):         Started pgdbsrv01.cl1.local
     FS_data02     (ocf::heartbeat:Filesystem):  Started pgdbsrv01.cl1.local

2) Putting node #1 to standby mode - after which everything runs happily on node pgdbsrv02.cl1.local

# pcs cluster standby pgdbsrv01.cl1.local
# pcs status

Last updated: Tue Sep 3 00:16:01 2013
Last change: Tue Sep 3 00:15:55 2013 via crm_attribute on pgdbsrv02.cl1.local
Stack: cman
Current DC: pgdbsrv02.cl1.local - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, unknown expected votes
9 Resources configured.
Node pgdbsrv01.cl1.local: standby
Online: [ pgdbsrv02.cl1.local ]

Full list of resources:

 IP_database   (ocf::heartbeat:IPaddr2):        Started pgdbsrv02.cl1.local
 Master/Slave Set: DRBD_ms_data01 [DRBD_data01]
     Masters: [ pgdbsrv02.cl1.local ]
     Stopped: [ DRBD_data01:1 ]
 Master/Slave Set: DRBD_ms_data02 [DRBD_data02]
     Masters: [ pgdbsrv02.cl1.local ]
     Stopped: [ DRBD_data02:1 ]
 Resource Group: GRP_data01
     LVM_vgdata01  (ocf::heartbeat:LVM):         Started pgdbsrv02.cl1.local
     FS_data01     (ocf::heartbeat:Filesystem):  Started pgdbsrv02.cl1.local
 Resource Group: GRP_data02
     LVM_vgdata02  (ocf::heartbeat:LVM):         Started pgdbsrv02.cl1.local
     FS_data02     (ocf::heartbeat:Filesystem):  Started pgdbsrv02.cl1.local

3) Putting node #1 back online - it seems that all the resources stop (?) and then DRBD gets promoted successfully on node #2, but the LVM and FS resources never start

# pcs cluster unstandby pgdbsrv01.cl1.local
# pcs status

Last updated: Tue Sep 3 00:17:00 2013
Last change: Tue Sep 3 00:16:56 2013 via crm_attribute on pgdbsrv02.cl1.local
Stack: cman
Current DC: pgdbsrv02.cl1.local - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, unknown expected votes
9 Resources configured.

Online: [ pgdbsrv01.cl1.local pgdbsrv02.cl1.local ]

Full list of resources:

 IP_database   (ocf::heartbeat:IPaddr2):        Started pgdbsrv02.cl1.local
 Master/Slave Set: DRBD_ms_data01 [DRBD_data01]
     Masters: [ pgdbsrv02.cl1.local ]
     Slaves: [ pgdbsrv01.cl1.local ]
 Master/Slave Set: DRBD_ms_data02 [DRBD_data02]
     Masters: [ pgdbsrv02.cl1.local ]
     Slaves: [ pgdbsrv01.cl1.local ]
 Resource Group: GRP_data01
     LVM_vgdata01  (ocf::heartbeat:LVM):         Stopped
     FS_data01     (ocf::heartbeat:Filesystem):  Stopped
 Resource Group: GRP_data02
     LVM_vgdata02  (ocf::heartbeat:LVM):         Stopped
     FS_data02     (ocf::heartbeat:Filesystem):  Stopped

Any ideas why this is happening, or what could be wrong in the resource configuration? The same thing happens when I test with the resources initially located the other way around. Also, if I stop and start one of the nodes, the same thing happens once the node comes back online.

--
Heikki Manninen <h...@iki.fi>

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org