I am getting rather unexpected behavior when I combine clones, location constraints, and remote nodes in an asymmetric cluster. My cluster is configured to be asymmetric, distinguishing between vmhosts and various sorts of remote nodes. Currently I am running upstream version b6d42ed. I am simplifying my description to avoid confusion, hoping in so doing I don't miss any salient points...
My physical cluster nodes, which are also the VM hosts, have the attribute "nodetype=vmhost". They also have InfiniBand interfaces, which take some time to come up. I don't want my shared file system (which needs IB), or libvirtd (which needs the file system), to come up before IB... So I have this in my configuration:

primitive p-watch-ib0 ocf:heartbeat:ethmonitor \
        params \
                interface="ib0" \
        op monitor timeout="100s" interval="10s"
clone c-watch-ib0 p-watch-ib0 \
        meta interleave="true"
#
location loc-watch-ib-only-vmhosts c-watch-ib0 \
        rule 0: nodetype eq "vmhost"

Something broke between upstream versions 0a2570a and c68919f: the c-watch-ib0 clone never starts. I've found that if I run "crm_resource --force-start -r p-watch-ib0" while IB is running, the ethmonitor-ib0 attribute is not set as it used to be. Oh well, I can set it manually, so let's do that.

We use GPFS for a shared file system, so I have an agent to start it and wait for a file system to mount. It should only run on VM hosts, and only when IB is running. So I have this:

primitive p-fs-gpfs ocf:ccni:gpfs \
        params \
                fspath="/gpfs/lb/utility" \
        op monitor timeout="20s" interval="30s" \
        op start timeout="180s" \
        op stop timeout="120s"
clone c-fs-gpfs p-fs-gpfs \
        meta interleave="true"
location loc-fs-gpfs-needs-ib0 c-fs-gpfs \
        rule -inf: not_defined "ethmonitor-ib0" or "ethmonitor-ib0" eq 0
location loc-fs-gpfs-on-vmhosts c-fs-gpfs \
        rule 0: nodetype eq "vmhost"

That all used to start nicely. Now it doesn't, even if I set the ethmonitor-ib0 attribute. However, I can run "crm_resource --force-start -r p-fs-gpfs" on each of my VM hosts, then issue "crm resource cleanup c-fs-gpfs", and all is well.
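As a rough illustration of the placement I would expect from the two c-fs-gpfs location rules in an asymmetric (opt-in) cluster, here is a simplified Python model. It is a hypothetical sketch, not Pacemaker's policy engine: the function names are made up, attribute values are treated as strings, and it assumes that in an opt-in cluster a node is eligible only if some constraint gives it a score and no constraint gives it -INFINITY.

```python
# Hypothetical, simplified model of the two c-fs-gpfs location constraints
# in an asymmetric (symmetric-cluster=false) cluster. Not Pacemaker code.
NEG_INF = float("-inf")


def gpfs_score(attrs):
    """Combined location score for c-fs-gpfs on one node.

    Returns None when no constraint enables the node (in an opt-in
    cluster that means the clone may not run there at all).
    """
    # loc-fs-gpfs-needs-ib0:
    #   rule -inf: not_defined "ethmonitor-ib0" or "ethmonitor-ib0" eq 0
    ib = attrs.get("ethmonitor-ib0")
    if ib is None or ib == "0":
        return NEG_INF  # -INFINITY vetoes the node outright

    # loc-fs-gpfs-on-vmhosts:
    #   rule 0: nodetype eq "vmhost"
    if attrs.get("nodetype") == "vmhost":
        return 0  # assumed: a score of 0 is enough to enable the node

    return None  # no enabling constraint matched


def allowed_nodes(nodes):
    """Nodes where c-fs-gpfs may run under this model."""
    eligible = []
    for name, attrs in nodes.items():
        score = gpfs_score(attrs)
        if score is not None and score != NEG_INF:
            eligible.append(name)
    return sorted(eligible)


# Illustrative attributes; cvmh02 pretends IB is down, and the remote
# node slurmdb02 has no attributes at all.
nodes = {
    "cvmh01": {"ethmonitor-ib0": "1", "nodetype": "vmhost"},
    "cvmh02": {"ethmonitor-ib0": "0", "nodetype": "vmhost"},
    "slurmdb02": {},
}
print(allowed_nodes(nodes))  # ['cvmh01']
```

Under this model a node without nodetype=vmhost, such as a remote node, never becomes eligible, which is the behaviour the configuration is meant to express.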
I can use "crm status" to see something like:

Last updated: Tue Oct 22 16:35:43 2013
Last change: Tue Oct 22 15:50:52 2013 via crmd on cvmh01
Stack: cman
Current DC: cvmh04 - partition with quorum
Version: 1.1.10-19.el6.ccni-b6d42ed
8 Nodes configured
92 Resources configured

Online: [ cvmh01 cvmh02 cvmh03 cvmh04 ]

 fence-cvmh01 (stonith:fence_ipmilan): Started cvmh04
 fence-cvmh02 (stonith:fence_ipmilan): Started cvmh01
 fence-cvmh03 (stonith:fence_ipmilan): Started cvmh01
 fence-cvmh04 (stonith:fence_ipmilan): Started cvmh01
 Clone Set: c-fs-gpfs [p-fs-gpfs]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]

which is what I would expect (except that I expect Pacemaker to have started these for me, as it used to).

I also have clone resources that NFS-mount another file system and bind-mount a directory out of the GPFS file system. They behave like the GPFS resource: they used to just work, but now I need to use "crm_resource --force-start" and then clean up. That finally lets me start libvirtd, using this configuration:

primitive p-libvirtd lsb:libvirtd \
        op monitor interval="30s"
clone c-p-libvirtd p-libvirtd \
        meta interleave="true"
order o-libvirtd-after-storage inf: \
        ( c-fs-libvirt-VM-xcm c-fs-bind-libvirt-VM-cvmh ) \
        c-p-libvirtd
location loc-libvirtd-on-vmhosts c-p-libvirtd \
        rule 0: nodetype eq "vmhost"

Of course that used to just work, but now, like the other clones, I need to force-start libvirtd on the VM hosts and clean up. Once I do that, all my VM resources, which are not clones, start up just as they are supposed to! Several of these are configured as remote nodes, and they have services configured to run in them.
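For the libvirtd ordering, the behaviour I expect is roughly the following simplified Python model. It is hypothetical, not Pacemaker internals, and rests on two assumptions: that (as I understand the crm shell syntax) the parenthesised set in o-libvirtd-after-storage is an unordered set whose members may start in parallel before c-p-libvirtd, and that with interleave="true" each c-p-libvirtd instance waits only for the storage instances on its own node. The function names are made up for illustration.

```python
# Hypothetical model of o-libvirtd-after-storage combined with
# loc-libvirtd-on-vmhosts, assuming interleave="true" makes the
# ordering dependency per node. Not Pacemaker code.
STORAGE_CLONES = ("c-fs-libvirt-VM-xcm", "c-fs-bind-libvirt-VM-cvmh")


def storage_ready(node, started):
    """True if every storage clone has an instance started on `node`.

    `started` maps a clone name to the set of nodes running it.
    """
    return all(node in started.get(clone, set()) for clone in STORAGE_CLONES)


def libvirtd_startable(node_attrs, started):
    """Nodes where a c-p-libvirtd instance may start right now."""
    return sorted(
        node
        for node, attrs in node_attrs.items()
        if attrs.get("nodetype") == "vmhost"  # location rule
        and storage_ready(node, started)      # order constraint
    )


node_attrs = {
    "cvmh01": {"nodetype": "vmhost"},
    "cvmh02": {"nodetype": "vmhost"},
    "slurmdb02": {},  # remote node: no nodetype attribute
}
started = {
    "c-fs-libvirt-VM-xcm": {"cvmh01", "cvmh02"},
    "c-fs-bind-libvirt-VM-cvmh": {"cvmh01"},  # bind mount not yet up on cvmh02
}
print(libvirtd_startable(node_attrs, started))  # ['cvmh01']
```

In this model a remote node is excluded by the location rule before ordering is even considered, so no libvirtd instance would ever be scheduled there.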
But now other strange things happen:

Last updated: Tue Oct 22 16:46:29 2013
Last change: Tue Oct 22 15:50:52 2013 via crmd on cvmh01
Stack: cman
Current DC: cvmh04 - partition with quorum
Version: 1.1.10-19.el6.ccni-b6d42ed
8 Nodes configured
92 Resources configured

ContainerNode slurmdb02:vm-slurmdb02: UNCLEAN (offline)
Online: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
Containers: [ db02:vm-db02 ldap01:vm-ldap01 ldap02:vm-ldap02 ]

 fence-cvmh01 (stonith:fence_ipmilan): Started cvmh04
 fence-cvmh02 (stonith:fence_ipmilan): Started cvmh01
 fence-cvmh03 (stonith:fence_ipmilan): Started cvmh01
 fence-cvmh04 (stonith:fence_ipmilan): Started cvmh01
 Clone Set: c-p-libvirtd [p-libvirtd]
     p-libvirtd (lsb:libvirtd): FAILED slurmdb02
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 ]
 Clone Set: c-watch-ib0 [p-watch-ib0]
     p-watch-ib0 (ocf::heartbeat:ethmonitor): FAILED slurmdb02
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 ]
 Clone Set: c-fs-gpfs [p-fs-gpfs]
     p-fs-gpfs (ocf::ccni:gpfs): FAILED slurmdb02
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 ]
 vm-compute-test (ocf::ccni:xcatVirtualDomain): FAILED [ cvmh04 slurmdb02 ]
 vm-swbuildsl6 (ocf::ccni:xcatVirtualDomain): FAILED slurmdb02
 vm-db02 (ocf::ccni:xcatVirtualDomain): Started cvmh01
 vm-ldap01 (ocf::ccni:xcatVirtualDomain): Started cvmh02
 vm-ldap02 (ocf::ccni:xcatVirtualDomain): Started cvmh03
 p-postgres (ocf::heartbeat:pgsql): FAILED [ db02 slurmdb02 ]
 p-mysql (ocf::heartbeat:mysql): FAILED [ db02 slurmdb02 ]
 Clone Set: c-fs-share-config-data [fs-share-config-data]
     fs-share-config-data (ocf::heartbeat:Filesystem): FAILED slurmdb02
     Stopped: [ cvmh01 cvmh02 cvmh03 cvmh04 db02 ldap01 ldap02 ]
 p-mysql-slurm (ocf::heartbeat:mysql): FAILED slurmdb02
 p-slurmdbd (ocf::ccni:SlurmDBD): FAILED slurmdb02
 Clone Set: c-ldapagent [s-ldapagent]
     s-ldapagent (ocf::ccni:WrapInitScript): FAILED slurmdb02
     Stopped: [ cvmh01 cvmh02 cvmh03 cvmh04 db02 ldap01 ldap02 ]
 Clone Set: c-ldap [s-ldap]
     s-ldap (ocf::ccni:WrapInitScript): FAILED slurmdb02
     Started: [ ldap01 ldap02 ]
     Stopped: [ cvmh01 cvmh02 cvmh03 cvmh04 db02 ]

Now this is unexpected for a couple of reasons. I do have constraints like:

location loc-vm-swbuildsl6 vm-swbuildsl6 \
        rule $id="loc-vm-swbuildsl6-rule" 0: nodetype eq vmhost
order o-vm-swbuildsl6 inf: c-p-libvirtd vm-swbuildsl6

Yet slurmdb02 does not have nodetype=vmhost set; using "crm_mon -o -1 -N -A" we see:

Node Attributes:
* Node cvmh01:
    + ethmonitor-ib0 : 1
    + nodetype : vmhost
* Node cvmh02:
    + ethmonitor-ib0 : 1
    + nodetype : vmhost
* Node cvmh03:
    + ethmonitor-ib0 : 1
    + nodetype : vmhost
* Node cvmh04:
    + ethmonitor-ib0 : 1
    + nodetype : vmhost
* Node db02:
* Node ldap01:
* Node ldap02:
* Node slurmdb02:

The results are also unexpected to me because I (perhaps naively) wouldn't expect the new nodes to show up on the "Stopped" lines at all; I expected a location rule to limit where clones would even be attempted. For example, with the rule limiting c-p-libvirtd to the vmhosts, I don't expect to be told that the clones are stopped on the remote VM nodes db02, ldap01, and ldap02 (let alone that one would be started on slurmdb02!).

Until I wrote this note, even the cloned LDAP resource c-ldap needed to be started using force-start. I'm not sure why it started on its own this time... Perhaps this stack trace, from the core dump Pacemaker left on one of the VM hosts, has a clue?
#0  0x00007f121e9ac8e5 in raise () from /lib64/libc.so.6
#1  0x00007f121e9ae0c5 in abort () from /lib64/libc.so.6
#2  0x00007f121e9ea7f7 in __libc_message () from /lib64/libc.so.6
#3  0x00007f121e9f0126 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f121e9f05ad in malloc_consolidate () from /lib64/libc.so.6
#5  0x00007f121e9f33c5 in _int_malloc () from /lib64/libc.so.6
#6  0x00007f121e9f45e6 in calloc () from /lib64/libc.so.6
#7  0x00007f121e9e91ed in open_memstream () from /lib64/libc.so.6
#8  0x00007f121ea5ebdb in __vsyslog_chk () from /lib64/libc.so.6
#9  0x00007f121ea5f1b3 in __syslog_chk () from /lib64/libc.so.6
#10 0x00007f121e72b9fb in ?? () from /usr/lib64/libqb.so.0
#11 0x00007f121e72a6a2 in qb_log_real_va_ () from /usr/lib64/libqb.so.0
#12 0x00007f121e72a91d in qb_log_real_ () from /usr/lib64/libqb.so.0
#13 0x000000000042e994 in te_rsc_command (graph=0x20c7b40, action=0x23b0c90) at te_actions.c:412
#14 0x0000003a64404019 in initiate_action (graph=0x20c7b40) at graph.c:172
#15 fire_synapse (graph=0x20c7b40) at graph.c:211
#16 run_graph (graph=0x20c7b40) at graph.c:366
#17 0x000000000042f8cd in te_graph_trigger (user_data=<value optimized out>) at te_utils.c:331
#18 0x0000003a6202b283 in crm_trigger_dispatch (source=<value optimized out>, callback=<value optimized out>, userdata=<value optimized out>) at mainloop.c:105
#19 0x00000038b3c38f0e in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#20 0x00000038b3c3c938 in ?? () from /lib64/libglib-2.0.so.0
#21 0x00000038b3c3cd55 in g_main_loop_run () from /lib64/libglib-2.0.so.0
#22 0x00000000004058ee in crmd_init () at main.c:154
#23 0x0000000000405c2c in main (argc=1, argv=0x7fffdc207528) at main.c:121

Not sure how to take this further. It has been difficult to characterize what exactly is or isn't happening, and hopefully I've not left out some critical detail.

Thanks.

/Lindsay
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org