Sorry for the delayed response, but I was out last week. I've applied this patch to 1.1.10-rc5 and have been testing:
# crm_attribute --type status --node "db02" --name "service_postgresql" --update "true"
# crm_attribute --type status --node "db02" --name "service_postgresql"
scope=status  name=service_postgresql value=true
# crm resource stop vm-db02
# crm resource start vm-db02
### Wait a bit
# crm_attribute --type status --node "db02" --name "service_postgresql"
scope=status  name=service_postgresql value=(null)
Error performing operation: No such device or address
# crm_attribute --type status --node "db02" --name "service_postgresql" --update "true"
# crm_attribute --type status --node "db02" --name "service_postgresql"
scope=status  name=service_postgresql value=true

Good so far.  But now look at this (every node was clean, and all services
were running, before we started):

# crm status
Last updated: Tue Jul 2 16:15:14 2013
Last change: Tue Jul 2 16:15:12 2013 via crmd on cvmh02
Stack: cman
Current DC: cvmh02 - partition with quorum
Version: 1.1.10rc5-1.el6.ccni-2718638
9 Nodes configured, unknown expected votes
59 Resources configured.

Node db02: UNCLEAN (offline)
Online: [ cvmh01 cvmh02 cvmh03 cvmh04 db02:vm-db02 ldap01:vm-ldap01 ldap02:vm-ldap02 ]
OFFLINE: [ swbuildsl6:vm-swbuildsl6 ]

Full list of resources:

 fence-cvmh01   (stonith:fence_ipmilan):        Started cvmh04
 fence-cvmh02   (stonith:fence_ipmilan):        Started cvmh04
 fence-cvmh03   (stonith:fence_ipmilan):        Started cvmh04
 fence-cvmh04   (stonith:fence_ipmilan):        Started cvmh01
 Clone Set: c-fs-libvirt-VM-xcm [fs-libvirt-VM-xcm]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 Clone Set: c-p-libvirtd [p-libvirtd]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 Clone Set: c-fs-bind-libvirt-VM-cvmh [fs-bind-libvirt-VM-cvmh]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 Clone Set: c-watch-ib0 [p-watch-ib0]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 Clone Set: c-fs-gpfs [p-fs-gpfs]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 vm-compute-test        (ocf::ccni:xcatVirtualDomain):  Started cvmh03
 vm-swbuildsl6  (ocf::ccni:xcatVirtualDomain):  Stopped
 vm-db02        (ocf::ccni:xcatVirtualDomain):  Started cvmh02
 vm-ldap01      (ocf::ccni:xcatVirtualDomain):  Started cvmh03
 vm-ldap02      (ocf::ccni:xcatVirtualDomain):  Started cvmh04
 DummyOnVM      (ocf::pacemaker:Dummy): Started cvmh01

Not so good, and I'm not sure how to clean this up.  I can't seem to stop
vm-db02 any more, even after I've entered:

# crm_node -R db02 --force
# crm resource start vm-db02
### Wait a bit
# crm status
Last updated: Tue Jul 2 16:32:38 2013
Last change: Tue Jul 2 16:27:28 2013 via cibadmin on cvmh01
Stack: cman
Current DC: cvmh02 - partition with quorum
Version: 1.1.10rc5-1.el6.ccni-2718638
8 Nodes configured, unknown expected votes
54 Resources configured.

Online: [ cvmh01 cvmh02 cvmh03 cvmh04 ldap01:vm-ldap01 ldap02:vm-ldap02 swbuildsl6:vm-swbuildsl6 ]
OFFLINE: [ db02:vm-db02 ]

 fence-cvmh01   (stonith:fence_ipmilan):        Started cvmh03
 fence-cvmh02   (stonith:fence_ipmilan):        Started cvmh03
 fence-cvmh03   (stonith:fence_ipmilan):        Started cvmh04
 fence-cvmh04   (stonith:fence_ipmilan):        Started cvmh01
 Clone Set: c-fs-libvirt-VM-xcm [fs-libvirt-VM-xcm]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 Clone Set: c-p-libvirtd [p-libvirtd]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 Clone Set: c-fs-bind-libvirt-VM-cvmh [fs-bind-libvirt-VM-cvmh]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 Clone Set: c-watch-ib0 [p-watch-ib0]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 Clone Set: c-fs-gpfs [p-fs-gpfs]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 vm-compute-test        (ocf::ccni:xcatVirtualDomain):  Started cvmh02
 vm-swbuildsl6  (ocf::ccni:xcatVirtualDomain):  Started cvmh01
 vm-ldap01      (ocf::ccni:xcatVirtualDomain):  Started cvmh03
 vm-ldap02      (ocf::ccni:xcatVirtualDomain):  Started cvmh04
 DummyOnVM      (ocf::pacemaker:Dummy): Started cvmh01

My only recourse has been to reboot the cluster.
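(Is there a better way to clean up this stale state by hand?  I would have
guessed something along the lines of

# crm resource cleanup vm-db02
# cibadmin --delete --xml-text '<node_state uname="db02"/>'

on the assumption that a remote node's leftover node_state entry can be
deleted the same way as a cluster node's, but I haven't verified that this
is correct or safe.)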
So let's do that and try setting a location constraint on DummyOnVM, to
force it onto db02...

Last updated: Tue Jul 2 16:43:46 2013
Last change: Tue Jul 2 16:27:28 2013 via cibadmin on cvmh01
Stack: cman
Current DC: cvmh02 - partition with quorum
Version: 1.1.10rc5-1.el6.ccni-2718638
8 Nodes configured, unknown expected votes
54 Resources configured.

Online: [ cvmh01 cvmh02 cvmh03 cvmh04 db02:vm-db02 ldap01:vm-ldap01 ldap02:vm-ldap02 swbuildsl6:vm-swbuildsl6 ]

 fence-cvmh01   (stonith:fence_ipmilan):        Started cvmh04
 fence-cvmh02   (stonith:fence_ipmilan):        Started cvmh03
 fence-cvmh03   (stonith:fence_ipmilan):        Started cvmh04
 fence-cvmh04   (stonith:fence_ipmilan):        Started cvmh01
 Clone Set: c-fs-libvirt-VM-xcm [fs-libvirt-VM-xcm]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 Clone Set: c-p-libvirtd [p-libvirtd]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 Clone Set: c-fs-bind-libvirt-VM-cvmh [fs-bind-libvirt-VM-cvmh]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 Clone Set: c-watch-ib0 [p-watch-ib0]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 Clone Set: c-fs-gpfs [p-fs-gpfs]
     Started: [ cvmh01 cvmh02 cvmh03 cvmh04 ]
     Stopped: [ db02 ldap01 ldap02 swbuildsl6 ]
 vm-compute-test        (ocf::ccni:xcatVirtualDomain):  Started cvmh01
 vm-swbuildsl6  (ocf::ccni:xcatVirtualDomain):  Started cvmh01
 vm-db02        (ocf::ccni:xcatVirtualDomain):  Started cvmh02
 vm-ldap01      (ocf::ccni:xcatVirtualDomain):  Started cvmh03
 vm-ldap02      (ocf::ccni:xcatVirtualDomain):  Started cvmh04
 DummyOnVM      (ocf::pacemaker:Dummy): Started cvmh03

# pcs constraint location DummyOnVM prefers db02
# crm status
...
Online: [ cvmh01 cvmh02 cvmh03 cvmh04 db02:vm-db02 ldap01:vm-ldap01 ldap02:vm-ldap02 swbuildsl6:vm-swbuildsl6 ]
...
 DummyOnVM      (ocf::pacemaker:Dummy): Started db02

That's what we want to see.
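(Aside: I set that constraint with pcs; if I have the syntax right, the crm
shell equivalent, given that "prefers" defaults to an INFINITY score, would
be something like

# crm configure location DummyOnVM-prefers-db02 DummyOnVM inf: db02

where the constraint id is just a name I made up.)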
It would be interesting to stop db02.  I expect DummyOnVM to stop.

# crm resource stop vm-db02
# crm status
...
Online: [ cvmh01 cvmh02 cvmh03 cvmh04 ldap01:vm-ldap01 ldap02:vm-ldap02 ]
OFFLINE: [ db02:vm-db02 swbuildsl6:vm-swbuildsl6 ]
...
 DummyOnVM      (ocf::pacemaker:Dummy): Started cvmh02

Failed actions:
    vm-compute-test_migrate_from_0 (node=cvmh02, call=147, rc=1,
    status=Timed Out, last-rc-change=Tue Jul 2 16:48:17 2013,
    queued=20003ms, exec=0ms): unknown error

Well, that is odd.  (It is the case that vm-swbuildsl6 has an order
dependency on vm-compute-test, as I was trying to understand how migrations
worked with order dependencies (not very well).  Once vm-compute-test
recovers, vm-swbuildsl6 does come back up.)

This isn't really very good -- if I am running services in VMs or other
containers, I need them to run only in that container!  If I start vm-db02
back up, I see that DummyOnVM is stopped and moved to db02.
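What I really want to express is "run DummyOnVM only on db02, and leave it
stopped whenever db02 is down".  If I understand the location-rule syntax
correctly, something like

# crm configure location DummyOnVM-only-db02 DummyOnVM rule -inf: '#uname' ne db02

ought to say that (the constraint id is again my own invention, and '#uname'
is quoted only to keep the shell from treating it as a comment), but I'd be
interested to hear whether that is expected to behave sanely with remote
nodes.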
On Thu, Jun 20, 2013 at 4:16 PM, David Vossel <dvos...@redhat.com> wrote:

> ----- Original Message -----
> > From: "David Vossel" <dvos...@redhat.com>
> > To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
> > Sent: Thursday, June 20, 2013 1:35:44 PM
> > Subject: Re: [Pacemaker] Pacemaker remote nodes, naming, and attributes
> >
> > ----- Original Message -----
> > > From: "David Vossel" <dvos...@redhat.com>
> > > To: "The Pacemaker cluster resource manager"
> > > <pacemaker@oss.clusterlabs.org>
> > > Sent: Wednesday, June 19, 2013 4:47:58 PM
> > > Subject: Re: [Pacemaker] Pacemaker remote nodes, naming, and attributes
> > >
> > > ----- Original Message -----
> > > > From: "Lindsay Todd" <rltodd....@gmail.com>
> > > > To: "The Pacemaker cluster resource manager"
> > > > <Pacemaker@oss.clusterlabs.org>
> > > > Sent: Wednesday, June 19, 2013 4:11:58 PM
> > > > Subject: [Pacemaker] Pacemaker remote nodes, naming, and attributes
> > > >
> > > > I built a set of rpms for pacemaker 1.1.10-rc4 and updated my test
> > > > cluster (hopefully won't be a "test" cluster forever), as well as my
> > > > VMs running pacemaker-remote.  The OS everywhere is Scientific Linux
> > > > 6.4.  I am wanting to set some attributes on remote nodes, which I
> > > > can use to control where services run.
> > > >
> > > > The first deviation I note from the documentation is the naming of
> > > > the remote nodes.  I see:
> > > >
> > > > Last updated: Wed Jun 19 16:50:39 2013
> > > > Last change: Wed Jun 19 16:19:53 2013 via cibadmin on cvmh04
> > > > Stack: cman
> > > > Current DC: cvmh02 - partition with quorum
> > > > Version: 1.1.10rc4-1.el6.ccni-d19719c
> > > > 8 Nodes configured, unknown expected votes
> > > > 49 Resources configured.
> > > >
> > > > Online: [ cvmh01 cvmh02 cvmh03 cvmh04 db02:vm-db02 ldap01:vm-ldap01
> > > > ldap02:vm-ldap02 swbuildsl6:vm-swbuildsl6 ]
> > > >
> > > > Full list of resources:
> > > >
> > > > and so forth.  The "remote-node" names are simply the hostname, so
> > > > the vm-db02 VirtualDomain resource has a remote-node name of db02.
> > > > The "Pacemaker Remote" manual suggests this should be displayed as
> > > > "db02", not "db02:vm-db02", although I can see how the latter format
> > > > would be useful.
> > >
> > > Yep, this got changed since the documentation was published.  We wanted
> > > people to be able to recognize which remote-node went with which
> > > resource easily.
> > >
> > > > So now let's set an attribute on this remote node.  What name do I use?
> > > > How about:
> > > >
> > > > # crm_attribute --node "db02:vm-db02" \
> > > >     --name "service_postgresql" \
> > > >     --update "true"
> > > > Could not map name=db02:vm-db02 to a UUID
> > > > Please choose from one of the matches above and suppy the 'id' with
> > > > --attr-id
> > > >
> > > > Perhaps not the most informative output, but obviously it fails.
> > > > Let's try the unqualified name:
> > > >
> > > > # crm_attribute --node "db02" \
> > > >     --name "service_postgresql" \
> > > >     --update "true"
> > > > Remote-nodes do not maintain permanent attributes,
> > > > 'service_postgresql=true' will be removed after db02 reboots.
> > > > Error setting service_postgresql=true (section=status, set=status-db02):
> > > > No such device or address
> > > > Error performing operation: No such device or address
> >
> > I just tested this and ran into the same errors you did.  Turns out this
> > happens when the remote-node's status section is empty.  If you start a
> > resource on the node and then set the attribute it will work... obviously
> > this is a bug.  I'm working on a fix.
>
> This should help with the attributes bit.
>
> https://github.com/ClusterLabs/pacemaker/commit/26d34a9171bddae67c56ebd8c2513ea8fa770204
>
> -- Vossel
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org