On Sat, Dec 29, 2012 at 1:21 AM, Stefan Midjich <sweh...@gmail.com> wrote:
> Every 15-18 minutes one of my resources gets stopped on one node and then is
> restarted shortly after.
>
> In the DC log I can see the following error lines.
>
> Dec 28 15:04:09 app01 pengine: [8618]: debug: clone_rsc_colocation_rh:
> Pairing resOCFS:1 with groupOcfs2Mgmt:0
> Dec 28 15:04:09 app01 pengine: [8618]: debug: native_assign_node: Assigning
> app02 to resOCFS:1
> Dec 28 15:04:09 app01 pengine: [8618]: ERROR: color_instance: Pre-allocation
> failed: got app02 instead of app01

Hmm, that's not good.
Some time after the logs below there should be a reference to a file ending
with .bz2. Can you send that to me please?

> Dec 28 15:04:09 app01 pengine: [8618]: info: native_deallocate: Deallocating
> resOCFS:1 from app02
> Dec 28 15:04:09 app01 pengine: [8618]: debug: clone_rsc_colocation_rh:
> Pairing resOCFS:0 with groupOcfs2Mgmt:0
> Dec 28 15:04:09 app01 pengine: [8618]: debug: native_assign_node: Assigning
> app02 to resOCFS:0
> Dec 28 15:04:09 app01 pengine: [8618]: debug: clone_rsc_colocation_rh:
> Pairing resOCFS:1 with groupOcfs2Mgmt:1
> Dec 28 15:04:09 app01 pengine: [8618]: debug: clone_rsc_colocation_rh:
> Pairing resOCFS:1 with groupOcfs2Mgmt:1
> Dec 28 15:04:09 app01 pengine: [8618]: debug: native_assign_node: All nodes
> for resource resOCFS:1 are unavailable, unclean or shutting down (app01: 1,
> -1000000)
> Dec 28 15:04:09 app01 pengine: [8618]: debug: native_assign_node: Could not
> allocate a node for resOCFS:1
> Dec 28 15:04:09 app01 pengine: [8618]: info: native_color: Resource
> resOCFS:1 cannot run anywhere
>
> This plays out before every stop event of OCFS.
>
> Here is the cib.
>
> primitive VirtualIP0 ocf:heartbeat:IPaddr2 \
>     params ip="10.121.12.30" \
>     op monitor interval="10s" \
>     meta target-role="Started"
> primitive resDLM ocf:pacemaker:controld
> primitive resDrbdShared0 ocf:linbit:drbd \
>     params drbd_resource="shared0" \
>     operations $id="resDrbd-operations" \
>     op monitor interval="20" role="Master" timeout="20" notify="true" \
>     op monitor interval="30" role="Slave" timeout="20" notify="true"
> primitive resJboss lsb:jboss4 \
>     op monitor interval="120s" timeout="150s" \
>     op start interval="0" timeout="150s" \
>     op stop interval="0" timeout="150s"
> primitive resO2CB ocf:pacemaker:o2cb
> primitive resOCFS ocf:heartbeat:Filesystem \
>     params device="/dev/drbd/by-res/shared0" directory="/data"
> fstype="ocfs2" \
>     op monitor interval="120s" timeout="40" \
>     op start interval="0" timeout="60" \
>     op stop interval="0" timeout="60"
> group groupOcfs2Mgmt resDLM resO2CB
> ms msDrbdShared0 resDrbdShared0 \
>     meta resource-stickines="100" notify="true" interleave="true"
> master-max="2" target-role="Started"
> clone cloneJboss resJboss \
>     meta interleave="true" ordered="true" is-managed="false"
> target-role="Started"
> clone cloneOCFS resOCFS \
>     meta interleave="true" ordered="true" target-role="Started"
> is-managed="true"
> clone cloneOcfs2Mgmt groupOcfs2Mgmt \
>     meta interleave="true" target-role="Started"
> location locVirtualIP0 VirtualIP0 9001: app01
> colocation colDRBD inf: cloneOcfs2Mgmt msDrbdShared0:Master
> colocation colOcfs2 inf: cloneOCFS cloneOcfs2Mgmt
> order ordDRBD inf: msDrbdShared0:promote cloneOcfs2Mgmt:start
> order ordOcfs2 inf: cloneOcfs2Mgmt:start cloneOCFS:start
> property $id="cib-bootstrap-options" \
>     dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="2" \
>     stonith-enabled="false" \
>     no-quorum-policy="ignore" \
>     last-lrm-refresh="1356702541"
> rsc_defaults $id="rsc-options" \
>     resource-stickiness="0"
> op_defaults $id="op-options" \
>     timeout="20s"
>
> I first suspected wrong network name resolution but /etc/hosts is correct
> and no duplicate names.
>
> --
> Hälsningar / Greetings
>
> Stefan Midjich
> [De omnibus dubitandum]
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
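
In case it helps with finding the .bz2 reference asked for above, here is a minimal Python sketch. It assumes the log is plain text and that the reference shows up as a whitespace-delimited token ending in .bz2 on a line at or after the "Pre-allocation failed" error. The function name, the regular expression and the command-line handling are illustrative assumptions and are not part of Pacemaker.

```python
import re
import sys
from typing import Iterable, List

# Assumption: the file reference is a whitespace-delimited token ending in ".bz2".
BZ2_TOKEN = re.compile(r"\S+\.bz2\b")


def find_bz2_references(
    lines: Iterable[str], marker: str = "Pre-allocation failed"
) -> List[str]:
    """Collect unique .bz2 tokens from lines at or after the first `marker` line."""
    seen_marker = False
    found: List[str] = []
    for line in lines:
        if not seen_marker and marker in line:
            seen_marker = True
        if seen_marker:
            for token in BZ2_TOKEN.findall(line):
                if token not in found:
                    found.append(token)
    return found


if __name__ == "__main__":
    # Hypothetical usage: pass the path of the DC's log file as the only argument.
    with open(sys.argv[1], errors="replace") as handle:
        for reference in find_bz2_references(handle):
            print(reference)
```

Run it against the log on the DC node (app01 in the excerpt above). Anything mentioned before the error line is ignored, so only references logged after the failed allocation are printed.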