I have 4 servers running 4 dummy resources that I use as anchors for colocation purposes. Two of the four resources score properly and two do not, which is causing me a significant headache.
anchorOSS1 and anchorOSS2 are showing equal scores on both of their nodes (not what I want), while anchorOSS3 and anchorOSS4 are showing the scores I expect.

Here is the relevant part of my configuration:

location locOSS1primary anchorOSS1 500: node1
location locOSS1secondary anchorOSS1 250: node3
location locOSS2primary anchorOSS2 500: node2
location locOSS2secondary anchorOSS2 250: node4
location locOSS3primary anchorOSS3 500: node3
location locOSS3secondary anchorOSS3 250: node1
location locOSS4primary anchorOSS4 500: node4
location locOSS4secondary anchorOSS4 250: node2
.
.
colocation colocOSS1OSS2 -inf: anchorOSS2 anchorOSS1
colocation colocOSS1OSS4 -inf: anchorOSS4 anchorOSS1
colocation colocOSS1group 300: ( resOST0000 resOST0004 resOST0008 ) anchorOSS1
colocation colocOSS2OSS3 -inf: anchorOSS3 anchorOSS2
colocation colocOSS2group 300: ( resOST0001 resOST0005 resOST0009 ) anchorOSS2
colocation colocOSS3OSS4 -inf: anchorOSS4 anchorOSS3
colocation colocOSS3group 300: ( resOST0002 resOST0006 resOST000a ) anchorOSS3
colocation colocOSS4group 300: ( resOST0003 resOST0007 resOST000b ) anchorOSS4

Here are the anchor results of "ptest -Ls" right after first starting corosync:

native_color: anchorOSS1 allocation score on node1: 750
native_color: anchorOSS1 allocation score on node3: 750
native_color: anchorOSS2 allocation score on node2: 750
native_color: anchorOSS2 allocation score on node4: 750
native_color: anchorOSS3 allocation score on node3: 500
native_color: anchorOSS3 allocation score on node1: 250
native_color: anchorOSS4 allocation score on node4: 500
native_color: anchorOSS4 allocation score on node2: 250

With these scores, anchorOSS3 migrates and unmigrates properly using "crm resource migrate anchorOSS3" and "crm resource unmigrate anchorOSS3", and the same is true for anchorOSS4. Migrating anchorOSS1 or anchorOSS2 also works, but unmigrating does not give the desired result (moving the resource back to its original server), presumably because both nodes end up with the same score.

The configuration is "opt-in", i.e. access is explicitly denied unless granted. I am not sure where the 750 score is even coming from; it looks as if the primary and secondary location scores (500 + 250) have been added together on both nodes.
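For reference, these are the exact commands I am using to test the move and the failback (nothing beyond the stock crm shell; my understanding is that migrate/unmigrate adds and later removes a temporary location constraint, but I may be wrong about the mechanism):

# behaves as expected: anchorOSS3 moves off node3 and returns to it after unmigrate
crm resource migrate anchorOSS3
crm resource unmigrate anchorOSS3

# moves away fine, but does not come back to node1 after unmigrate
crm resource migrate anchorOSS1
crm resource unmigrate anchorOSS1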
Some pertinent information about my setup:

RedHat 5.6
Pacemaker 1.0.11
Corosync 1.2.8
OpenAIS 1.1.4

I'm not seeing any errors in the logs, but I do see the following:

Aug 18 20:17:50 node1 pengine: [12483]: debug: unpack_rsc_op: anchorOSS1_monitor_0 on node1 returned 0 (ok) instead of the expected value: 7 (not running)
Aug 18 20:17:50 node1 pengine: [12483]: notice: unpack_rsc_op: Operation anchorOSS1_monitor_0 found resource anchorOSS1 active on node1
Aug 18 20:17:50 node1 pengine: [12483]: debug: unpack_rsc_op: anchorOSS2_monitor_0 on node2 returned 0 (ok) instead of the expected value: 7 (not running)
Aug 18 20:17:50 node1 pengine: [12483]: notice: unpack_rsc_op: Operation anchorOSS2_monitor_0 found resource anchorOSS2 active on node2
Aug 18 20:17:50 node1 pengine: [12483]: debug: unpack_rsc_op: anchorOSS4_monitor_0 on node4 returned 0 (ok) instead of the expected value: 7 (not running)
Aug 18 20:17:50 node1 pengine: [12483]: notice: unpack_rsc_op: Operation anchorOSS4_monitor_0 found resource anchorOSS4 active on node4
Aug 18 20:17:50 node1 pengine: [12483]: debug: unpack_rsc_op: anchorOSS3_monitor_0 on node3 returned 0 (ok) instead of the expected value: 7 (not running)
Aug 18 20:17:50 node1 pengine: [12483]: notice: unpack_rsc_op: Operation anchorOSS3_monitor_0 found resource anchorOSS3 active on node3
Aug 18 20:17:50 node1 pengine: [12483]: notice: native_print: anchorOSS1 (ocf::heartbeat:Dummy): Started node1
Aug 18 20:17:50 node1 pengine: [12483]: notice: native_print: anchorOSS2 (ocf::heartbeat:Dummy): Started node4
Aug 18 20:17:50 node1 pengine: [12483]: notice: native_print: anchorOSS3 (ocf::heartbeat:Dummy): Started node3
Aug 18 20:17:50 node1 pengine: [12483]: notice: native_print: anchorOSS4 (ocf::heartbeat:Dummy): Started node4
...
Aug 18 20:17:51 node1 pengine: [12483]: info: rsc_merge_weights: resMDTLVM: Rolling back scores from anchorOSS2
Aug 18 20:17:51 node1 pengine: [12483]: info: rsc_merge_weights: resMDTLVM: Rolling back scores from anchorOSS4
Aug 18 20:17:51 node1 pengine: [12483]: info: rsc_merge_weights: resMDTLVM: Rolling back scores from anchorOSS3
Aug 18 20:17:51 node1 pengine: [12483]: info: rsc_merge_weights: resMDTLVM: Rolling back scores from anchorOSS4

This shows that the cluster sees anchorOSS2 on node2 but starts it on node4. I believe this is because of the score values mentioned above, but I'm clueless as to why. I have tried to understand the "Colocation Explained" documentation, but I can't quite wrap my head around it for my situation, since nothing I have in the colocation constraints is a dependent resource.

I appreciate any help in setting me straight.

Bobbie Lind
Systems Engineer
*Solutions Made Simple, Inc (SMSi)*
703-296-3087 (Cell)
bl...@sms-fed.com
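P.S. In case it helps, this is how I have been capturing the score output quoted earlier. The file-based variant is just my attempt at re-running the same calculation against a saved copy of the CIB, so the exact flags and file name may need adjusting:

# scores straight from the live CIB (this is what produced the output above)
ptest -Ls

# save the CIB and re-run the score calculation against the saved copy
# (file name is arbitrary)
cibadmin -Q > /tmp/cib-before.xml
ptest -s -x /tmp/cib-before.xml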