Sir, I have set up a two-node cluster on Ubuntu 9.10. I have added a cluster IP using ocf:heartbeat:IPaddr2, cloned the LSB script "postgresql-8.4", and also added a manually created script for Slony database replication.
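For context, the manually created script is a plain init-style (LSB) wrapper around our Slony switchover logic. A simplified sketch of its shape is below; the helper path and PID-file name are illustrative only, not the real ones:

#!/bin/sh
# /etc/init.d/slony_failover -- simplified sketch, not the real script.
# The switchover helper and the PID-file path are illustrative only.
PIDFILE=/var/run/slony_failover.pid
case "$1" in
  start)
    /usr/local/bin/slony_switchover.sh &    # illustrative helper name
    echo $! > "$PIDFILE"
    ;;
  stop)
    [ -f "$PIDFILE" ] && kill "$(cat "$PIDFILE")" 2>/dev/null
    rm -f "$PIDFILE"
    ;;
  status)
    # LSB convention: exit 0 when running, 3 when not running
    if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
      echo "slony_failover is running"
      exit 0
    fi
    echo "slony_failover is not running"
    exit 3
    ;;
  *)
    echo "Usage: $0 {start|stop|status}"
    exit 1
    ;;
esac
exit 0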
Now everything works fine, but I am not able to use the OCF resource scripts: failover is not taking place, and at times the resources are not even taken over. My ha.cf file and CIB configuration are included below.

My ha.cf file:

autojoin none
keepalive 2
deadtime 15
warntime 5
initdead 64
udpport 694
bcast eth0
auto_failback off
node node1
node node2
crm respawn
use_logd yes

My cib.xml configuration in CLI format:

node $id="3952b93e-786c-47d4-8c2f-a882e3d3d105" node2 \
    attributes standby="off"
node $id="ac87f697-5b44-4720-a8af-12a6f2295930" node1 \
    attributes standby="off"
primitive pgsql lsb:postgresql-8.4 \
    meta target-role="Started" resource-stickness="inherited" \
    op monitor interval="15s" timeout="25s" on-fail="standby"
primitive slony-fail lsb:slony_failover \
    meta target-role="Started"
primitive vir-ip ocf:heartbeat:IPaddr2 \
    params ip="192.168.10.10" nic="eth0" cidr_netmask="24" broadcast="192.168.10.255" \
    op monitor interval="15s" timeout="25s" on-fail="standby" \
    meta target-role="Started"
clone pgclone pgsql \
    meta notify="true" globally-unique="false" interleave="true" target-role="Started"
colocation ip-with-slony inf: slony-fail vir-ip
order slony-b4-ip inf: vir-ip slony-fail
property $id="cib-bootstrap-options" \
    dc-version="1.0.5-3840e6b5a305ccb803d29b468556739e75532d56" \
    cluster-infrastructure="Heartbeat" \
    no-quorum-policy="ignore" \
    stonith-enabled="false" \
    last-lrm-refresh="1266488780"
rsc_defaults $id="rsc-options" \
    resource-stickiness="INFINITY"
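In case it helps to reproduce this, I validate and inspect the configuration with the stock Pacemaker command-line tools, roughly like this:

# check the live CIB for configuration errors (verbose output)
crm_verify -L -V

# one-shot snapshot of node and resource status
crm_mon -1

# dump the configuration in CLI format (what is pasted above)
crm configure show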
I am assigning the cluster IP (192.168.10.10) on eth0, which has address 192.168.10.129 on one machine and 192.168.10.130 on the other. When I pull out the eth0 interface cable, failover does not take place. This is the log message I get when I pull the cable:

Feb 18 16:55:58 node2 NetworkManager: <info> (eth0): carrier now OFF (device state 1)

and after a minute or two, this log snippet:

-------------------------------------------------------------------
Feb 18 16:57:37 node2 cib: [21940]: info: cib_stats: Processed 3 operations (13333.00us average, 0% utilization) in the last 10min
Feb 18 17:02:53 node2 crmd: [21944]: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped!
Feb 18 17:02:53 node2 crmd: [21944]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
Feb 18 17:02:53 node2 crmd: [21944]: WARN: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
Feb 18 17:02:53 node2 crmd: [21944]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
Feb 18 17:02:53 node2 crmd: [21944]: info: do_pe_invoke: Query 111: Requesting the current CIB: S_POLICY_ENGINE
Feb 18 17:02:53 node2 crmd: [21944]: info: do_pe_invoke_callback: Invoking the PE: ref=pe_calc-dc-1266492773-121, seq=2, quorate=1
Feb 18 17:02:53 node2 pengine: [21982]: notice: unpack_config: On loss of CCM Quorum: Ignore
Feb 18 17:02:53 node2 pengine: [21982]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Feb 18 17:02:53 node2 pengine: [21982]: info: determine_online_status: Node node2 is online
Feb 18 17:02:53 node2 pengine: [21982]: info: unpack_rsc_op: slony-fail_monitor_0 on node2 returned 0 (ok) instead of the expected value: 7 (not running)
Feb 18 17:02:53 node2 pengine: [21982]: notice: unpack_rsc_op: Operation slony-fail_monitor_0 found resource slony-fail active on node2
Feb 18 17:02:53 node2 pengine: [21982]: info: unpack_rsc_op: pgsql:0_monitor_0 on node2 returned 0 (ok) instead of the expected value: 7 (not running)
Feb 18 17:02:53 node2 pengine: [21982]: notice: unpack_rsc_op: Operation pgsql:0_monitor_0 found resource pgsql:0 active on node2
Feb 18 17:02:53 node2 pengine: [21982]: info: determine_online_status: Node node1 is online
Feb 18 17:02:53 node2 pengine: [21982]: notice: native_print: vir-ip#011(ocf::heartbeat:IPaddr2):#011Started node2
Feb 18 17:02:53 node2 pengine: [21982]: notice: native_print: slony-fail#011(lsb:slony_failover):#011Started node2
Feb 18 17:02:53 node2 pengine: [21982]: notice: clone_print: Clone Set: pgclone
Feb 18 17:02:53 node2 pengine: [21982]: notice: print_list: #011Started: [ node2 node1 ]
Feb 18 17:02:53 node2 pengine: [21982]: notice: RecurringOp: Start recurring monitor (15s) for pgsql:1 on node1
Feb 18 17:02:53 node2 pengine: [21982]: notice: LogActions: Leave resource vir-ip#011(Started node2)
Feb 18 17:02:53 node2 pengine: [21982]: notice: LogActions: Leave resource slony-fail#011(Started node2)
Feb 18 17:02:53 node2 pengine: [21982]: notice: LogActions: Leave resource pgsql:0#011(Started node2)
Feb 18 17:02:53 node2 pengine: [21982]: notice: LogActions: Leave resource pgsql:1#011(Started node1)
Feb 18 17:02:53 node2 crmd: [21944]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Feb 18 17:02:53 node2 crmd: [21944]: info: unpack_graph: Unpacked transition 26: 1 actions in 1 synapses
Feb 18 17:02:53 node2 crmd: [21944]: info: do_te_invoke: Processing graph 26 (ref=pe_calc-dc-1266492773-121) derived from /var/lib/pengine/pe-input-125.bz2
Feb 18 17:02:53 node2 crmd: [21944]: info: te_rsc_command: Initiating action 15: monitor pgsql:1_monitor_15000 on node1
Feb 18 17:02:53 node2 pengine: [21982]: ERROR: write_last_sequence: Cannout open series file /var/lib/pengine/pe-input.last for writing
Feb 18 17:02:53 node2 pengine: [21982]: info: process_pe_message: Transition 26: PEngine Input stored in: /var/lib/pengine/pe-input-125.bz2
Feb 18 17:02:55 node2 crmd: [21944]: info: match_graph_event: Action pgsql:1_monitor_15000 (15) confirmed on node1 (rc=0)
Feb 18 17:02:55 node2 crmd: [21944]: info: run_graph: ====================================================
Feb 18 17:02:55 node2 crmd: [21944]: notice: run_graph: Transition 26 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-125.bz2): Complete
Feb 18 17:02:55 node2 crmd: [21944]: info: te_graph_trigger: Transition 26 is now complete
Feb 18 17:02:55 node2 crmd: [21944]: info: notify_crmd: Transition 26 status: done - <null>
Feb 18 17:02:55 node2 crmd: [21944]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Feb 18 17:02:55 node2 crmd: [21944]: info: do_state_transition: Starting PEngine Recheck Timer
------------------------------------------------------------------------------
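One thing I notice in the snippet above is that the probes for slony-fail and pgsql:0 returned 0 (ok) where 7 (not running) was expected. As far as I understand, Pacemaker probes LSB resources by running the script with the "status" argument and expects the LSB exit-code convention, which I would check by hand like this (a sketch, assuming the scripts live in /etc/init.d):

# LSB status convention Pacemaker relies on (my understanding):
#   exit 0 = program is running, exit 3 = program is not running
/etc/init.d/slony_failover status; echo "slony_failover rc=$?"
/etc/init.d/postgresql-8.4 status; echo "postgresql-8.4 rc=$?"

If the status action exits 0 even when the service is stopped, that would match the messages above, though I have not confirmed this for my scripts.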
Also, I am not able to use the pgsql OCF script, so I am using the init script instead and have cloned it, as it needs to run on both nodes for the Slony database replication. I am using the Heartbeat and Pacemaker debs from the updated Ubuntu Karmic repository (Heartbeat 2.99).

Please check my configuration and tell me where I am going wrong.
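For completeness, my attempt with the OCF agent looked roughly like this; the parameter values are representative Ubuntu paths, not necessarily exactly what I tried:

primitive pgsql ocf:heartbeat:pgsql \
    params pgctl="/usr/lib/postgresql/8.4/bin/pg_ctl" \
        pgdata="/var/lib/postgresql/8.4/main" \
    op monitor interval="15s" timeout="25s"

I could not get that to run, which is why I fell back to cloning the distribution init script.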
--
Regards,
Jayakrishnan. L
Visit: www.jayakrishnan.bravehost.com