Hi,

Pacemaker configuration on CentOS 6.5:

pacemaker-cli-1.1.10-14.el6_5.3.x86_64
pacemaker-1.1.10-14.el6_5.3.x86_64
pacemaker-libs-1.1.10-14.el6_5.3.x86_64
pacemaker-cluster-libs-1.1.10-14.el6_5.3.x86_64
This is my config:

Cluster Name: ybrp
Corosync Nodes:
Pacemaker Nodes:
 devrp1 devrp2
Resources:
 Resource: ybrpip (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=10.172.214.50 cidr_netmask=24 nic=eth0 clusterip_hash=sourceip-sourceport
  Meta Attrs: stickiness=0,migration-threshold=3,failure-timeout=600s
  Operations: monitor on-fail=restart interval=5s timeout=20s (ybrpip-monitor-interval-5s)
 Clone: ybrpstat-clone
  Meta Attrs: globally-unique=false clone-max=2 clone-node-max=1
  Resource: ybrpstat (class=ocf provider=yb type=proxy)
   Operations: monitor on-fail=restart interval=5s timeout=20s (ybrpstat-monitor-interval-5s)
Stonith Devices:
Fencing Levels:
Location Constraints:
Ordering Constraints:
 start ybrpstat-clone then start ybrpip (Mandatory) (id:order-ybrpstat-clone-ybrpip-mandatory)
Colocation Constraints:
 ybrpip with ybrpstat-clone (INFINITY) (id:colocation-ybrpip-ybrpstat-clone-INFINITY)
Cluster Properties:
 cluster-infrastructure: cman
 dc-version: 1.1.10-14.el6_5.3-368c726
 last-lrm-refresh: 1404892739
 no-quorum-policy: ignore
 stonith-enabled: false

I have my own resource agent file, and I start/stop the proxy service outside of Pacemaker. I ran into an interesting problem when I did a VMware update on the Linux box, which interrupted network activity.

The monitor function in my script does two things: 1) tests whether the proxy process is running, and 2) fetches a status page from the proxy and confirms it returns 200. (A rough sketch of what it does is at the bottom of this mail.)

This is what I got in /var/log/messages:

Jul 9 06:16:13 devrp1 crmd[6849]: warning: update_failcount: Updating failcount for ybrpstat on devrp2 after failed monitor: rc=7 (update=value++, time=1404850573)
Jul 9 06:16:13 devrp1 crmd[6849]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jul 9 06:16:13 devrp1 pengine[6848]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jul 9 06:16:13 devrp1 pengine[6848]: warning: unpack_rsc_op: Processing failed op monitor for ybrpstat:0 on devrp2: not running (7)
Jul 9 06:16:13 devrp1 pengine[6848]: warning: unpack_rsc_op: Processing failed op start for ybrpstat:1 on devrp1: unknown error (1)
Jul 9 06:16:13 devrp1 pengine[6848]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
Jul 9 06:16:13 devrp1 pengine[6848]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
Jul 9 06:16:13 devrp1 pengine[6848]: notice: LogActions: Restart ybrpip#011(Started devrp2)
Jul 9 06:16:13 devrp1 pengine[6848]: notice: LogActions: Recover ybrpstat:0#011(Started devrp2)
Jul 9 06:16:13 devrp1 pengine[6848]: notice: process_pe_message: Calculated Transition 1054: /var/lib/pacemaker/pengine/pe-input-235.bz2
Jul 9 06:16:13 devrp1 pengine[6848]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jul 9 06:16:13 devrp1 pengine[6848]: warning: unpack_rsc_op: Processing failed op monitor for ybrpstat:0 on devrp2: not running (7)
Jul 9 06:16:13 devrp1 pengine[6848]: warning: unpack_rsc_op: Processing failed op start for ybrpstat:1 on devrp1: unknown error (1)
Jul 9 06:16:13 devrp1 pengine[6848]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
Jul 9 06:16:13 devrp1 pengine[6848]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
Jul 9 06:16:13 devrp1 pengine[6848]: notice: LogActions: Restart ybrpip#011(Started devrp2)
Jul 9 06:16:13 devrp1 pengine[6848]: notice: LogActions: Recover ybrpstat:0#011(Started devrp2)
Jul 9 06:16:13 devrp1 pengine[6848]: notice: process_pe_message: Calculated Transition 1055: /var/lib/pacemaker/pengine/pe-input-236.bz2
Jul 9 06:16:13 devrp1 pengine[6848]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jul 9 06:16:13 devrp1 pengine[6848]: warning: unpack_rsc_op: Processing failed op monitor for ybrpstat:0 on devrp2: not running (7)
Jul 9 06:16:13 devrp1 pengine[6848]: warning: unpack_rsc_op: Processing failed op start for ybrpstat:1 on devrp1: unknown error (1)
Jul 9 06:16:13 devrp1 pengine[6848]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
Jul 9 06:16:13 devrp1 pengine[6848]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
Jul 9 06:16:13 devrp1 pengine[6848]: notice: LogActions: Restart ybrpip#011(Started devrp2)
Jul 9 06:16:13 devrp1 pengine[6848]: notice: LogActions: Recover ybrpstat:0#011(Started devrp2)

And it stayed this way for the next 12 hours, until I got on. I poked around, and to fix it I ran this:

/usr/sbin/pcs resource cleanup ybrpip
/usr/sbin/pcs resource cleanup ybrpstat

Basically I cleaned up the errors and off it went, all by itself.

So my question is: how do I configure Pacemaker, or what do I need to change in the resource agent script, to report a temporary error so that it keeps retrying the proxy status check? It seems to me it tried once and then gave up... although the log says it failed after 1000000 failures. How can I change that threshold to infinite, and where is the retry interval for this? From the config above it looks to me like it should already keep retrying. (I've put rough sketches of what I'm thinking of at the bottom of this mail.)

Thanks
Alex
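For reference, this is roughly what the monitor part of my agent does. It is a simplified sketch: the process name, port and status URL below are placeholders rather than my real script, but the OCF return codes are the ones it uses.

#!/bin/sh
# Simplified sketch of the monitor action (process name and URL are placeholders).
: ${OCF_ROOT:=/usr/lib/ocf}
. ${OCF_ROOT}/lib/heartbeat/ocf-shellfuncs

proxy_monitor() {
    # 1) Is the proxy process running at all?
    if ! pgrep -f yb-proxy >/dev/null 2>&1; then
        return $OCF_NOT_RUNNING      # rc=7, which is what shows up in the log
    fi
    # 2) Does the status page answer with HTTP 200?
    code=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8080/status)
    if [ "$code" = "200" ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_ERR_GENERIC          # anything else is treated as a hard failure today
}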
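And this is the kind of change I was thinking of so the cluster keeps retrying instead of giving up. I am guessing at the knob and the values here, which is really my question:

# Raise the failure threshold on the proxy resource and let old failures expire.
# Values are a guess on my part -- is this even the right place to set it?
pcs resource meta ybrpstat migration-threshold=INFINITY failure-timeout=600s
pcs resource show ybrpstat

Whether the threshold belongs on the primitive or on ybrpstat-clone, or whether the real fix is the return code in the agent, is exactly what I am unsure about.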