Hi

I'm configuring Pacemaker on CentOS 6.5 with these packages:
pacemaker-cli-1.1.10-14.el6_5.3.x86_64
pacemaker-1.1.10-14.el6_5.3.x86_64
pacemaker-libs-1.1.10-14.el6_5.3.x86_64
pacemaker-cluster-libs-1.1.10-14.el6_5.3.x86_64

This is my config:
Cluster Name: ybrp
Corosync Nodes:
 
Pacemaker Nodes:
 devrp1 devrp2 

Resources: 
 Resource: ybrpip (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=10.172.214.50 cidr_netmask=24 nic=eth0 clusterip_hash=sourceip-sourceport 
  Meta Attrs: stickiness=0,migration-threshold=3,failure-timeout=600s 
  Operations: monitor on-fail=restart interval=5s timeout=20s (ybrpip-monitor-interval-5s)
 Clone: ybrpstat-clone
  Meta Attrs: globally-unique=false clone-max=2 clone-node-max=1 
  Resource: ybrpstat (class=ocf provider=yb type=proxy)
   Operations: monitor on-fail=restart interval=5s timeout=20s (ybrpstat-monitor-interval-5s)

Stonith Devices: 
Fencing Levels: 

Location Constraints:
Ordering Constraints:
  start ybrpstat-clone then start ybrpip (Mandatory) (id:order-ybrpstat-clone-ybrpip-mandatory)
Colocation Constraints:
  ybrpip with ybrpstat-clone (INFINITY) (id:colocation-ybrpip-ybrpstat-clone-INFINITY)

Cluster Properties:
 cluster-infrastructure: cman
 dc-version: 1.1.10-14.el6_5.3-368c726
 last-lrm-refresh: 1404892739
 no-quorum-policy: ignore
 stonith-enabled: false
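
For reference, meta attributes like the ones shown above are set with pcs roughly like this (just a sketch mirroring the config dump, not necessarily the exact commands I originally ran):

        # sketch only - how meta attributes like the ones above get set with pcs
        pcs resource meta ybrpip stickiness=0 migration-threshold=3 failure-timeout=600s
        pcs resource meta ybrpstat-clone globally-unique=false clone-max=2 clone-node-max=1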


I have my own resource agent script, and I start/stop the proxy service outside of 
Pacemaker.

I ran into an interesting problem: I did a VMware update on the Linux box, which 
interrupted network activity.

The monitor function in my script 1) tests whether the proxy process is running, and 
2) fetches a status page from the proxy and confirms it returns HTTP 200.
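
Roughly, the monitor action looks like this (a simplified sketch; the process name, port and status URL below are placeholders, not the real values):

        # simplified sketch of the monitor action in my resource agent
        # OCF_* return codes come from ${OCF_ROOT}/lib/heartbeat/ocf-shellfuncs,
        # sourced earlier in the script
        proxy_monitor() {
            # 1) is the proxy process running at all?
            if ! pgrep -f yb-proxy >/dev/null 2>&1; then
                return $OCF_NOT_RUNNING
            fi
            # 2) does the proxy status page return HTTP 200?
            code=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8080/status)
            if [ "$code" != "200" ]; then
                return $OCF_ERR_GENERIC
            fi
            return $OCF_SUCCESS
        }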


This is what I got in /var/log/messages:

Jul  9 06:16:13 devrp1 crmd[6849]:  warning: update_failcount: Updating failcount for ybrpstat on devrp2 after failed monitor: rc=7 (update=value++, time=1404850573)
Jul  9 06:16:13 devrp1 crmd[6849]:   notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jul  9 06:16:13 devrp1 pengine[6848]:   notice: unpack_config: On loss of CCM Quorum: Ignore
Jul  9 06:16:13 devrp1 pengine[6848]:  warning: unpack_rsc_op: Processing failed op monitor for ybrpstat:0 on devrp2: not running (7)
Jul  9 06:16:13 devrp1 pengine[6848]:  warning: unpack_rsc_op: Processing failed op start for ybrpstat:1 on devrp1: unknown error (1)
Jul  9 06:16:13 devrp1 pengine[6848]:  warning: common_apply_stickiness: Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
Jul  9 06:16:13 devrp1 pengine[6848]:  warning: common_apply_stickiness: Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
Jul  9 06:16:13 devrp1 pengine[6848]:   notice: LogActions: Restart ybrpip#011(Started devrp2)
Jul  9 06:16:13 devrp1 pengine[6848]:   notice: LogActions: Recover ybrpstat:0#011(Started devrp2)
Jul  9 06:16:13 devrp1 pengine[6848]:   notice: process_pe_message: Calculated Transition 1054: /var/lib/pacemaker/pengine/pe-input-235.bz2
Jul  9 06:16:13 devrp1 pengine[6848]:   notice: unpack_config: On loss of CCM Quorum: Ignore
Jul  9 06:16:13 devrp1 pengine[6848]:  warning: unpack_rsc_op: Processing failed op monitor for ybrpstat:0 on devrp2: not running (7)
Jul  9 06:16:13 devrp1 pengine[6848]:  warning: unpack_rsc_op: Processing failed op start for ybrpstat:1 on devrp1: unknown error (1)
Jul  9 06:16:13 devrp1 pengine[6848]:  warning: common_apply_stickiness: Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
Jul  9 06:16:13 devrp1 pengine[6848]:  warning: common_apply_stickiness: Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
Jul  9 06:16:13 devrp1 pengine[6848]:   notice: LogActions: Restart ybrpip#011(Started devrp2)
Jul  9 06:16:13 devrp1 pengine[6848]:   notice: LogActions: Recover ybrpstat:0#011(Started devrp2)
Jul  9 06:16:13 devrp1 pengine[6848]:   notice: process_pe_message: Calculated Transition 1055: /var/lib/pacemaker/pengine/pe-input-236.bz2
Jul  9 06:16:13 devrp1 pengine[6848]:   notice: unpack_config: On loss of CCM Quorum: Ignore
Jul  9 06:16:13 devrp1 pengine[6848]:  warning: unpack_rsc_op: Processing failed op monitor for ybrpstat:0 on devrp2: not running (7)
Jul  9 06:16:13 devrp1 pengine[6848]:  warning: unpack_rsc_op: Processing failed op start for ybrpstat:1 on devrp1: unknown error (1)
Jul  9 06:16:13 devrp1 pengine[6848]:  warning: common_apply_stickiness: Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
Jul  9 06:16:13 devrp1 pengine[6848]:  warning: common_apply_stickiness: Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
Jul  9 06:16:13 devrp1 pengine[6848]:   notice: LogActions: Restart ybrpip#011(Started devrp2)
Jul  9 06:16:13 devrp1 pengine[6848]:   notice: LogActions: Recover ybrpstat:0#011(Started devrp2)


And it stayed that way for the next 12 hours, until I logged on.

I poked around, and to fix it I ran:
        /usr/sbin/pcs resource cleanup ybrpip
        /usr/sbin/pcs resource cleanup ybrpstat

Basically, I cleaned up the errors and off it went all by itself.

So my question is: how do I configure this, or what do I need to change in the 
resource agent script, to report a temporary error back to Pacemaker so that it 
keeps trying to check the status of the proxy?

It seems to me it tried once and then gave up, although the log says it failed 
after 1000000 failures. How can I change that to infinite, and where is the retry 
interval setting for this? From the config above it looks to me like it should 
already be infinite.
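
Is it something along these lines that I need, or am I looking in the wrong place? (Just guessing at the pcs invocation here.)

        # guessing - is this how to make Pacemaker keep retrying instead of
        # banning the node after the failure count is hit?
        pcs resource meta ybrpstat-clone migration-threshold=INFINITY failure-timeout=600s
        # and is the retry interval just the monitor op interval?
        pcs resource update ybrpstat op monitor interval=5s timeout=20s on-fail=restart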


Thanks
Alex

