On Tue, Dec 7, 2010 at 1:24 PM, <ant1spamz-pacema...@yahoo.com> wrote:
> Hi there,
> I have a requirement to make a single node cluster primarily for resource
> monitoring on the local node so that network load balancing from my front
> end load balancers works correctly and the node in question fails out due to
> either my public or private interface or both interfaces fail (typical OR
> Truth Table)
> my NLB has the following setup
> 2 front end LB's with a failover IP between them and direct routing to my
> nodes public interface with monitoring on the private interface
> my nodes
> one public interface and one private interface, this is how things are and
> I cant change it.
> =================
> setup
> ===========
> pingd to my LB1 - 192.168.0.68
> pingd to my LB2 (represents a "public" ping destination) - 192.168.0.69
> location constraint to fail if either one of the ping times out
> now on startup everything is ok, apache launches along with my 2 pingd, the
> fail constraint works as well
> ================
> the problem
> =============
> now when I simulate a network failure (iptables -s web1.testcluster -j DROP)
> apache is correctly failed. When pingd re-establishes connection the
> Apache constraint must be reversed and Apache simply started.
> how do I achieve the automatic resource restart?

It will happen whenever the pingd resource redetects network connectivity
(up to 15s later, based on your monitor interval).

> Could my monitor constraint on the apache resource be in conflict with
> pingd?

It shouldn't be. Did you wait long enough for connectivity to return and for
the next monitor op to happen?

> am I simply missing a "recovery" constraint to start the service?
> is my location constraint not correctly done?
> Other?
> possibly this: Resource apache cannot run anywhere (what this means I have
> no idea)
> icmp is ok
> Last updated: Tue Dec 7 07:09:11 2010
> Stack: Heartbeat
> Current DC: web1.testcluster (ae391b6f-176d-43bc-93b4-8104ff3414c8) - partition with quorum
> Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
> 1 Nodes configured, unknown expected votes
> 3 Resources configured.
> ============
> Online: [ web1.testcluster ]
> pingdnet1 (ocf::pacemaker:pingd): Started web1.testcluster
> pingdnet2 (ocf::pacemaker:pingd): Started web1.testcluster
> crm(live)# Ctrl-C, leaving
> [r...@web1 ~]# date
> Tue Dec 7 07:13:11 EST 2010
> [r...@web1 ~]# ping 192.168.0.69
> PING 192.168.0.69 (192.168.0.69) 56(84) bytes of data.
> 64 bytes from 192.168.0.69: icmp_seq=1 ttl=64 time=0.104 ms
> --- 192.168.0.69 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.104/0.104/0.104/0.000 ms
> [r...@web1 ~]# ping 192.168.0.68
> PING 192.168.0.68 (192.168.0.68) 56(84) bytes of data.
> 64 bytes from 192.168.0.68: icmp_seq=1 ttl=64 time=0.151 ms
> --- 192.168.0.68 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.151/0.151/0.151/0.000 ms
> ================
> conf
> ====
> primitive pingdnet1 ocf:pacemaker:pingd params host_list=192.168.0.69 name=pingdnet1 op monitor interval=15s timeout=5s
> primitive pingdnet2 ocf:pacemaker:pingd params host_list=192.168.0.68 name=pingdnet2 op monitor interval=15s timeout=5s
> primitive apache lsb::httpd op monitor interval=15s
> location apache-ping-constraint apache rule -inf: not_defined pingdnet1 or pingdnet1 lte 0
> location apache-ping-constraint2 apache rule -inf: not_defined pingdnet2 or pingdnet2 lte 0
> order ping-then-apache inf: pingdnet1 pingdnet2 apache
> ===============================================
> logs to help
> ======================
> Dec 7 06:39:41 web1 lrmd: [2471]: info: RA output: (apache:start:stdout) Starting httpd:
> Dec 7 06:39:41 web1 lrmd: [2471]: info: RA output: (apache:start:stdout) [
> Dec 7 06:39:41 web1 lrmd: [2471]: info: RA output: (apache:start:stdout) OK
> Dec 7 06:39:41 web1 lrmd: [2471]: info: RA output: (apache:start:stdout) ]
> Dec 7 06:39:41 web1 lrmd: [2471]: info: RA output: (apache:start:stdout)
> Dec 7 06:39:41 web1 lrmd: [2471]: info: RA output: (apache:start:stdout)
> Dec 7 06:39:41 web1 lrmd: [2471]: info: Managed apache:start process 10650 exited with return code 0.
> Dec 7 06:39:41 web1 crmd: [2474]: info: process_lrm_event: LRM operation apache_start_0 (call=25, rc=0, cib-update=196, confirmed=true) ok
> Dec 7 06:39:41 web1 crmd: [2474]: info: match_graph_event: Action apache_start_0 (14) confirmed on web1.testcluster (rc=0)
> Dec 7 06:39:41 web1 crmd: [2474]: info: te_rsc_command: Initiating action 15: monitor apache_monitor_15000 on web1.testcluster (local)
> Dec 7 06:39:41 web1 crmd: [2474]: info: do_lrm_rsc_op: Performing key=15:34:0:02fb0ab7-1384-4125-b14a-0ab5b4e9d1e8 op=apache_monitor_15000 )
> Dec 7 06:39:41 web1 lrmd: [2471]: info: rsc:apache:26: monitor
> Dec 7 06:39:41 web1 crmd: [2474]: info: te_pseudo_action: Pseudo action 3 fired and confirmed
> Dec 7 06:39:41 web1 lrmd: [2471]: info: Managed apache:monitor process 10666 exited with return code 0.
> Dec 7 06:39:41 web1 crmd: [2474]: info: process_lrm_event: LRM operation apache_monitor_15000 (call=26, rc=0, cib-update=197, confirmed=false) ok
> Dec 7 06:39:41 web1 crmd: [2474]: info: match_graph_event: Action apache_monitor_15000 (15) confirmed on web1.testcluster (rc=0)
>
> Dec 7 06:56:08 web1 pengine: [2487]: notice: native_print: pingdnet1 (ocf::pacemaker:pingd): Started web1.testcluster
> Dec 7 06:56:08 web1 pengine: [2487]: notice: native_print: pingdnet2 (ocf::pacemaker:pingd): Started web1.testcluster
> Dec 7 06:56:08 web1 pengine: [2487]: notice: native_print: apache (lsb:httpd): Stopped
> Dec 7 06:56:08 web1 pengine: [2487]: info: native_color: Resource apache cannot run anywhere
> Dec 7 06:56:08 web1 pengine: [2487]: notice: LogActions: Leave resource pingdnet1 (Started web1.testcluster)
> Dec 7 06:56:08 web1 pengine: [2487]: notice: LogActions: Leave resource pingdnet2 (Started web1.testcluster)
> Dec 7 06:56:08 web1 pengine: [2487]: notice: LogActions: Leave resource apache (Stopped)
>
> Dec 7 07:10:15 web1 pingd: [9653]: info: ping_read: Retrying...
> Dec 7 07:10:16 web1 pingd: [9521]: info: ping_read: Retrying...
> Dec 7 07:10:47 web1 last message repeated 31 times
> Dec 7 07:11:08 web1 last message repeated 21 times
> Dec 7 07:11:08 web1 last message repeated 21 times
> Dec 7 07:11:08 web1 crmd: [2474]: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped!
> Dec 7 07:11:08 web1 crmd: [2474]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
> Dec 7 07:11:08 web1 crmd: [2474]: info: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
> Dec 7 07:11:08 web1 crmd: [2474]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
> Dec 7 07:11:08 web1 crmd: [2474]: info: do_pe_invoke: Query 201: Requesting the current CIB: S_POLICY_ENGINE
> Dec 7 07:11:08 web1 crmd: [2474]: info: do_pe_invoke_callback: Invoking the PE: query=201, ref=pe_calc-dc-1291723868-103, seq=1, quorate=1
> Dec 7 07:11:08 web1 pengine: [2487]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Dec 7 07:11:08 web1 pengine: [2487]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> Dec 7 07:11:08 web1 pengine: [2487]: info: determine_online_status: Node web1.testcluster is online
> Dec 7 07:11:08 web1 pengine: [2487]: notice: native_print: pingdnet1 (ocf::pacemaker:pingd): Started web1.testcluster
> Dec 7 07:11:08 web1 pengine: [2487]: notice: native_print: pingdnet2 (ocf::pacemaker:pingd): Started web1.testcluster
> Dec 7 07:11:08 web1 pengine: [2487]: notice: native_print: apache (lsb:httpd): Stopped
> Dec 7 07:11:08 web1 pengine: [2487]: info: native_color: Resource apache cannot run anywhere
> Dec 7 07:11:08 web1 pengine: [2487]: notice: LogActions: Leave resource pingdnet1 (Started web1.testcluster)
> Dec 7 07:11:08 web1 pengine: [2487]: notice: LogActions: Leave resource pingdnet2 (Started web1.testcluster)
> Dec 7 07:11:08 web1 pengine: [2487]: notice: LogActions: Leave resource apache (Stopped)
> Dec 7 07:11:08 web1 crmd: [2474]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Dec 7 07:11:08 web1 crmd: [2474]: info: unpack_graph: Unpacked transition 37: 0 actions in 0 synapses
> Dec 7 07:11:08 web1 crmd: [2474]: info: do_te_invoke: Processing graph 37 (ref=pe_calc-dc-1291723868-103) derived from /var/lib/pengine/pe-input-555.bz2
> Dec 7 07:11:08 web1 crmd: [2474]: info: run_graph: ====================================================
> Dec 7 07:11:08 web1 crmd: [2474]: notice: run_graph: Transition 37 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-555.bz2): Complete
> Dec 7 07:11:08 web1 crmd: [2474]: info: te_graph_trigger: Transition 37 is now complete
> Dec 7 07:11:08 web1 crmd: [2474]: info: notify_crmd: Transition 37 status: done - <null>
> Dec 7 07:11:08 web1 crmd: [2474]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Dec 7 07:11:08 web1 crmd: [2474]: info: do_state_transition: Starting PEngine Recheck Timer
> Dec 7 07:11:08 web1 pengine: [2487]: info: process_pe_message: Transition 37: PEngine Input stored in: /var/lib/pengine/pe-input-555.bz2
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
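
To make the "Resource apache cannot run anywhere" message concrete, here is a
rough Python sketch of how the two location rules in the quoted configuration
combine on a single node. It is only a simplified model, not Pacemaker's
implementation: the function names (rule_matches, apache_can_run) and the plain
dict of node attributes are made up for illustration, and I am assuming the
attribute values can be compared as integers.

```python
from typing import Mapping, Sequence

NEG_INFINITY = float("-inf")


def rule_matches(attrs: Mapping[str, str], name: str) -> bool:
    """Model of the rule 'not_defined NAME or NAME lte 0' (hypothetical helper)."""
    raw = attrs.get(name)
    if raw is None:           # not_defined NAME
        return True
    return int(raw) <= 0      # NAME lte 0 (assumption: integer comparison)


def apache_score(attrs: Mapping[str, str],
                 ping_attrs: Sequence[str] = ("pingdnet1", "pingdnet2")) -> float:
    """Each matching rule contributes -inf; otherwise the score stays at 0."""
    if any(rule_matches(attrs, name) for name in ping_attrs):
        return NEG_INFINITY
    return 0.0


def apache_can_run(attrs: Mapping[str, str]) -> bool:
    """With only one node, a -inf score means there is nowhere left to run."""
    return apache_score(attrs) > NEG_INFINITY


if __name__ == "__main__":
    print(apache_can_run({"pingdnet1": "1", "pingdnet2": "1"}))  # both reachable
    print(apache_can_run({"pingdnet1": "0", "pingdnet2": "1"}))  # one ping lost
    print(apache_can_run({"pingdnet2": "1"}))                    # attribute missing
```

In this model apache becomes eligible again as soon as both attributes are
defined and greater than 0, which is why no separate "recovery" constraint
should be needed once pingd has updated its attribute.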