Hi Andreas,

Yes, this is only for testing. The specific test was not two VMs running on the same host: we have two physical servers, each running a VM, and the VMs run pacemaker/heartbeat. We reboot both physical servers (to simulate a power failure) and then watch both VMs negotiate.
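For reference, this is roughly how we apply the property changes discussed in this thread. This is just a sketch using the crm shell (crmsh); it assumes crmsh is installed, and the pcs or cibadmin equivalents would work just as well:

```shell
# Raise dc-deadtime (how long a node waits to hear from an existing DC
# before triggering an election) from the 5s we had to 10s:
crm configure property dc-deadtime=10s

# Andreas' suggestion: stop all resources before a serialized shutdown,
# to avoid time-consuming resource movement:
crm configure property stop-all-resources=true

# Inspect the resulting cluster properties:
crm configure show | grep -E 'dc-deadtime|stop-all-resources'
```

These commands need a running cluster, so treat them as a config fragment rather than something to run standalone.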
--Shyam

On Thu, Feb 2, 2012 at 3:38 PM, Andreas Kurz <andr...@hastexo.com> wrote:
> On 02/02/2012 04:45 AM, Shyam wrote:
> > Hi Andreas,
> >
> > Thanks for your reply.
> >
> > We are using pacemaker in a VM environment & were primarily checking how
> > it behaves when the two nodes hosting the clustered VMs reboot. It
> > apparently took a very long time doing the elections.
>
> Ok, but this is only for testing? For a production system, the VMs
> running a cluster should not run on the same host, as this would be a SPOF.
>
> > I realized that we were using dc-deadtime at 5sec. After bumping this up
> > to 10sec, this long election cycle problem disappeared.
>
> ... interesting
>
> Regards,
> Andreas
>
> --
> Need help with Pacemaker?
> http://www.hastexo.com/now
>
> > --Shyam
> >
> > On Thu, Feb 2, 2012 at 3:59 AM, Andreas Kurz <andr...@hastexo.com> wrote:
> >
> > On 01/27/2012 12:21 PM, Shyam wrote:
> > > Folks,
> > >
> > > We are constantly running into a long election cycle: in a 2-node
> > > cluster, when both nodes are rebooted simultaneously, they take a
> > > long time running through the election loop.
> >
> > Why do you want to reboot them simultaneously? Stop them one after
> > another and this will work fine.
> >
> > If you want to avoid time-consuming resource movement, use the cluster
> > property stop-all-resources prior to the serialized shutdown.
> >
> > Regards,
> > Andreas
> >
> > --
> > Need help with Pacemaker?
> > http://www.hastexo.com/now
> >
> > > On one node pacemaker loops like:
> > > Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_takeover: Taking over DC status for this partition
> > > Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_readwrite: We are now in R/O mode
> > > Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_slave_all for section 'all' (origin=local/crmd/222, version=1.1.1): ok (rc=0)
> > > Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_readwrite: We are now in R/W mode
> > > Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_master for section 'all' (origin=local/crmd/223, version=1.1.1): ok (rc=0)
> > > Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/224, version=1.1.1): ok (rc=0)
> > > Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/226, version=1.1.1): ok (rc=0)
> > > Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_join_offer_all: join-25: Waiting on 2 outstanding join acks
> > > Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/228, version=1.1.1): ok (rc=0)
> > > Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: config_query_callback: Checking for expired actions every 900000ms
> > > Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: do_election_count_vote: Election 50 (owner: 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0 (Age)
> > > Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Set DC to vsa-0000009c-vc-1 (3.0.1)
> > > Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: do_state_transition: State transition S_INTEGRATION -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> > > Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Unset DC vsa-0000009c-vc-1
> > > Jan 26 22:03:21 vsa-0000009c-vc-1 crmd: [1134]: info: do_election_count_vote: Election 51 (owner: 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0 (Age)
> > > Jan 26 22:03:21 vsa-0000009c-vc-1 crmd: [1134]: WARN: do_log: FSA: Input I_JOIN_REQUEST from route_message() received in state S_ELECTION
> > > Jan 26 22:03:22 vsa-0000009c-vc-1 crmd: [1134]: info: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
> > > Jan 26 22:03:22 vsa-0000009c-vc-1 crmd: [1134]: info: start_subsystem: Starting sub-system "pengine"
> > > Jan 26 22:03:22 vsa-0000009c-vc-1 crmd: [1134]: WARN: start_subsystem: Client pengine already running as pid 1234
> > > Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_takeover: Taking over DC status for this partition
> > > Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_readwrite: We are now in R/O mode
> > > Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_slave_all for section 'all' (origin=local/crmd/231, version=1.1.1): ok (rc=0)
> > > Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_readwrite: We are now in R/W mode
> > > Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_master for section 'all' (origin=local/crmd/232, version=1.1.1): ok (rc=0)
> > > Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/233, version=1.1.1): ok (rc=0)
> > > Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/235, version=1.1.1): ok (rc=0)
> > > Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_join_offer_all: join-26: Waiting on 2 outstanding join acks
> > > Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/237, version=1.1.1): ok (rc=0)
> > > Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: config_query_callback: Checking for expired actions every 900000ms
> > > Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: do_election_count_vote: Election 52 (owner: 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0 (Age)
> > > Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Set DC to vsa-0000009c-vc-1 (3.0.1)
> > > Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: do_state_transition: State transition S_INTEGRATION -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> > > Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Unset DC vsa-0000009c-vc-1
> > > Jan 26 22:03:27 vsa-0000009c-vc-1 crmd: [1134]: info: do_election_count_vote: Election 53 (owner: 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0 (Age)
> > > Jan 26 22:03:27 vsa-0000009c-vc-1 crmd: [1134]: WARN: do_log: FSA: Input I_JOIN_REQUEST from route_message() received in state S_ELECTION
> > > Jan 26 22:03:28 vsa-0000009c-vc-1 crmd: [1134]: info: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
> > > Jan 26 22:03:28 vsa-0000009c-vc-1 crmd: [1134]: info: start_subsystem: Starting sub-system "pengine"
> > > Jan 26 22:03:28 vsa-0000009c-vc-1 crmd: [1134]: WARN: start_subsystem: Client pengine already running as pid 1234
> > >
> > > & the other node with:
> > > Jan 26 22:03:20 vsa-0000009c-vc-0 crmd: [1314]: info: crm_timer_popped: Election Trigger (I_DC_TIMEOUT) just popped!
> > > Jan 26 22:03:20 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> > > Jan 26 22:03:20 vsa-0000009c-vc-0 crmd: [1314]: info: do_state_transition: State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]
> > > Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input I_JOIN_OFFER from route_message() received in state S_ELECTION
> > > Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: info: do_state_transition: State transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> > > Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: info: do_dc_release: DC role released
> > > Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: info: do_te_control: Transitioner is now inactive
> > > Jan 26 22:03:26 vsa-0000009c-vc-0 crmd: [1314]: info: crm_timer_popped: Election Trigger (I_DC_TIMEOUT) just popped!
> > > Jan 26 22:03:26 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> > > Jan 26 22:03:26 vsa-0000009c-vc-0 crmd: [1314]: info: do_state_transition: State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]
> > > Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input I_JOIN_OFFER from route_message() received in state S_ELECTION
> > > Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: info: do_state_transition: State transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> > > Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: info: do_dc_release: DC role released
> > > Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: info: do_te_control: Transitioner is now inactive
> > >
> > > This takes several minutes & finally breaks.
> > >
> > > Any pointers on what can be causing this?
> > >
> > > Thanks.
> > > --Shyam
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org