ruslan usifov <ruslan.usifov@...> writes:

> I solved this problem! On one node I found the following error message in the log:
>
>   slv009 .... peer is not part of our cluster
>
> So I stopped Pacemaker on that host (I use v1 for Pacemaker):
>
>   /etc/pacemaker stop
>   /etc/corosync stop
>
> Then I removed all CIB info from /var/lib/heartbeat/crm, cleaned up the
> /var/lib/pengine directory, and restarted the cluster on that node. And voilà,
> everything began working as expected.
>
> But I still have a question: why does this happen? Why do nodes start to think
> that other nodes are not part of the cluster?
>
> 2012/2/24 ruslan usifov <ruslan.usi...@gmail.com>
>
> > Hello,
> >
> > I have a 3-node cluster setup. After upgrading the OS, one node is
> > permanently in the OFFLINE state.
> >
> > OS: ubuntu 10.0.4
> > pacemaker: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
> >
> > On the OFFLINE node I see the following in the log:
> >
> > Feb 24 20:27:45 slv009 crmd: [9125]: info: do_dc_release: DC role released
> > Feb 24 20:27:45 slv009 crmd: [9125]: info: do_te_control: Transitioner is now inactive
> > Feb 24 20:28:05 slv009 crmd: [9125]: info: crm_timer_popped: Election Trigger (I_DC_TIMEOUT) just popped (20000ms)
> > Feb 24 20:28:05 slv009 crmd: [9125]: WARN: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> > Feb 24 20:28:05 slv009 crmd: [9125]: info: do_state_transition: State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]
> > Feb 24 20:28:05 slv009 crmd: [9125]: info: do_state_transition: State transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> > Feb 24 20:28:05 slv009 crmd: [9125]: info: do_dc_release: DC role released
> > Feb 24 20:28:05 slv009 crmd: [9125]: info: do_te_control: Transitioner is now inactive
> > Feb 24 20:28:25 slv009 crmd: [9125]: info: crm_timer_popped: Election Trigger (I_DC_TIMEOUT) just popped (20000ms)
> > Feb 24 20:28:25 slv009 crmd: [9125]: WARN: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> > Feb 24 20:28:25 slv009 crmd: [9125]: info: do_state_transition: State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]
> > Feb 24 20:28:25 slv009 crmd: [9125]: info: do_state_transition: State transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> > Feb 24 20:28:25 slv009 crmd: [9125]: info: do_dc_release: DC role released
> > Feb 24 20:28:25 slv009 crmd: [9125]: info: do_te_control: Transitioner is now inactive
> > Feb 24 20:28:45 slv009 crmd: [9125]: info: crm_timer_popped: Election Trigger (I_DC_TIMEOUT) just popped (20000ms)
> > Feb 24 20:28:45 slv009 crmd: [9125]: WARN: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> > Feb 24 20:28:45 slv009 crmd: [9125]: info: do_state_transition: State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]
> > Feb 24 20:28:45 slv009 crmd: [9125]: info: do_state_transition: State transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> > Feb 24 20:28:45 slv009 crmd: [9125]: info: do_dc_release: DC role released
> > Feb 24 20:28:45 slv009 crmd: [9125]: info: do_te_control: Transitioner is now inactive
> > Feb 24 20:29:05 slv009 crmd: [9125]: info: crm_timer_popped: Election Trigger (I_DC_TIMEOUT) just popped (20000ms)
> > Feb 24 20:29:05 slv009 crmd: [9125]: WARN: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> > Feb 24 20:29:05 slv009 crmd: [9125]: info: do_state_transition: State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]
> > Feb 24 20:29:05 slv009 crmd: [9125]: info: do_state_transition: State transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> > Feb 24 20:29:05 slv009 crmd: [9125]: info: do_dc_release: DC role released
> > Feb 24 20:29:05 slv009 crmd: [9125]: info: do_te_control: Transitioner is now inactive
> > Feb 24 20:29:25 slv009 crmd: [9125]: info: crm_timer_popped: Election Trigger (I_DC_TIMEOUT) just popped (20000ms)
> > Feb 24 20:29:25 slv009 crmd: [9125]: WARN: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> > Feb 24 20:29:25 slv009 crmd: [9125]: info: do_state_transition: State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]
> > Feb 24 20:29:25 slv009 crmd: [9125]: info: do_state_transition: State transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> > Feb 24 20:29:25 slv009 crmd: [9125]: info: do_dc_release: DC role released
> > Feb 24 20:29:25 slv009 crmd: [9125]: info: do_te_control: Transitioner is now inactive
> > Feb 24 20:29:45 slv009 crmd: [9125]: info: crm_timer_popped: Election Trigger (I_DC_TIMEOUT) just popped (20000ms)
> > Feb 24 20:29:45 slv009 crmd: [9125]: WARN: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> > Feb 24 20:29:45 slv009 crmd: [9125]: info: do_state_transition: State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]
> > Feb 24 20:29:45 slv009 crmd: [9125]: info: do_state_transition: State transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> > Feb 24 20:29:45 slv009 crmd: [9125]: info: do_dc_release: DC role released
> > Feb 24 20:29:45 slv009 crmd: [9125]: info: do_te_control: Transitioner is now inactive
> > Feb 24 20:30:05 slv009 crmd: [9125]: info: crm_timer_popped: Election Trigger (I_DC_TIMEOUT) just popped (20000ms)
> > Feb 24 20:30:05 slv009 crmd: [9125]: WARN: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> > Feb 24 20:30:05 slv009 crmd: [9125]: info: do_state_transition: State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]
> > Feb 24 20:30:05 slv009 crmd: [9125]: info: do_state_transition: State transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> >
> > I have the following crm configuration:
> >
> > node slv008
> > node slv009
> > node slv010
> > primitive http_173.192.214.78_eth1 ocf:heartbeat:IPaddr2 \
> >         params ip="173.192.214.78" nic="eth1:1" cidr_netmask="30" \
> >         op monitor interval="10s"
> > primitive http_nginx ocf:heartbeat:nginx \
> >         op monitor interval="10s" timeout="120s"
> > group http http_173.192.214.78_eth1 http_nginx \
> >         meta target-role="Started" is-managed="true"
> > property $id="cib-bootstrap-options" \
> >         dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
> >         cluster-infrastructure="openais" \
> >         expected-quorum-votes="3" \
> >         stonith-enabled="false"
> > rsc_defaults $id="rsc-options" \
> >         resource-stickiness="100"
> >
> > Also, I can't restart Pacemaker cleanly on that node, i.e. through the
> > init.d script (it just hangs).
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@...
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

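For anyone trying to recognise the same symptom in their own logs: the quoted crmd output shows the node bouncing between S_PENDING and S_ELECTION every 20 seconds without ever reaching another state. The following is a rough sketch of how that pattern could be spotted automatically. The function names, the `threshold` parameter, and the "stuck" heuristic are my own assumptions for illustration, not anything provided by Pacemaker; the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches crmd state-transition lines in the format quoted in this thread.
TRANSITION = re.compile(
    r"crmd: \[\d+\]: info: do_state_transition: "
    r"State transition (\S+) -> (\S+) \["
)

def count_transitions(log_text):
    """Return a Counter of (from_state, to_state) pairs found in the log."""
    return Counter((m.group(1), m.group(2)) for m in TRANSITION.finditer(log_text))

def looks_stuck_in_election(log_text, threshold=3):
    """Heuristic (an assumption, not a Pacemaker rule): the node only ever
    bounces S_PENDING <-> S_ELECTION, at least `threshold` times each way."""
    counts = count_transitions(log_text)
    bounce = {("S_PENDING", "S_ELECTION"), ("S_ELECTION", "S_PENDING")}
    only_bounces = all(pair in bounce for pair in counts)
    return only_bounces and all(counts[pair] >= threshold for pair in bounce)

# Three election cycles rebuilt from the log lines quoted above.
_CYCLE = (
    "Feb 24 {ts} slv009 crmd: [9125]: info: do_state_transition: State transition "
    "S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]\n"
    "Feb 24 {ts} slv009 crmd: [9125]: info: do_state_transition: State transition "
    "S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]\n"
)
SAMPLE = "".join(_CYCLE.format(ts=ts) for ts in ("20:28:05", "20:28:25", "20:28:45"))

if __name__ == "__main__":
    print(count_transitions(SAMPLE))
    print("stuck in election loop:", looks_stuck_in_election(SAMPLE))
```

A positive result only says the election never completes on that node; it does not say why. In this thread the actual cause was found separately, in the "peer is not part of our cluster" message.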
Thanks a lot. I was having this same issue after months of running a healthy cluster: two of the nodes became aware of only each other and forgot the rest. This trick brought them all together again.
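
In case it helps someone repeat the cleanup step, here is a minimal sketch of the "remove the local CIB and pengine files" part of the trick. It assumes the two directories named in the thread (with the CIB directory spelled /var/lib/heartbeat/crm), assumes that only regular files directly inside them need to go, and deliberately leaves stopping and starting Pacemaker and Corosync to you, since the init scripts differ between systems. The function name and the `dry_run` flag are my own, not part of any Pacemaker tool.

```python
from pathlib import Path

# Directories named in the thread; adjust for your distribution (assumption).
CIB_DIR = Path("/var/lib/heartbeat/crm")
PENGINE_DIR = Path("/var/lib/pengine")

def clear_local_state(dirs=(CIB_DIR, PENGINE_DIR), dry_run=True):
    """List (and, if dry_run is False, delete) regular files directly
    inside each directory. Subdirectories are left alone.

    Run only while Pacemaker and Corosync are stopped on this node.
    Returns the affected paths.
    """
    affected = []
    for directory in map(Path, dirs):
        if not directory.is_dir():
            continue  # nothing to clean here on this node
        for path in sorted(directory.iterdir()):
            if path.is_file():
                affected.append(path)
                if not dry_run:
                    path.unlink()
    return affected

if __name__ == "__main__":
    # Dry run by default: only report what would be removed.
    for path in clear_local_state():
        print("would remove", path)
```

Running it as-is only prints what would be deleted; calling `clear_local_state(dry_run=False)` performs the removal. Taking a copy of both directories first seems prudent, since this discards the node's local copy of the cluster configuration.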