Hello, On 01/19/2012 12:21 PM, Philippe Carbonnier wrote: > Hello Andreas, and others, > > I'm sorry not having well explain my problem (I'm french and don't speak > english fluently...).
np .. I'm also not a native speaker ;-) > The loop has arrived after putting ujoss1 (not ujboss2) standy, this is > why I put more log for ujboss1 and not ujboss2. > > I'll try to answer your questions : > the network setup is simple : ujboss1 ans ujboss2 are on the same lan, > without firewall between. Perhaps it's not what you was looking for? hmm ... looks really like a short split brain ... default timeouts in corosync are quite short at 1000ms for token lost ... try e.g. "token: 5000" in the "totem" section of your corosync config. Regards, Andreas -- Need help with Pacemaker? http://www.hastexo.com/now > > corosync.conf : > aisexec { > user: root > group: root > } > service { > name: pacemaker > ver: 0 > } > compatibility: whitetank > totem { > version: 2 > secauth: on > threads: 0 > interface { > ringnumber: 0 > bindnetaddr: 128.1.0.0 > mcastaddr: 226.94.1.1 > mcastport: 5405 > } > } > logging { > fileline: off > to_stderr: no > to_logfile: yes > to_syslog: no > logfile: /var/log/cluster/corosync.log > debug: off > timestamp: on > logger_subsys { > subsys: AMF > debug: off > } > } > amf { > mode: disabled > } > > and crm configure show : > node ujbossterrena1 \ > attributes standby="off" > node ujbossterrena2 \ > attributes standby="off" > primitive clusterIP ocf:heartbeat:IPaddr2 \ > params ip="128.1.13.9" cidr_netmask="32" iflabel="jbossfailover" \ > op monitor interval="30s" > primitive routing-jboss lsb:routing-jboss \ > op monitor interval="30s" > group vifGroup clusterIP routing-jboss \ > meta target-role="Started" > location prefer-clusterIP clusterIP 50: ujboss1 > property $id="cib-bootstrap-options" \ > dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \ > cluster-infrastructure="openais" \ > expected-quorum-votes="2" \ > stonith-enabled="false" \ > no-quorum-policy="ignore" > rsc_defaults $id="rsc-options" \ > resource-stickiness="20" > > > and the previous log for ujboss2: > Jan 17 12:51:46 corosync [TOTEM ] A processor failed, forming new > configuration. > Jan 17 12:51:47 corosync [pcmk ] notice: pcmk_peer_update: Transitional > membership event on ring 564: memb=1, new=0, lost=1 > Jan 17 12:51:47 corosync [pcmk ] info: pcmk_peer_update: memb: > ujbossterrena2 51184000 > Jan 17 12:51:47 corosync [pcmk ] info: pcmk_peer_update: lost: > ujbossterrena1 34406784 > Jan 17 12:51:47 corosync [pcmk ] notice: pcmk_peer_update: Stable > membership event on ring 564: memb=1, new=0, lost=0 > Jan 17 12:51:47 corosync [pcmk ] info: pcmk_peer_update: MEMB: > ujbossterrena2 51184000 > Jan 17 12:51:47 corosync [pcmk ] info: ais_mark_unseen_peer_dead: Node > ujbossterrena1 was not seen in the previous transition > Jan 17 12:51:47 corosync [pcmk ] info: update_member: Node > 34406784/ujbossterrena1 is now: lost > Jan 17 12:51:47 corosync [pcmk ] info: send_member_notification: > Sending membership update 564 to 2 children > Jan 17 12:51:47 corosync [TOTEM ] A processor joined or left the > membership and a new membership was formed. > Jan 17 12:51:47 corosync [MAIN ] Completed service synchronization, > ready to provide service. > Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: notice: ais_dispatch: > Membership 564: quorum lost > Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: crm_update_peer: > Node ujbossterrena1: id=34406784 state=lost (new) addr=r( > 0) ip(128.1.13.2) votes=1 born=488 seen=556 > proc=00000000000000000000000000013312 > Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: WARN: check_dead_member: > Our DC node (ujbossterrena1) left the cluster > Jan 17 12:51:47 ujbossterrena2 cib: [18792]: notice: ais_dispatch: > Membership 564: quorum lost > Jan 17 12:51:47 ujbossterrena2 cib: [18792]: info: crm_update_peer: Node > ujbossterrena1: id=34406784 state=lost (new) addr=r(0 > ) ip(128.1.13.2) votes=1 born=488 seen=556 > proc=00000000000000000000000000013312 > Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: do_state_transition: > State transition S_NOT_DC -> S_ELECTION [ input=I_ELE > CTION cause=C_FSA_INTERNAL origin=check_dead_member ] > Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: update_dc: Unset DC > ujbossterrena1 > Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: do_state_transition: > State transition S_ELECTION -> S_INTEGRATION [ input= > I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ] > Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: do_te_control: > Registering TE UUID: 9a4e45a9-9436-4a23-b4e9-9875ba334906 > Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: WARN: > cib_client_add_notify_callback: Callback already present > Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: set_graph_functions: > Setting custom graph functions > Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: unpack_graph: > Unpacked transition -1: 0 actions in 0 synapses > Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: do_dc_takeover: > Taking over DC status for this partition > Jan 17 12:51:47 ujbossterrena2 cib: [18792]: info: > cib_process_readwrite: We are now in R/W mode > Jan 17 12:51:47 ujbossterrena2 cib: [18792]: info: cib_process_request: > Operation complete: op cib_master for section 'all' (o > rigin=local/crmd/73, version=0.233.5): ok (rc=0) > Jan 17 12:51:47 ujbossterrena2 cib: [18792]: info: cib_process_request: > Operation complete: op cib_modify for section cib (ori > gin=local/crmd/74, version=0.233.5): ok (rc=0) > Jan 17 12:51:47 ujbossterrena2 cib: [18792]: info: cib_process_request: > Operation complete: op cib_modify for section crm_conf > ig (origin=local/crmd/76, version=0.233.5): ok (rc=0) > Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: join_make_offer: > Making join offers based on membership 564 > Jan 17 12:51:47 ujbossterrena2 cib: [18792]: info: cib_process_request: > Operation complete: op cib_modify for section crm_conf > ig (origin=local/crmd/78, version=0.233.5): ok (rc=0) > Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: > do_dc_join_offer_all: join-3: Waiting on 1 outstanding join acks > Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: ais_dispatch: > Membership 564: quorum still lost > Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: crm_ais_dispatch: > Setting expected votes to 2 > Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: > config_query_callback: Checking for expired actions every 900000ms > Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: > config_query_callback: Sending expected-votes=2 to corosync > Jan 17 12:51:47 ujbossterrena2 cib: [18792]: info: cib_process_request: > Operation complete: op cib_modify for section crm_conf > ig (origin=local/crmd/81, version=0.233.5): ok (rc=0) > Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: update_dc: Set DC to > ujbossterrena2 (3.0.1) > Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: ais_dispatch: > Membership 564: quorum still lost > Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: crm_ais_dispatch: > Setting expected votes to 2 > Jan 17 12:51:47 ujbossterrena2 cib: [18792]: info: cib_process_request: > Operation complete: op cib_modify for section crm_conf > ig (origin=local/crmd/84, version=0.233.5): ok (rc=0) > Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: do_state_transition: > State transition S_INTEGRATION -> S_FINALIZE_JOIN [ i > nput=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ] > Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: do_state_transition: > All 1 cluster nodes responded to the join offer. > Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: do_dc_join_finalize: > join-3: Syncing the CIB from ujbossterrena2 to the re > st of the cluster > Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: te_connect_stonith: > Attempting connection to fencing daemon... > Jan 17 12:51:47 ujbossterrena2 cib: [18792]: info: cib_process_request: > Operation complete: op cib_sync for section 'all' (ori > gin=local/crmd/85, version=0.233.5): ok (rc=0) > Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: te_connect_stonith: > Connected > Jan 17 12:51:48 ujbossterrena2 cib: [18792]: info: cib_process_request: > Operation complete: op cib_modify for section nodes (o > rigin=local/crmd/86, version=0.233.5): ok (rc=0) > Jan 17 12:51:48 corosync [pcmk ] notice: pcmk_peer_update: Transitional > membership event on ring 568: memb=1, new=0, lost=0 > Jan 17 12:51:48 corosync [pcmk ] info: pcmk_peer_update: memb: > ujbossterrena2 51184000 > Jan 17 12:51:48 corosync [pcmk ] notice: pcmk_peer_update: Stable > membership event on ring 568: memb=2, new=1, lost=0 > Jan 17 12:51:48 corosync [pcmk ] info: update_member: Node > 34406784/ujbossterrena1 is now: member > Jan 17 12:51:48 corosync [pcmk ] info: pcmk_peer_update: NEW: > ujbossterrena1 34406784 > Jan 17 12:51:48 corosync [pcmk ] info: pcmk_peer_update: MEMB: > ujbossterrena1 34406784 > Jan 17 12:51:48 corosync [pcmk ] info: pcmk_peer_update: MEMB: > ujbossterrena2 51184000 > Jan 17 12:51:48 corosync [pcmk ] info: send_member_notification: > Sending membership update 568 to 2 children > Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: notice: ais_dispatch: > Membership 568: quorum acquired > Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: ais_status_callback: > status: ujbossterrena1 is now member (was lost) > Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: crm_update_peer: > Node ujbossterrena1: id=34406784 state=member (new) addr= > r(0) ip(128.1.13.2) votes=1 born=488 seen=568 > proc=00000000000000000000000000013312 > Jan 17 12:51:48 ujbossterrena2 cib: [18792]: notice: ais_dispatch: > Membership 568: quorum acquired > Jan 17 12:51:48 ujbossterrena2 cib: [18792]: info: crm_update_peer: Node > ujbossterrena1: id=34406784 state=member (new) addr=r > (0) ip(128.1.13.2) votes=1 born=488 seen=568 > proc=00000000000000000000000000013312 > Jan 17 12:51:48 ujbossterrena2 cib: [18792]: info: cib_process_request: > Operation complete: op cib_delete for section //node_s > tate[@uname='ujbossterrena1']/lrm (origin=local/crmd/87, > version=0.233.6): ok (rc=0) > Jan 17 12:51:48 corosync [TOTEM ] A processor joined or left the > membership and a new membership was formed. > Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: crm_update_quorum: > Updating quorum status to true (call=91) > Jan 17 12:51:48 ujbossterrena2 cib: [18792]: info: cib_process_request: > Operation complete: op cib_delete for section //node_s > tate[@uname='ujbossterrena1']/transient_attributes > (origin=local/crmd/88, version=0.233.7): ok (rc=0) > Jan 17 12:51:48 ujbossterrena2 cib: [18792]: info: cib_process_request: > Operation complete: op cib_modify for section nodes (o > rigin=local/crmd/89, version=0.233.7): ok (rc=0) > Jan 17 12:51:48 ujbossterrena2 cib: [18792]: info: log_data_element: > cib:diff: - <cib dc-uuid="ujbossterrena1" admin_epoch="0" > epoch="233" num_updates="7" /> > Jan 17 12:51:48 ujbossterrena2 cib: [18792]: info: log_data_element: > cib:diff: + <cib dc-uuid="ujbossterrena2" admin_epoch="0" > epoch="234" num_updates="1" /> > Jan 17 12:51:48 ujbossterrena2 cib: [18792]: info: cib_process_request: > Operation complete: op cib_modify for section cib (ori > gin=local/crmd/91, version=0.234.1): ok (rc=0) > Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: crm_ais_dispatch: > Setting expected votes to 2 > Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: > erase_xpath_callback: Deletion of "//node_state[@uname='ujbossterrena1']/l > rm": ok (rc=0) > Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: > erase_xpath_callback: Deletion of "//node_state[@uname='ujbossterrena1']/t > ransient_attributes": ok (rc=0) > Jan 17 12:51:48 ujbossterrena2 cib: [18792]: info: cib_process_request: > Operation complete: op cib_modify for section crm_conf > ig (origin=local/crmd/93, version=0.234.1): ok (rc=0) > Jan 17 12:51:48 ujbossterrena2 cib: [20766]: info: write_cib_contents: > Archived previous version as /var/lib/heartbeat/crm/cib > -88.raw > Jan 17 12:51:48 ujbossterrena2 cib: [20766]: info: write_cib_contents: > Wrote version 0.234.0 of the CIB to disk (digest: 62f91 > 5518fc318be93098daf9a5fecca) > Jan 17 12:51:48 ujbossterrena2 cib: [20766]: info: retrieveCib: Reading > cluster configuration from: /var/lib/heartbeat/crm/cib > .4oDZVT (digest: /var/lib/heartbeat/crm/cib.i5iINe) > Jan 17 12:51:48 ujbossterrena2 cib: [18792]: WARN: cib_process_diff: > Diff 0.233.5 -> 0.234.1 not applied to 0.234.1: current " > epoch" is greater than required > Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: do_lrm_rsc_op: > Performing key=8:8777:0:39671e48-9519-4b61-b781-2efcd379df7 > a op=clusterIP_start_0 ) > Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: do_lrm_rsc_op: > Discarding attempt to perform action start on clusterIP in > state S_FINALIZE_JOIN > Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: send_direct_ack: > ACK'ing resource op clusterIP_start_0 from 8:8777:0:39671 > e48-9519-4b61-b781-2efcd379df7a: lrm_invoke-lrmd-1326801108-25 > Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: process_te_message: > Processing (N)ACK lrm_invoke-lrmd-1326801108-25 from u > jbossterrena2 > Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: process_graph_event: > Action clusterIP_start_0/8 (4:99;8:8777:0:39671e48-95 > 19-4b61-b781-2efcd379df7a) initiated by a different transitioner > Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: > abort_transition_graph: process_graph_event:456 - Triggered transition abo > rt (complete=1, tag=lrm_rsc_op, id=clusterIP_start_0, > magic=4:99;8:8777:0:39671e48-9519-4b61-b781-2efcd379df7a) : Foreign even > t > Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: update_dc: Unset DC > ujbossterrena2 > Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: WARN: do_log: FSA: Input > I_RELEASE_DC from do_election_count_vote() received in > state S_FINALIZE_JOIN > Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: do_state_transition: > State transition S_FINALIZE_JOIN -> S_RELEASE_DC [ in > put=I_RELEASE_DC cause=C_FSA_INTERNAL origin=do_election_count_vote ] > Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: do_dc_release: DC > role released > Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: > pe_connection_destroy: Connection to the Policy Engine released > Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: do_te_control: > Transitioner is now inactive > Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: do_te_control: > Disconnecting STONITH... > Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: > tengine_stonith_connection_destroy: Fencing daemon disconnected > Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: notice: Not currently > connected. > Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: do_state_transition: > State transition S_RELEASE_DC -> S_PENDING [ input=I_ > RELEASE_SUCCESS cause=C_FSA_INTERNAL origin=do_dc_release ] > Jan 17 12:51:48 ujbossterrena2 cib: [18792]: info: > cib_process_readwrite: We are now in R/O mode > Jan 17 12:51:49 corosync [MAIN ] Completed service synchronization, > ready to provide service. > Jan 17 12:51:49 ujbossterrena2 crmd: [18796]: info: update_dc: Set DC to > ujbossterrena1 (3.0.1) > Jan 17 12:51:49 ujbossterrena2 crmd: [18796]: info: update_dc: Unset DC > ujbossterrena1 > Jan 17 12:51:49 ujbossterrena2 crmd: [18796]: info: update_dc: Set DC to > ujbossterrena1 (3.0.1) > Jan 17 12:51:49 ujbossterrena2 crmd: [18796]: info: update_dc: Unset DC > ujbossterrena1 > Jan 17 12:51:49 ujbossterrena2 crmd: [18796]: info: update_dc: Set DC to > ujbossterrena1 (3.0.1) > > Best regards, > Philippe > > _______________________________________________ > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org
signature.asc
Description: OpenPGP digital signature
_______________________________________________ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org