On Thu, Mar 10, 2011 at 9:10 PM, Bart Coninckx <bart.conin...@telenet.be> wrote:
> Hi all,
>
> I have a three node cluster and while introducing the third node, it
> remains offline no matter what I do.

Nothing you've shown here seems to indicate it's offline - what leads you to that conclusion?

> Another symptom is that stopping
> openais takes forever on that node, while it is waiting for crmd to unload.
>
> The logfile shows this node (xen3) to be online however:
>
> Mar 10 20:55:26 corosync [pcmk ] info: pcmk_ipc: Recorded connection 0x6987c0 for attrd/10120
> Mar 10 20:55:26 corosync [pcmk ] info: pcmk_ipc: Recorded connection 0x69cb20 for cib/10118
> Mar 10 20:55:26 corosync [pcmk ] info: pcmk_ipc: Sending membership update 4100 to cib
> Mar 10 20:55:26 corosync [CLM ] CLM CONFIGURATION CHANGE
> Mar 10 20:55:26 corosync [CLM ] New Configuration:
> Mar 10 20:55:26 corosync [CLM ] r(0) ip(10.0.1.13) r(1) ip(10.0.2.13)
> Mar 10 20:55:26 corosync [CLM ] Members Left:
> Mar 10 20:55:26 corosync [CLM ] Members Joined:
> Mar 10 20:55:26 corosync [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 4104: memb=1, new=0, lost=0
> Mar 10 20:55:26 corosync [pcmk ] info: pcmk_peer_update: memb: xen3 218169354
> Mar 10 20:55:26 corosync [CLM ] CLM CONFIGURATION CHANGE
> Mar 10 20:55:26 corosync [CLM ] New Configuration:
> Mar 10 20:55:26 corosync [CLM ] r(0) ip(10.0.1.11) r(1) ip(10.0.2.11)
> Mar 10 20:55:26 corosync [CLM ] r(0) ip(10.0.1.12) r(1) ip(10.0.2.12)
> Mar 10 20:55:26 corosync [CLM ] r(0) ip(10.0.1.13) r(1) ip(10.0.2.13)
> Mar 10 20:55:26 corosync [CLM ] Members Left:
> Mar 10 20:55:26 corosync [CLM ] Members Joined:
> Mar 10 20:55:26 corosync [CLM ] r(0) ip(10.0.1.11) r(1) ip(10.0.2.11)
> Mar 10 20:55:26 corosync [CLM ] r(0) ip(10.0.1.12) r(1) ip(10.0.2.12)
> Mar 10 20:55:26 corosync [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 4104: memb=3, new=2, lost=0
> Mar 10 20:55:26 corosync [pcmk ] info: update_member: Creating entry for node 184614922 born on 4104
> Mar 10 20:55:26 corosync [pcmk ] info: update_member: Node 184614922/unknown is now: member
> Mar 10 20:55:26 corosync [pcmk ] info: pcmk_peer_update: NEW: .pending. 184614922
> Mar 10 20:55:26 corosync [pcmk ] info: update_member: Creating entry for node 201392138 born on 4104
> Mar 10 20:55:26 corosync [pcmk ] info: update_member: Node 201392138/unknown is now: member
> Mar 10 20:55:26 corosync [pcmk ] info: pcmk_peer_update: NEW: .pending. 201392138
> Mar 10 20:55:26 corosync [pcmk ] info: pcmk_peer_update: MEMB: .pending. 184614922
> Mar 10 20:55:26 corosync [pcmk ] info: pcmk_peer_update: MEMB: .pending. 201392138
> Mar 10 20:55:26 corosync [pcmk ] info: pcmk_peer_update: MEMB: xen3 218169354
> Mar 10 20:55:26 corosync [pcmk ] info: send_member_notification: Sending membership update 4104 to 1 children
> Mar 10 20:55:26 corosync [pcmk ] info: update_member: 0x7f4268000c80 Node 218169354 ((null)) born on: 4104
> Mar 10 20:55:26 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Mar 10 20:55:26 corosync [pcmk ] info: update_member: 0x7f4268001120 Node 201392138 (xen2) born on: 3800
> Mar 10 20:55:26 corosync [pcmk ] info: update_member: 0x7f4268001120 Node 201392138 now known as xen2 (was: (null))
> Mar 10 20:55:26 corosync [pcmk ] info: update_member: Node xen2 now has process list: 00000000000000000000000000151312 (1381138)
> Mar 10 20:55:26 corosync [pcmk ] info: update_member: Node xen2 now has 1 quorum votes (was 0)
> Mar 10 20:55:26 corosync [pcmk ] info: send_member_notification: Sending membership update 4104 to 1 children
> Mar 10 20:55:26 corosync [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
> Mar 10 20:55:26 xen3 cib: [10118]: notice: ais_dispatch_message: Membership 4104: quorum acquired
> Mar 10 20:55:26 corosync [pcmk ] info: update_member: 0x7f4268000aa0 Node 184614922 (xen1) born on: 3792
> Mar 10 20:55:26 corosync [pcmk ] info: update_member: 0x7f4268000aa0 Node 184614922 now known as xen1 (was: (null))
> Mar 10 20:55:26 corosync [pcmk ] info: update_member: Node xen1 now has process list: 00000000000000000000000000151312 (1381138)
> Mar 10 20:55:26 corosync [pcmk ] info: update_member: Node xen1 now has 1 quorum votes (was 0)
> Mar 10 20:55:26 corosync [pcmk ] info: update_expected_votes: Expected quorum votes 2 -> 3
> Mar 10 20:55:26 corosync [pcmk ] info: send_member_notification: Sending membership update 4104 to 1 children
> Mar 10 20:55:26 corosync [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
> Mar 10 20:55:26 corosync [TOTEM ] Marking ringid 1 interface 10.0.2.13 FAULTY - adminisrtative intervention required.
> Mar 10 20:55:26 corosync [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
> Mar 10 20:55:26 xen3 cib: [10118]: WARN: cib_diff_notify: Local-only Change (client:crmd, call: 1742): -1.-1.-1 (Application of an update diff failed, requesting a full refresh)
> Mar 10 20:55:27 corosync [pcmk ] info: pcmk_ipc: Recorded connection 0x7f4268002040 for crmd/10122
> Mar 10 20:55:27 corosync [pcmk ] info: pcmk_ipc: Sending membership update 4104 to crmd
> Mar 10 20:55:27 xen3 crmd: [10122]: notice: ais_dispatch_message: Membership 4104: quorum acquired
> Mar 10 20:55:27 xen3 crmd: [10122]: notice: crmd_peer_update: Status update: Client xen3/crmd now has status [online] (DC=<null>)
> Mar 10 20:55:27 corosync [MAIN ] Completed service synchronization, ready to provide service.
> Mar 10 20:55:27 xen3 cib: [10118]: WARN: cib_server_process_diff: Not applying diff 0.1672.12 -> 0.1672.13 (sync in progress)
> Mar 10 20:55:27 xen3 mgmtd: [10123]: debug: main: run the loop...
> Mar 10 20:55:27 xen3 mgmtd: [10123]: info: Started.
> Mar 10 20:55:27 xen3 lrmd: [10119]: info: setting max-children to 4
>
>
> ps afx shows all relevant processes in a normal state though:
>
> 10111 ? Ssl 0:00 /usr/sbin/corosync
> 10117 ? S 0:00 \_ /usr/lib64/heartbeat/stonithd
> 10118 ? S 0:00 \_ /usr/lib64/heartbeat/cib
> 10119 ? S 0:00 \_ /usr/lib64/heartbeat/lrmd
> 10120 ? S 0:00 \_ /usr/lib64/heartbeat/attrd
> 10121 ? S 0:00 \_ /usr/lib64/heartbeat/pengine
> 10122 ? S 0:00 \_ /usr/lib64/heartbeat/crmd
> 10123 ? S 0:00 \_ /usr/lib64/heartbeat/mgmtd
>
>
> I tried to remove the node with crm_node -R= to no avail.
>
> The used versions are :
>
> corosync-1.2.6-0.2.2
> openais-1.1.3-0.2.3
> pacemaker-1.1.2-0.7.1
>
> corosync.conf looks like this:
>
> aisexec {
>   group: root
>   user: root
> }
> service {
>   use_mgmtd: yes
>   ver: 0
>   name: pacemaker
> }
> totem {
>   rrp_mode: passive
>   token_retransmits_before_loss_const: 10
>   join: 1000
>   max_messages: 20
>   vsftype: none
>   token: 5000
>   consensus: 7500
>   secauth: off
>   version: 2
>
>   interface {
>     bindnetaddr: 10.0.1.0
>     mcastaddr: 226.94.1.1
>     mcastport: 5405
>     ringnumber: 0
>
>   }
>   interface {
>     bindnetaddr: 10.0.2.0
>     mcastaddr: 226.84.2.1
>     mcastport: 5406
>     ringnumber: 1
>   }
>   clear_node_high_bit: yes
> }
> logging {
>   to_logfile: yes
>   logfile: /var/log/ha-log
>   timestamp: on
>   syslog_facility: daemon
>   to_syslog: no
>   debug: on
>   to_stderr: yes
>   fileline: off
>
> }
> amf {
>   mode: disable
> }
>
>
> Does anyone have any suggestions on how to proceed?
>
> Thank you!!
>
> B.
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
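
As an aside for reading the membership lines: the numeric node IDs in the log (184614922, 201392138, 218169354) look like the ring 0 addresses stored as 32-bit integers. Below is a minimal Python sketch for decoding them. It assumes the IDs were auto-generated from the ring 0 IP on a little-endian host (no nodeid is set in the corosync.conf above), and nodeid_to_ip is only a hypothetical helper name, not part of corosync or pacemaker.

```python
import socket
import struct


def nodeid_to_ip(nodeid: int) -> str:
    """Decode a numeric node ID into a dotted-quad IPv4 address.

    Assumption: the ID is the ring 0 IPv4 address read as a
    little-endian unsigned 32-bit integer (typical on x86 hosts
    when the ID is derived from the address automatically).
    """
    # Pack the integer back into its 4 raw bytes, lowest byte first,
    # then let inet_ntoa render those bytes as a dotted quad.
    return socket.inet_ntoa(struct.pack("<I", nodeid))


if __name__ == "__main__":
    # Node IDs copied from the quoted log.
    for nodeid in (184614922, 201392138, 218169354):
        print(nodeid, nodeid_to_ip(nodeid))
```

With the IDs from the quoted log this gives 10.0.1.11, 10.0.1.12 and 10.0.1.13, which matches the three ring 0 addresses listed in the CLM configuration change.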