On 28/10/2010 18:30, Guillaume Chanaud wrote:
On 28/10/2010 17:55, Pavlos Parissis wrote:
On 28 October 2010 16:09, Guillaume Chanaud <guillaume.chan...@connecting-nature.com> wrote:
Hello,
I have a cluster of two master/slave DRBD servers running in a VLAN
(the machines are dedicated servers) (filer1 and filer2).
I added a third node to the cluster correctly (a "blank" node for the
moment) (server1).
When I add a fourth node to the cluster (server2, which is a "mirror"
of server1), this node starts as standalone... Here is the messages log:
Oct 28 15:59:27 ns209045 corosync[16543]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 28 15:59:28 ns209045 corosync[16543]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 945392: memb=1, new=0, lost=0
Oct 28 15:59:28 ns209045 corosync[16543]: [pcmk ] info: pcmk_peer_update: memb: server2 16820416
Oct 28 15:59:28 ns209045 corosync[16543]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 945392: memb=1, new=0, lost=0
Oct 28 15:59:28 ns209045 corosync[16543]: [pcmk ] info: pcmk_peer_update: MEMB: server2 16820416
Oct 28 15:59:28 ns209045 corosync[16543]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 28 15:59:29 ns209045 corosync[16543]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 945416: memb=1, new=0, lost=0
Oct 28 15:59:29 ns209045 corosync[16543]: [pcmk ] info: pcmk_peer_update: memb: server2 16820416
Oct 28 15:59:29 ns209045 corosync[16543]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 945416: memb=1, new=0, lost=0
Oct 28 15:59:29 ns209045 corosync[16543]: [pcmk ] info: pcmk_peer_update: MEMB: server2 16820416
[...] This message repeats many, many times.
Now I stop server1 and start server2... server2 starts correctly and
is added to the cluster... but when I then want to start server1, the
same thing happens... (so the roles are inverted but the result is the
same: when I start one of the serverX nodes, the other can't start...)
My corosync.conf is configured for broadcast, not multicast... I have
lots of problems with multicast because many bridged VMs on the VLAN
don't see the multicast packets, or don't join the multicast group
correctly...
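For reference, the totem/interface part is set up roughly like this
(just a sketch; the bind network and port below are placeholders, not
the real values from our VLAN):

    totem {
        version: 2
        secauth: on
        interface {
            ringnumber: 0
            # placeholder network; the real bindnetaddr matches the VLAN subnet
            bindnetaddr: 192.168.0.0
            broadcast: yes
            mcastport: 5405
        }
    }

As far as I understand corosync.conf(5), when broadcast is set to yes,
mcastaddr should not be set at all.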
Any hints on this?
Are the corosync and auth files the same on server2?
Yes, of course :D (copied with scp). As I said, server1 can join when
server2 is offline, and server2 can join when server1 is offline, but
if one is online, the other can't join and logs the messages above in
a loop...
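If it helps, here is a quick way to double-check that the files really
are byte-identical on all four nodes (hostnames and paths are just the
usual defaults, adjust as needed):

    # compare checksums of corosync.conf and authkey across all nodes
    for h in filer1 filer2 server1 server2; do
        echo "== $h =="
        ssh root@$h 'md5sum /etc/corosync/corosync.conf /etc/corosync/authkey'
    done

All four checksums per file should match.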
In fact I have loooooooottttttssssss of problems with
corosync/pacemaker... multicast/broadcast between physical and virtual
servers... lots of different shit everywhere, and the error logs are
always different depending on what I try...
The strange thing is that filer1, filer2, server1 and server2 are all
running the same distro (Gentoo) with the same tools and are on the
same VLAN (which works fine for lots of services, like NFS...).
Another thing I've just seen...
When one of server1/server2 connects to the cluster, the logs on all
nodes start to fill with these messages:
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 28 18:46:01 filer2 corosync[10928]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] Process pause detected for 513 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] Process pause detected for 513 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] Process pause detected for 513 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] Process pause detected for 513 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] Process pause detected for 572 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] Process pause detected for 572 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] Process pause detected for 573 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 1162480: memb=3, new=0, lost=0
Oct 28 18:46:01 filer2 corosync[10928]: [pcmk ] info: pcmk_peer_update: memb: server1 16820416
Oct 28 18:46:01 filer2 corosync[10928]: [pcmk ] info: pcmk_peer_update: memb: filer1 83929280
Oct 28 18:46:01 filer2 corosync[10928]: [pcmk ] info: pcmk_peer_update: memb: filer2 100706496
Oct 28 18:46:01 filer2 corosync[10928]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 1162480: memb=3, new=0, lost=0
Oct 28 18:46:01 filer2 corosync[10928]: [pcmk ] info: pcmk_peer_update: MEMB: server1 16820416
Oct 28 18:46:01 filer2 corosync[10928]: [pcmk ] info: pcmk_peer_update: MEMB: filer1 83929280
Oct 28 18:46:01 filer2 corosync[10928]: [pcmk ] info: pcmk_peer_update: MEMB: filer2 100706496
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 28 18:46:01 filer2 corosync[10928]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] Process pause detected for 632 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] Process pause detected for 632 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] Process pause detected for 632 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] Process pause detected for 692 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] Process pause detected for 692 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] Process pause detected for 751 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] Process pause detected for 751 ms, flushing membership messages.
This is not the case when filer1/filer2 are the only nodes in the
cluster...
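One more thing I notice in the traces: server1 and server2 both show
up with the same Totem node id, 16820416 ("memb: server2 16820416" in
the first log, "memb: server1 16820416" in this one). With IPv4,
corosync derives the node id from the address bound on ring 0 unless
one is set explicitly, so a duplicate id suggests the two machines end
up binding to the same address (worth re-checking bindnetaddr and the
ring interface on each). If that is the case, pinning explicit node
ids per node might be a workaround; a sketch, with arbitrary example
values:

    totem {
        # ... existing totem options ...
        nodeid: 1    # on server1; use a different value (e.g. 2) on server2
    }

Note that with explicit node ids corosync.conf can no longer be a
byte-for-byte scp copy between nodes, since each node needs its own
value.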
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker