[Linux-HA] Question about openais rrp_mode
Hi,

> rrp_mode
>     This specifies the mode of redundant ring, which may be none,
>     active, or passive. Active replication offers slightly lower
>     latency from transmit to delivery in faulty network environments
>     but with less performance. Passive replication may nearly double
>     the speed of the totem protocol if the protocol doesn't become
>     cpu bound.

This is not completely clear to me: does it mean that "active" mode sends the totems systematically on both networks, while "passive" mode sends only on the first interface ringnumber (in openais.conf) and falls back to the second interface ringnumber only if the first is broken?

Could someone give more precise information, or point me to where I can find more about this?

And by the way, is there any issue with setting the first interface ringnumber on Ethernet (eth0) and the second on IP over InfiniBand?

Thanks for your response.
Alain Moullé
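For reference, a minimal sketch of the kind of two-ring totem section being discussed; the addresses, interface assignments and the choice of passive mode are illustrative assumptions, not taken from the poster's configuration:

    # /etc/ais/openais.conf -- illustrative two-ring sketch (addresses are examples)
    totem {
        version: 2
        rrp_mode: passive            # or "active"; see the man page text quoted above
        interface {
            ringnumber: 0
            bindnetaddr: 192.168.1.0     # first network, e.g. eth0
            mcastaddr: 226.94.1.1
            mcastport: 5405
        }
        interface {
            ringnumber: 1
            bindnetaddr: 192.168.2.0     # second network, e.g. IP over InfiniBand
            mcastaddr: 226.94.2.1
            mcastport: 5405
        }
    }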
Re: [Linux-HA] Question about risk of split-brain and risk of dual-fencing
On Wed, Dec 16, 2009 at 8:56 AM, Alain.Moulle wrote:
>> No. It is a split-brain situation as soon as nodes can't
>> communicate.
>
> OK, you're right. In fact, I wanted to talk about the risk of shared
> resources being mounted on both sides, which is the worst thing that
> could happen in a split-brain if no fencing occurs.

That's why most vendors will not support configurations without fencing configured.
If you care about your data, you need fencing.

>>> And if we have a more than two-node cluster, it seems similar to me ...
>>
>> No, because the partition without quorum can't fence nodes. That
>> makes things simpler and more predictable.
>
> ... what if no-quorum-policy=ignore ?

Then you get what you ask for :-)

[snip]

> try to get the configuration which avoids dual-fencing for sure, and also
> avoids shared resources mounted on both sides; that's what I'm trying to
> find with Pacemaker & openais.

I would recommend one of two approaches. Either have stonith use the poweroff method, or don't start the cluster software automatically when the node boots.

Also, have a read of Tim's stonith doc: http://ourobengr.com/ha
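A hedged sketch of those two suggestions; stonith-action is a standard Pacemaker cluster property, while the exact service name and init tooling depend on the distribution:

    # 1) Make fencing power nodes off instead of rebooting them
    #    (Pacemaker cluster property; the default is "reboot")
    crm configure property stonith-action="poweroff"

    # 2) Or keep the cluster stack from starting automatically at boot,
    #    so a fenced/rebooted node stays out of the cluster until an admin acts
    chkconfig openais off        # adjust to your distribution's init tooling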
Re: [Linux-HA] Question about openais rrp_mode
Perhaps try the openais mailing list rather than their competitor ;-)

On Wed, Dec 16, 2009 at 9:18 AM, Alain.Moulle wrote:
> Hi,
>> rrp_mode
>>     This specifies the mode of redundant ring, which may be none,
>>     active, or passive. [snip]
>
> Not completely clear for me: does that mean that "active mode" makes it
> send the totems systematically on both networks, and "passive mode" makes
> it send on the first interface ringnumber (in openais.conf) and only on
> the second interface ringnumber if the first is broken?
> Could someone give more precise information?
> Or where can I find more information about this?
>
> And by the way, is there any issue with setting the first interface
> ringnumber on Ethernet (eth0) and the second on IP/InfiniBand?
>
> Thanks for your response.
> Alain Moullé
Re: [Linux-HA] Switching after reboot
On Wed, Dec 16, 2009 at 8:48 AM, artur.k wrote:
> I have built a cluster with two nodes on pacemaker 1.0.4 + DRBD (8.0.14).
> If one machine is restarted, after it returns pacemaker tries to switch
> all services back to this server. How do I prevent it?

Set default-resource-stickiness to something higher than 200.
If you don't want it to move under any circumstances, set it to INFINITY.

You should also use the linbit drbd agent if possible.

> node test-storage-1
> node test-storage-2
> primitive drbd0 ocf:heartbeat:drbd \
>         params drbd_resource="r0" \
>         op monitor interval="59s" role="Master" timeout="30s" \
>         op monitor interval="60s" role="Slave" timeout="30s" \
>         op start interval="0" timeout="20s" \
>         op stop interval="0" timeout="20s"
> primitive fs0 ocf:heartbeat:Filesystem \
>         params fstype="xfs" directory="/mnt/drbd0" device="/dev/drbd0" \
>         params options="rw,nosuid,noatime" \
>         op monitor interval="21s" timeout="20s" \
>         op start interval="0" timeout="60s" \
>         op stop interval="0" timeout="20s"
> primitive ip ocf:heartbeat:IPaddr2 \
>         params ip="10.1.x.x" nic="eth1" cidr_netmask="24" \
>         op monitor interval="21s" timeout="5s"
> primitive nfs-common lsb:nfs-common \
>         op monitor interval="21s" timeout="5s"
> primitive nfs-kernel-server lsb:nfs-kernel-server \
>         op monitor interval="21s" timeout="5s" \
>         op start interval="0" timeout="180s"
> group storage fs0 nfs-kernel-server ip nfs-common
> ms ms-drbd0 drbd0 \
>         meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
> colocation storage-on-ms-drbd0 inf: storage ms-drbd0:Master
> order ms-drbd0-before-storage inf: ms-drbd0:promote storage:start
> property $id="cib-bootstrap-options" \
>         dc-version="1.0.4-2ec1d189f9c23093bf9239a980534b661baf782d" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         no-quorum-policy="ignore" \
>         stonith-enabled="false" \
>         default-resource-stickiness="200"
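A hedged sketch of both suggestions in crm shell syntax; the drbd_resource parameter matches the configuration quoted above, but the monitor intervals shown are illustrative, not a tested setup:

    # Raise stickiness so resources stay where they are when a node rejoins
    crm configure property default-resource-stickiness="INFINITY"

    # Roughly how the same resource might look with the linbit agent
    # instead of ocf:heartbeat:drbd (intervals/timeouts are placeholders)
    crm configure primitive drbd0 ocf:linbit:drbd \
            params drbd_resource="r0" \
            op monitor interval="29s" role="Master" \
            op monitor interval="31s" role="Slave"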
[Linux-HA] crm_attribute syntax
--
Online Games Technology Department, Network Management Group
李森 (Jason)
POPO: listen1...@163.com
Email: li...@corp.netease.com
Re: [Linux-HA] Help required to develop OCF Resource Agent Script for Master-Slave script
Andrew Beekhof-3 wrote:
>
> On Wed, Dec 2, 2009 at 5:23 AM, Jessy wrote:
>>> Yes, but did you add a monitor action to the resource's definition in
>>> the configuration?
>>>
>>> [Jessy]: I have added the monitor operation definition, with an
>>> interval, to the resource in cib.xml as below:
>>>
>>> role="Master"/>
>>> role="Slave"/>
>>>
>>> Moreover, I've also added the definition of the monitor action to the
>>> meta-data of the RA 'MaSlApp' as follows:
>>>
>>> start-delay="50" role="Slave"/>
>>> start-delay="30" role="Master"/>
>>>
>>> Thanks in advance!!!
>
> OK, and what happened?
> Did you also upgrade?
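Because the archive has stripped the opening XML tags above, here is a hedged sketch of what such per-role monitor definitions usually look like; the ids, intervals and timeouts are placeholders, not a reconstruction of the poster's actual files:

    <!-- cib.xml: per-role monitor operations on the master/slave resource -->
    <op id="MaSlApp-monitor-master" name="monitor" interval="29s" timeout="30s" role="Master"/>
    <op id="MaSlApp-monitor-slave"  name="monitor" interval="31s" timeout="30s" role="Slave"/>

    <!-- RA meta-data: advertise the per-role monitor actions -->
    <actions>
      <action name="start"     timeout="60"/>
      <action name="stop"      timeout="60"/>
      <action name="monitor"   timeout="30" interval="31" start-delay="50" role="Slave"/>
      <action name="monitor"   timeout="30" interval="29" start-delay="30" role="Master"/>
      <action name="meta-data" timeout="5"/>
    </actions>

Note that the two monitor operations must use different intervals, otherwise Pacemaker cannot tell the Master and Slave monitors apart.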
Re: [Linux-HA] SLES 11 cluster members won't communicate
Hi,

On Tue, Dec 15, 2009 at 09:52:14AM -0600, justin.kin...@academy.com wrote:
> Hello everyone.
>
> I'm configuring a new 2-node cluster using SLES11 and the HAE, with
> openais 0.80.3-26.1 and pacemaker 1.0.3-4.1.
>
> The problem I'm having is that the nodes do not seem to find each other as
> the documentation says they should.
>
> Here's a brief rundown of what I've done:
>
> 1. configured the two nodes using IP addresses 10.1.254.166 and
>    10.1.254.169
> 2. installed the ha_sles pattern
> 3. updated the following lines in /etc/ais/openais.conf:
>        bindnetaddr: 10.1.254.0
>        mcastaddr: 239.252.10.10
>        mcastport: 5405
> 4. opened udp/5405 in the firewall
> 5. generated /etc/ais/authkey using ais-keygen and copied it to the second node
> 6. started openais using rcopenais start
>
> Here are my questions:
>
> 1. How long should I expect to wait before seeing the CLM messages
>    indicating the nodes joining the cluster? Initially, I waited a few
>    minutes and assumed something was wrong because I never saw these
>    messages. But last night, the following appeared in the log:
>
> Dec 14 18:31:59 plccedir03 openais[3739]: [CLM  ] Members Left:
> Dec 14 18:31:59 plccedir03 openais[3739]: [CLM  ] Members Joined:
> Dec 14 18:31:59 plccedir03 openais[3739]: [CLM  ]       r(0) ip(10.1.254.169)
> Dec 14 18:31:59 plccedir03 openais[3739]: [CLM  ] got nodejoin message 10.1.254.166
> Dec 14 18:31:59 plccedir03 openais[3739]: [CLM  ] got nodejoin message 10.1.254.169

That looks good. Are you sure that a) there's really no firewall
involvement and b) your network switch can handle multicast?

> 2. Using the GUI, the other node never shows online. The node where
>    crm_gui is being run from shows online, but the other one never goes
>    green.
>
> 3. After a restart of openais this morning, I have not yet
>
> I've included the messages from a shutdown/startup of openais this
> morning.

Nothing much in the logs, except that the nodes don't form a cluster.
Check if they really communicate, using tcpdump or wireshark.
There's also openais-cfgtool, which may display ring status.

Thanks,

Dejan

> Thanks in advance,
> Justin
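A hedged sketch of those checks; the interface name is an assumption, while the multicast group and port are taken from the configuration quoted above:

    # Watch for totem multicast traffic on the configured group/port.
    # Run on both nodes; you should see packets arriving from both peers.
    tcpdump -n -i eth0 udp port 5405 and host 239.252.10.10

    # Ask openais itself about ring status (-s prints the status of all rings
    # on the versions I have seen; check the man page for your build)
    openais-cfgtool -s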
Re: [Linux-HA] SLES 11 cluster members won't communicate
>> 1. How long should I expect to wait before seeing the CLM messages
>>    indicating the nodes joining the cluster? [snip]
>
> That looks good. Are you sure that a) there's really no firewall
> involvement and b) your network switch can handle multicast?

In my latest attempts, I've cleared all iptables rules to be sure that
wasn't an issue. There is no other firewall between these boxes.

I will pursue the possibility of missing multicast support, although our
network engineers have told me it is enabled on our switches.

>> 2. Using the GUI, the other node never shows online. The node where
>>    crm_gui is being run from shows online, but the other one never goes
>>    green.
>
> Nothing much in the logs, except that the nodes don't form a cluster.
> Check if they really communicate, using tcpdump or wireshark.
> There's also openais-cfgtool, which may display ring status.

I've captured some packets using tcpdump, and indeed, I never see the
multicast traffic being received, only sent. The odd thing is that these
machines respond to other multicast traffic, like pinging 224.0.0.1.

Is there a kernel option that anyone is aware of that could be causing the
boxes to drop multicast?

Thanks,
Justin
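Two quick local checks that can help rule the kernel and packet filter in or out before blaming the switch; the interface name is an assumption:

    # Has this node actually joined the totem multicast group on the interface?
    ip maddr show dev eth0

    # Any packet filter rule or counter quietly dropping udp/5405 or multicast?
    iptables -L -v -n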
Re: [Linux-HA] SLES 11 cluster members won't communicate
On Wednesday, 16 December 2009 14:43:33, justin.kin...@academy.com wrote:
>>> I'm configuring a new 2-node cluster using SLES11 and the HAE, with
>>> openais 0.80.3-26.1 and pacemaker 1.0.3-4.1.
>>>
>>> The problem I'm having is that the nodes do not seem to find each
>>> other as the documentation says they should.
>> [snip]
>
> I've captured some packets using tcpdump, and indeed, I never see the
> multicast traffic being received, only sent. The odd thing is that these
> machines respond to other multicast traffic, like pinging 224.0.0.1.
>
> Is there a kernel option that anyone is aware of that could be causing
> the boxes to drop multicast?
>
> Thanks,
> Justin

I'd have a serious word with your network guys.
Show them your tcpdumps and they will hopefully understand.

Greetings,

--
Dr. Michael Schwartzkopff
MultiNET Services GmbH
Adresse: Bretonischer Ring 7; 85630 Grasbrunn; Germany
Tel: +49 - 89 - 45 69 11 0
Fax: +49 - 89 - 45 69 11 21
mob: +49 - 174 - 343 28 75
mail: mi...@multinet.de
web: www.multinet.de

Sitz der Gesellschaft: 85630 Grasbrunn
Registergericht: Amtsgericht München HRB 114375
Geschäftsführer: Günter Jurgeneit, Hubert Martens

---
PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B
Skype: misch42
Re: [Linux-HA] Question about risk of split-brain and risk of dual-fencing
Hi,

On Wed, Dec 16, 2009 at 08:56:00AM +0100, Alain.Moulle wrote:
> Hi Dejan, and thanks for the responses,
> yet several remarks below ...
> Alain
>
>> Hi,
>>
>>> I'm trying to clearly evaluate the risk of split-brain and the risk of
>>> dual-fencing with pacemaker/openais in the case where I can't choose
>>> anything else but having only *one* network for the totem protocol:
>>
>> Oops.
>>
>>> Let's say we have a two-node cluster with stonith resources:
>>> - if there is a problem on one node (not a network problem):
>>>   the other will become DC (if not yet) and fence the failing node.
>>> - if there is a network failure between one node and the eth switch:
>>>   each node no longer gets any token from the other node, but only the
>>>   DC has the right to take a decision in the cluster, and specifically
>>>   the decision to fence the other node, so the DC node should fence the
>>>   other. The only problem I can see here is if the "not-DC" node
>>>   declares itself as new DC before being fenced, and therefore also
>>>   decides to fence the other node, which could lead to a dual-fencing
>>>   situation. So the fence request from the initial DC node should
>>>   happen before the DC Deadtime value (default 60s) to eliminate any
>>>   risk of dual-fencing.
>>
>> Have you ever tried this? If that indeed makes the non-DC node
>> wait with fencing, then that may help.
>
> No, it's my "on paper" understanding, but I'll try ...

OK. That deadtime may be skipped, since crmd knows that the other node
is not reachable.

>>> And if we have a more than two-node cluster, it seems similar to me ...
>>
>> No, because the partition without quorum can't fence nodes. That
>> makes things simpler and more predictable.
>
> ... what if no-quorum-policy=ignore ?

Why would you want to set it to ignore if you have more than two nodes?

>>> Am I right about all this? Or did I miss something somewhere?
>>
>> I'm not sure if my response helps at all. You should test this
>> thoroughly. For instance, we have one bugzilla open for
>> external/ipmi where nodes did shoot each other on split-brain.
>
> Could I have the bugzilla number?

http://developerbugs.linux-foundation.org/show_bug.cgi?id=2071

> It's not really easy to test whether we can have dual-fencing in case of
> network failure. For example, I used to work with Cluster Suite for
> several years, in two-node mode with no quorum-disk functionality (it did
> not work well in the beginning). In that case, there is a race to fence
> between both nodes (no DC notion in CS), and RH always said that the
> probability of dual-fencing in case of a heartbeat network problem is
> near 0, but not 0.

Right. It depends on the window size between the fencing request reaching
the stonith plugin and the plugin actually killing a node. If the two
windows overlap, you have a problem. Obviously, the larger the window, the
higher the probability.

> OK, fine, but I have some big customer sites where I have hundreds of HA
> pairs, and on these sites, despite the probability being near 0, it has
> happened several times: not many, but several.

With which plugin? Did you file a bugzilla? Or was it with RHCS?

> So, we can't really test this dual-fencing risk; I think we have to rely
> on the behavior on paper only for this specific case, and try to get the
> configuration which avoids dual-fencing for sure, and also avoids shared
> resources mounted on both sides.

That won't happen, but if there's a high probability of nodes shooting
each other, then that may lead to reduced availability.

Thanks,

Dejan

> That's what I'm trying to find with Pacemaker & openais.
>
> Thanks
> Alain Moullé
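For reference, a hedged sketch of the cluster properties this sub-thread keeps coming back to, with the values being argued for rather than anyone's actual configuration:

    # On a cluster with more than two nodes, keep quorum meaningful: a
    # partition without quorum stops its resources and does not fence anyone
    crm configure property no-quorum-policy="stop"

    # Fencing stays mandatory if you care about shared data
    crm configure property stonith-enabled="true"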
Re: [Linux-HA] SLES 11 cluster members won't communicate
>> I've captured some packets using tcpdump, and indeed, I never see the
>> multicast traffic being received, only sent. The odd thing is that these
>> machines respond to other multicast traffic, like pinging 224.0.0.1.
>>
>> Is there a kernel option that anyone is aware of that could be causing
>> the boxes to drop multicast?
>>
>> Thanks,
>> Justin
>
> I'd have a serious word with your network guys.
> Show them your tcpdumps and they will hopefully understand.

I think this is the issue. It sounds like we're carrying multicast traffic
only on VLAN 1, but this switch doesn't carry that VLAN. They're looking
into a Cisco option called "MVR" that will get the multicast traffic to the
correct switch ports.

Thanks for your help,
Justin
Re: [Linux-HA] Switching after reboot
Andrew Beekhof wrote:
> On Wed, Dec 16, 2009 at 8:48 AM, artur.k wrote:
>
>> I have built a cluster with two nodes on pacemaker 1.0.4 + DRBD (8.0.14).
>> If one machine is restarted, after it returns pacemaker tries to switch
>> all services back to this server. How do I prevent it?
>
> Set default-resource-stickiness to something higher than 200.
> If you don't want it to move under any circumstances, set it to INFINITY.
>
> You should also use the linbit drbd agent if possible.
>
>> node test-storage-1
>> node test-storage-2

Thanks
Re: [Linux-HA] SLES 11 cluster members won't communicate
Not anything to do with this, is it?
https://lists.linux-foundation.org/pipermail/openais/2007-November/009478.html

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of justin.kin...@academy.com
Sent: 16 December 2009 16:00
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] SLES 11 cluster members won't communicate

> > I've captured some packets using tcpdump, and indeed, I never see the
> > multicast traffic being received, only sent. The odd thing is that these
> > machines respond to other multicast traffic, like pinging 224.0.0.1.
> >
> > Is there a kernel option that anyone is aware of that could be causing
> > the boxes to drop multicast?

[snip]

I think this is the issue. It sounds like we're carrying multicast traffic
only on VLAN 1, but this switch doesn't carry that VLAN. They're looking
into a Cisco option called "MVR" that will get the multicast traffic to the
correct switch ports.

Thanks for your help,
Justin
Re: [Linux-HA] SLES 11 cluster members won't communicate
> Not anything to do with this, is it?
> https://lists.linux-foundation.org/pipermail/openais/2007-November/009478.html

It looks like it is an issue with Cisco Catalyst switches (we are using
Catalyst 3750s). The resolution to the problem is documented here, in case
anyone is interested:

http://www.xpresslearn.com/cisco/resolution-to-basic-multicast-problems

This issue would not present itself if both nodes were on the same switch.

Thanks,
Justin
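For anyone hitting the same symptom, a hedged sketch of how to observe it from the Linux side; the interface name is an assumption, and whether group traffic actually gets pruned depends on the switch's IGMP snooping/querier setup:

    # Confirm the node has joined the totem multicast group locally
    ip maddr show dev eth0

    # Watch for IGMP membership queries arriving from the network; with IGMP
    # snooping enabled but no querier reachable, switches typically stop
    # forwarding the group after the membership times out
    tcpdump -n -i eth0 igmp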
Re: [Linux-HA] Xen live migration and constraints - hb_report
On Fri, Dec 11, 2009 at 11:28 AM, Andrew Beekhof wrote:
> On Fri, Dec 11, 2009 at 2:17 AM, infernix wrote:
>> Are these location constraints conflicting with the order constraints? I
>> mean, the cluster shouldn't care where they [start|migrate_to], as long as
>> they [start|migrate_to] in order, one at a time (or, if possible, a
>> configurable number of parallel jobs).
>>
>> I have an hb_report attached for this last case.
>
> I'll have a look

So these are the ordering constraints:

  db -> dbreplica
  dbreplica -> core-101
  core-101 -> core-200
  core-200 -> sysadmin
  sysadmin -> edge
  edge -> base

The problem is that dbreplica and sysadmin aren't moving, so the ordering
rules they're part of have no effect. The only ones doing anything in this
case are core-101 -> core-200 and edge -> base.

But that still means that db, core-101, and edge can all still migrate at
the same time.
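For readers following along, this is roughly how the chain above looks as pairwise crm shell order constraints; the constraint ids are made up here, and the poster's actual configuration is only in the attached hb_report:

    order db-before-dbreplica        inf: db dbreplica
    order dbreplica-before-core-101  inf: dbreplica core-101
    order core-101-before-core-200   inf: core-101 core-200
    order core-200-before-sysadmin   inf: core-200 sysadmin
    order sysadmin-before-edge       inf: sysadmin edge
    order edge-before-base           inf: edge base

As the reply explains, such orders only serialize resources that actually have a pending action, so resources that stay put (dbreplica and sysadmin here) split the chain into independent pieces.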