[Linux-HA] Question about openais rrp_mode
Hi,

> rrp_mode
>     This specifies the mode of redundant ring, which may be none,
>     active, or passive. Active replication offers slightly lower
>     latency from transmit to delivery in faulty network environments
>     but with less performance. Passive replication may nearly double
>     the speed of the totem protocol if the protocol doesn't become
>     cpu bound.

This is not completely clear to me: does it mean that "active" mode sends the totems systematically on both networks, while "passive" mode sends only on the first interface ringnumber (in openais.conf) and falls back to the second interface ringnumber only if the first is broken?

Could someone give more precise information, or point me to where I can find more about this?

And by the way, is there any issue with setting the first interface ringnumber on Ethernet (eth0) and the second on IP over InfiniBand?

Thanks for your response.
Alain Moullé
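For reference, a minimal sketch of the kind of two-ring totem section being discussed; the addresses, interface assignments and the choice of passive mode are illustrative assumptions, not taken from the poster's configuration:

    # /etc/ais/openais.conf -- illustrative two-ring sketch (addresses are examples)
    totem {
        version: 2
        rrp_mode: passive            # or "active"; see the man page text quoted above
        interface {
            ringnumber: 0
            bindnetaddr: 192.168.1.0     # first network, e.g. eth0
            mcastaddr: 226.94.1.1
            mcastport: 5405
        }
        interface {
            ringnumber: 1
            bindnetaddr: 192.168.2.0     # second network, e.g. IP over InfiniBand
            mcastaddr: 226.94.2.1
            mcastport: 5405
        }
    }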
Re: [Linux-HA] Question about risk of split-brain and risk of dual-fencing
On Wed, Dec 16, 2009 at 8:56 AM, Alain.Moulle wrote:
>> No. It is a split-brain situation as soon as nodes can't
>> communicate.
>
> OK, you're right. In fact, I wanted to talk about the risk of shared
> resources being mounted on both sides, which is the worst thing that
> could happen in a split-brain if no fencing occurs.

That's why most vendors will not support configurations without fencing configured.
If you care about your data, you need fencing.

>>> And if we have a more than two-node cluster, it seems similar to me ...
>>
>> No, because the partition without quorum can't fence nodes. That
>> makes things simpler and more predictable.
>
> ... what if no-quorum-policy=ignore ?

Then you get what you ask for :-)

[snip]

> try to get the configuration which avoids dual-fencing for sure, and also
> avoids shared resources mounted on both sides; that's what I'm trying to
> find with Pacemaker & openais.

I would recommend one of two approaches. Either have stonith use the poweroff method, or don't start the cluster software automatically when the node boots.

Also, have a read of Tim's stonith doc: http://ourobengr.com/ha
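A hedged sketch of those two suggestions; stonith-action is a standard Pacemaker cluster property, while the exact service name and init tooling depend on the distribution:

    # 1) Make fencing power nodes off instead of rebooting them
    #    (Pacemaker cluster property; the default is "reboot")
    crm configure property stonith-action="poweroff"

    # 2) Or keep the cluster stack from starting automatically at boot,
    #    so a fenced/rebooted node stays out of the cluster until an admin acts
    chkconfig openais off        # adjust to your distribution's init tooling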
Re: [Linux-HA] Question about openais rrp_mode
Perhaps try the openais mailing list rather than their competitor ;-)

On Wed, Dec 16, 2009 at 9:18 AM, Alain.Moulle wrote:
> Hi,
>> rrp_mode
>>     This specifies the mode of redundant ring, which may be none,
>>     active, or passive. [snip]
>
> Not completely clear for me: does that mean that "active mode" makes it
> send the totems systematically on both networks, and "passive mode" makes
> it send on the first interface ringnumber (in openais.conf) and only on
> the second interface ringnumber if the first is broken?
> Could someone give more precise information?
> Or where can I find more information about this?
>
> And by the way, is there any issue with setting the first interface
> ringnumber on Ethernet (eth0) and the second on IP/InfiniBand?
>
> Thanks for your response.
> Alain Moullé
Re: [Linux-HA] Switching after reboot
On Wed, Dec 16, 2009 at 8:48 AM, artur.k wrote:
> I have built a cluster with two nodes on pacemaker 1.0.4 + DRBD (8.0.14).
> If one machine is restarted, after it returns pacemaker tries to switch
> all services back to this server. How do I prevent it?

Set default-resource-stickiness to something higher than 200.
If you don't want it to move under any circumstances, set it to INFINITY.

You should also use the linbit drbd agent if possible.

> node test-storage-1
> node test-storage-2
> primitive drbd0 ocf:heartbeat:drbd \
>         params drbd_resource="r0" \
>         op monitor interval="59s" role="Master" timeout="30s" \
>         op monitor interval="60s" role="Slave" timeout="30s" \
>         op start interval="0" timeout="20s" \
>         op stop interval="0" timeout="20s"
> primitive fs0 ocf:heartbeat:Filesystem \
>         params fstype="xfs" directory="/mnt/drbd0" device="/dev/drbd0" \
>         params options="rw,nosuid,noatime" \
>         op monitor interval="21s" timeout="20s" \
>         op start interval="0" timeout="60s" \
>         op stop interval="0" timeout="20s"
> primitive ip ocf:heartbeat:IPaddr2 \
>         params ip="10.1.x.x" nic="eth1" cidr_netmask="24" \
>         op monitor interval="21s" timeout="5s"
> primitive nfs-common lsb:nfs-common \
>         op monitor interval="21s" timeout="5s"
> primitive nfs-kernel-server lsb:nfs-kernel-server \
>         op monitor interval="21s" timeout="5s" \
>         op start interval="0" timeout="180s"
> group storage fs0 nfs-kernel-server ip nfs-common
> ms ms-drbd0 drbd0 \
>         meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
> colocation storage-on-ms-drbd0 inf: storage ms-drbd0:Master
> order ms-drbd0-before-storage inf: ms-drbd0:promote storage:start
> property $id="cib-bootstrap-options" \
>         dc-version="1.0.4-2ec1d189f9c23093bf9239a980534b661baf782d" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         no-quorum-policy="ignore" \
>         stonith-enabled="false" \
>         default-resource-stickiness="200"
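A hedged sketch of both suggestions in crm shell syntax; the drbd_resource parameter matches the configuration quoted above, but the monitor intervals shown are illustrative, not a tested setup:

    # Raise stickiness so resources stay where they are when a node rejoins
    crm configure property default-resource-stickiness="INFINITY"

    # Roughly how the same resource might look with the linbit agent
    # instead of ocf:heartbeat:drbd (intervals/timeouts are placeholders)
    crm configure primitive drbd0 ocf:linbit:drbd \
            params drbd_resource="r0" \
            op monitor interval="29s" role="Master" \
            op monitor interval="31s" role="Slave"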
[Linux-HA] crm_attribute syntax
--
Online Games Technology Department, Network Management Group
李森 (Jason)
POPO: listen1...@163.com
Email: li...@corp.netease.com
Re: [Linux-HA] Help required to develop OCF Resource Agent Script for Master-Slave script
Andrew Beekhof-3 wrote:
>
> On Wed, Dec 2, 2009 at 5:23 AM, Jessy wrote:
>>> Yes, but did you add a monitor action to the resource's definition in
>>> the configuration?
>>>
>>> [Jessy]: I have added the monitor operation definition, with an
>>> interval, to the resource in cib.xml as below:
>>>
>>> role="Master"/>
>>> role="Slave"/>
>>>
>>> Moreover, I've also added the definition of the monitor action to the
>>> meta-data of the RA 'MaSlApp' as follows:
>>>
>>> start-delay="50" role="Slave"/>
>>> start-delay="30" role="Master"/>
>>>
>>> Thanks in advance!!!
>
> OK, and what happened?
> Did you also upgrade?
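Because the archive has stripped the opening XML tags above, here is a hedged sketch of what such per-role monitor definitions usually look like; the ids, intervals and timeouts are placeholders, not a reconstruction of the poster's actual files:

    <!-- cib.xml: per-role monitor operations on the master/slave resource -->
    <op id="MaSlApp-monitor-master" name="monitor" interval="29s" timeout="30s" role="Master"/>
    <op id="MaSlApp-monitor-slave"  name="monitor" interval="31s" timeout="30s" role="Slave"/>

    <!-- RA meta-data: advertise the per-role monitor actions -->
    <actions>
      <action name="start"     timeout="60"/>
      <action name="stop"      timeout="60"/>
      <action name="monitor"   timeout="30" interval="31" start-delay="50" role="Slave"/>
      <action name="monitor"   timeout="30" interval="29" start-delay="30" role="Master"/>
      <action name="meta-data" timeout="5"/>
    </actions>

Note that the two monitor operations must use different intervals, otherwise Pacemaker cannot tell the Master and Slave monitors apart.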
Re: [Linux-HA] SLES 11 cluster members won't communicate
Hi,

On Tue, Dec 15, 2009 at 09:52:14AM -0600, justin.kin...@academy.com wrote:
> Hello everyone.
>
> I'm configuring a new 2-node cluster using SLES11 and the HAE, with
> openais 0.80.3-26.1 and pacemaker 1.0.3-4.1.
>
> The problem I'm having is that the nodes do not seem to find each other as
> the documentation says they should.
>
> Here's a brief rundown of what I've done:
>
> 1. configured the two nodes using IP addresses 10.1.254.166 and
>    10.1.254.169
> 2. installed the ha_sles pattern
> 3. updated the following lines in /etc/ais/openais.conf:
>        bindnetaddr: 10.1.254.0
>        mcastaddr: 239.252.10.10
>        mcastport: 5405
> 4. opened udp/5405 in the firewall
> 5. generated /etc/ais/authkey using ais-keygen and copied it to the second node
> 6. started openais using rcopenais start
>
> Here are my questions:
>
> 1. How long should I expect to wait before seeing the CLM messages
>    indicating the nodes joining the cluster? Initially, I waited a few
>    minutes and assumed something was wrong because I never saw these
>    messages. But last night, the following appeared in the log:
>
> Dec 14 18:31:59 plccedir03 openais[3739]: [CLM  ] Members Left:
> Dec 14 18:31:59 plccedir03 openais[3739]: [CLM  ] Members Joined:
> Dec 14 18:31:59 plccedir03 openais[3739]: [CLM  ]       r(0) ip(10.1.254.169)
> Dec 14 18:31:59 plccedir03 openais[3739]: [CLM  ] got nodejoin message 10.1.254.166
> Dec 14 18:31:59 plccedir03 openais[3739]: [CLM  ] got nodejoin message 10.1.254.169

That looks good. Are you sure that a) there's really no firewall
involvement and b) your network switch can handle multicast?

> 2. Using the GUI, the other node never shows online. The node where
>    crm_gui is being run from shows online, but the other one never goes
>    green.
>
> 3. After a restart of openais this morning, I have not yet
>
> I've included the messages from a shutdown/startup of openais this
> morning.

Nothing much in the logs, except that the nodes don't form a cluster.
Check if they really communicate, using tcpdump or wireshark.
There's also openais-cfgtool, which may display ring status.

Thanks,

Dejan

> Thanks in advance,
> Justin
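A hedged sketch of those checks; the interface name is an assumption, while the multicast group and port are taken from the configuration quoted above:

    # Watch for totem multicast traffic on the configured group/port.
    # Run on both nodes; you should see packets arriving from both peers.
    tcpdump -n -i eth0 udp port 5405 and host 239.252.10.10

    # Ask openais itself about ring status (-s prints the status of all rings
    # on the versions I have seen; check the man page for your build)
    openais-cfgtool -s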
Re: [Linux-HA] SLES 11 cluster members won't communicate
>> 1. How long should I expect to wait before seeing the CLM messages
>>    indicating the nodes joining the cluster? [snip]
>
> That looks good. Are you sure that a) there's really no firewall
> involvement and b) your network switch can handle multicast?

In my latest attempts, I've cleared all iptables rules to be sure that
wasn't an issue. There is no other firewall between these boxes.

I will pursue the possibility of missing multicast support, although our
network engineers have told me it is enabled on our switches.

>> 2. Using the GUI, the other node never shows online. The node where
>>    crm_gui is being run from shows online, but the other one never goes
>>    green.
>
> Nothing much in the logs, except that the nodes don't form a cluster.
> Check if they really communicate, using tcpdump or wireshark.
> There's also openais-cfgtool, which may display ring status.

I've captured some packets using tcpdump, and indeed, I never see the
multicast traffic being received, only sent. The odd thing is that these
machines respond to other multicast traffic, like pinging 224.0.0.1.

Is there a kernel option that anyone is aware of that could be causing the
boxes to drop multicast?

Thanks,
Justin
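Two quick local checks that can help rule the kernel and packet filter in or out before blaming the switch; the interface name is an assumption:

    # Has this node actually joined the totem multicast group on the interface?
    ip maddr show dev eth0

    # Any packet filter rule or counter quietly dropping udp/5405 or multicast?
    iptables -L -v -n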
Re: [Linux-HA] SLES 11 cluster members won't communicate
On Wednesday, 16 December 2009 14:43:33, justin.kin...@academy.com wrote:
>>> I'm configuring a new 2-node cluster using SLES11 and the HAE, with
>>> openais 0.80.3-26.1 and pacemaker 1.0.3-4.1.
>>>
>>> The problem I'm having is that the nodes do not seem to find each
>>> other as the documentation says they should.
>> [snip]
>
> I've captured some packets using tcpdump, and indeed, I never see the
> multicast traffic being received, only sent. The odd thing is that these
> machines respond to other multicast traffic, like pinging 224.0.0.1.
>
> Is there a kernel option that anyone is aware of that could be causing
> the boxes to drop multicast?
>
> Thanks,
> Justin

I'd have a serious word with your network guys.
Show them your tcpdumps and they will hopefully understand.

Greetings,

--
Dr. Michael Schwartzkopff
MultiNET Services GmbH
Adresse: Bretonischer Ring 7; 85630 Grasbrunn; Germany
Tel: +49 - 89 - 45 69 11 0
Fax: +49 - 89 - 45 69 11 21
mob: +49 - 174 - 343 28 75
mail: mi...@multinet.de
web: www.multinet.de

Sitz der Gesellschaft: 85630 Grasbrunn
Registergericht: Amtsgericht München HRB 114375
Geschäftsführer: Günter Jurgeneit, Hubert Martens

---
PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B
Skype: misch42
Re: [Linux-HA] Question about risk of split-brain and risk of dual-fencing
Hi,

On Wed, Dec 16, 2009 at 08:56:00AM +0100, Alain.Moulle wrote:
> Hi Dejan, and thanks for the responses,
> yet several remarks below ...
> Alain
>
>> Hi,
>>
>>> I'm trying to clearly evaluate the risk of split-brain and the risk of
>>> dual-fencing with pacemaker/openais in the case where I can't choose
>>> anything else but having only *one* network for the totem protocol:
>>
>> Oops.
>>
>>> Let's say we have a two-node cluster with stonith resources:
>>> - if there is a problem on one node (not a network problem):
>>>   the other will become DC (if not yet) and fence the failing node.
>>> - if there is a network failure between one node and the eth switch:
>>>   each node no longer gets any token from the other node, but only the
>>>   DC has the right to take a decision in the cluster, and specifically
>>>   the decision to fence the other node, so the DC node should fence the
>>>   other. The only problem I can see here is if the "not-DC" node
>>>   declares itself as new DC before being fenced, and therefore also
>>>   decides to fence the other node, which could lead to a dual-fencing
>>>   situation. So the fence request from the initial DC node should
>>>   happen before the DC Deadtime value (default 60s) to eliminate any
>>>   risk of dual-fencing.
>>
>> Have you ever tried this? If that indeed makes the non-DC node
>> wait with fencing, then that may help.
>
> No, it's my "on paper" understanding, but I'll try ...

OK. That deadtime may be skipped, since crmd knows that the other node
is not reachable.

>>> And if we have a more than two-node cluster, it seems similar to me ...
>>
>> No, because the partition without quorum can't fence nodes. That
>> makes things simpler and more predictable.
>
> ... what if no-quorum-policy=ignore ?

Why would you want to set it to ignore if you have more than two nodes?

>>> Am I right about all this? Or did I miss something somewhere?
>>
>> I'm not sure if my response helps at all. You should test this
>> thoroughly. For instance, we have one bugzilla open for
>> external/ipmi where nodes did shoot each other on split-brain.
>
> Could I have the bugzilla number?

http://developerbugs.linux-foundation.org/show_bug.cgi?id=2071

> It's not really easy to test whether we can have dual-fencing in case of
> network failure. For example, I used to work with Cluster Suite for
> several years, in two-node mode with no quorum-disk functionality (it did
> not work well in the beginning). In that case, there is a race to fence
> between both nodes (no DC notion in CS), and RH always said that the
> probability of dual-fencing in case of a heartbeat network problem is
> near 0, but not 0.

Right. It depends on the window size between the fencing request reaching
the stonith plugin and the plugin actually killing a node. If the two
windows overlap, you have a problem. Obviously, the larger the window, the
higher the probability.

> OK, fine, but I have some big customer sites where I have hundreds of HA
> pairs, and on these sites, despite the probability being near 0, it has
> happened several times: not many, but several.

With which plugin? Did you file a bugzilla? Or was it with RHCS?

> So, we can't really test this dual-fencing risk; I think we have to rely
> on the behavior on paper only for this specific case, and try to get the
> configuration which avoids dual-fencing for sure, and also avoids shared
> resources mounted on both sides.

That won't happen, but if there's a high probability of nodes shooting
each other, then that may lead to reduced availability.

Thanks,

Dejan

> That's what I'm trying to find with Pacemaker & openais.
>
> Thanks
> Alain Moullé
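For reference, a hedged sketch of the cluster properties this sub-thread keeps coming back to, with the values being argued for rather than anyone's actual configuration:

    # On a cluster with more than two nodes, keep quorum meaningful: a
    # partition without quorum stops its resources and does not fence anyone
    crm configure property no-quorum-policy="stop"

    # Fencing stays mandatory if you care about shared data
    crm configure property stonith-enabled="true"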
Re: [Linux-HA] SLES 11 cluster members won't communicate
>> I've captured some packets using tcpdump, and indeed, I never see the
>> multicast traffic being received, only sent. The odd thing is that these
>> machines respond to other multicast traffic, like pinging 224.0.0.1.
>>
>> Is there a kernel option that anyone is aware of that could be causing
>> the boxes to drop multicast?
>>
>> Thanks,
>> Justin
>
> I'd have a serious word with your network guys.
> Show them your tcpdumps and they will hopefully understand.

I think this is the issue. It sounds like we're carrying multicast traffic
only on VLAN 1, but this switch doesn't carry that VLAN. They're looking
into a Cisco option called "MVR" that will get the multicast traffic to the
correct switch ports.

Thanks for your help,
Justin
Re: [Linux-HA] Switching after reboot
Andrew Beekhof wrote:
> On Wed, Dec 16, 2009 at 8:48 AM, artur.k wrote:
>
>> I have built a cluster with two nodes on pacemaker 1.0.4 + DRBD (8.0.14).
>> If one machine is restarted, after it returns pacemaker tries to switch
>> all services back to this server. How do I prevent it?
>
> Set default-resource-stickiness to something higher than 200.
> If you don't want it to move under any circumstances, set it to INFINITY.
>
> You should also use the linbit drbd agent if possible.
>
>> node test-storage-1
>> node test-storage-2

Thanks
Re: [Linux-HA] SLES 11 cluster members won't communicate
Not anything to do with this, is it?
https://lists.linux-foundation.org/pipermail/openais/2007-November/009478.html

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of justin.kin...@academy.com
Sent: 16 December 2009 16:00
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] SLES 11 cluster members won't communicate

> > I've captured some packets using tcpdump, and indeed, I never see the
> > multicast traffic being received, only sent. The odd thing is that these
> > machines respond to other multicast traffic, like pinging 224.0.0.1.
> >
> > Is there a kernel option that anyone is aware of that could be causing
> > the boxes to drop multicast?

[snip]

I think this is the issue. It sounds like we're carrying multicast traffic
only on VLAN 1, but this switch doesn't carry that VLAN. They're looking
into a Cisco option called "MVR" that will get the multicast traffic to the
correct switch ports.

Thanks for your help,
Justin
Re: [Linux-HA] SLES 11 cluster members won't communicate
> Not anything to do with this, is it?
> https://lists.linux-foundation.org/pipermail/openais/2007-November/009478.html

It looks like it is an issue with Cisco Catalyst switches (we are using
Catalyst 3750s). The resolution to the problem is documented here, in case
anyone is interested:

http://www.xpresslearn.com/cisco/resolution-to-basic-multicast-problems

This issue would not present itself if both nodes were on the same switch.

Thanks,
Justin
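For anyone hitting the same symptom, a hedged sketch of how to observe it from the Linux side; the interface name is an assumption, and whether group traffic actually gets pruned depends on the switch's IGMP snooping/querier setup:

    # Confirm the node has joined the totem multicast group locally
    ip maddr show dev eth0

    # Watch for IGMP membership queries arriving from the network; with IGMP
    # snooping enabled but no querier reachable, switches typically stop
    # forwarding the group after the membership times out
    tcpdump -n -i eth0 igmp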
Re: [Linux-HA] Xen live migration and constraints - hb_report
On Fri, Dec 11, 2009 at 11:28 AM, Andrew Beekhof wrote:
> On Fri, Dec 11, 2009 at 2:17 AM, infernix wrote:
>> Are these location constraints conflicting with the order constraints? I
>> mean, the cluster shouldn't care where they [start|migrate_to], as long as
>> they [start|migrate_to] in order, one at a time (or, if possible, a
>> configurable number of parallel jobs).
>>
>> I have an hb_report attached for this last case.
>
> I'll have a look

So these are the ordering constraints:

  db -> dbreplica
  dbreplica -> core-101
  core-101 -> core-200
  core-200 -> sysadmin
  sysadmin -> edge
  edge -> base

The problem is that dbreplica and sysadmin aren't moving, so the ordering
rules they're part of have no effect. The only ones doing anything in this
case are core-101 -> core-200 and edge -> base.

But that still means that db, core-101, and edge can all still migrate at
the same time.
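For readers following along, this is roughly how the chain above looks as pairwise crm shell order constraints; the constraint ids are made up here, and the poster's actual configuration is only in the attached hb_report:

    order db-before-dbreplica        inf: db dbreplica
    order dbreplica-before-core-101  inf: dbreplica core-101
    order core-101-before-core-200   inf: core-101 core-200
    order core-200-before-sysadmin   inf: core-200 sysadmin
    order sysadmin-before-edge       inf: sysadmin edge
    order edge-before-base           inf: edge base

As the reply explains, such orders only serialize resources that actually have a pending action, so resources that stay put (dbreplica and sysadmin here) split the chain into independent pieces.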