Thanks for the review. I've addressed the comments and will send out another version of the patch shortly. One comment on the feedback below.
> Under "Controller Independent Active-backup", I am not sure that I buy > the argument here, because currently ovn-northd doesn't care about the > layout of the physical network. The other argument rings true for me of > course: I'm not sure I fully understood this comment. What part of the paragraph do you disagree with specifically? Whether or not ovn-northd cares about the layout of the physical network, it still needs to know which gateway is up, and under naive active-backup react to changes. Ethan > > This can significantly increase downtime in the event of a failover > as the (often already busy) ovn-northd controller has to recompute > state for the new leader. > > Here are some spelling fixes as a patch. This also replaces the fancy > Unicode U+2014 em dashes by the more common (in OVS, anyway) ASCII "--". > > Thanks again for writing this! > > diff --git a/OVN-GW-HA.md b/OVN-GW-HA.md > index ea598b2..e0d5c9f 100644 > --- a/OVN-GW-HA.md > +++ b/OVN-GW-HA.md > @@ -30,8 +30,8 @@ The OVN gateway is responsible for shuffling traffic > between logical space > implementation, the gateway is a single x86 server, or hardware VTEP. For > most > deployments, a single system has enough forwarding capacity to service the > entire virtualized network, however, it introduces a single point of failure. > -If this system dies, the entire OVN deployment becomes unavailable. To > mitgate > -this risk, an HA solution is critical — by spreading responsibilty across > +If this system dies, the entire OVN deployment becomes unavailable. To > mitigate > +this risk, an HA solution is critical -- by spreading responsibility across > multiple systems, no single server failure can take down the network. > > An HA solution is both critical to the performance and manageability of the > @@ -51,7 +51,7 @@ OVN controlled tunnel traffic, to raw physical network > traffic. > > Since the broader internet is managed outside of the OVN network domain, all > traffic between logical space and the WAN must travel through this gateway. > -This makes it a critical single point of failure — if the gateway dies, > +This makes it a critical single point of failure -- if the gateway dies, > communication with the WAN ceases for all systems in logical space. > > To mitigate this risk, multiple gateways should be run in a "High > Availability > @@ -128,15 +128,15 @@ absolute simplest way to achive this is what we'll call > "naive-active-backup". > Naive Active Backup HA Implementation > ``` > > -In a naive active-bakup, one of the Gateways is choosen (arbitrarily) as a > +In a naive active-backup, one of the Gateways is choosen (arbitrarily) as a > leader. All logical routers (A, B, C in the figure), are scheduled on this > leader gateway and all traffic flows through it. ovn-northd monitors this > gateway via OpenFlow hello messages (or some equivalent), and if the gateway > dies, it recreates the routers on one of the backups. > > This approach basically works in most cases and should likely be the starting > -point for OVN — it's strictly better than no HA solution and is a good > -foundation for more sophisticated solutions. That said, it's not without > it's > +point for OVN -- it's strictly better than no HA solution and is a good > +foundation for more sophisticated solutions. That said, it's not without its > limitations. 
Ethan

>
> This can significantly increase downtime in the event of a failover
> as the (often already busy) ovn-northd controller has to recompute
> state for the new leader.
>
> Here are some spelling fixes as a patch. This also replaces the fancy
> Unicode U+2014 em dashes by the more common (in OVS, anyway) ASCII "--".
>
> Thanks again for writing this!
>
> diff --git a/OVN-GW-HA.md b/OVN-GW-HA.md
> index ea598b2..e0d5c9f 100644
> --- a/OVN-GW-HA.md
> +++ b/OVN-GW-HA.md
> @@ -30,8 +30,8 @@ The OVN gateway is responsible for shuffling traffic between logical space
> implementation, the gateway is a single x86 server, or hardware VTEP. For most
> deployments, a single system has enough forwarding capacity to service the
> entire virtualized network, however, it introduces a single point of failure.
> -If this system dies, the entire OVN deployment becomes unavailable. To mitgate
> -this risk, an HA solution is critical — by spreading responsibilty across
> +If this system dies, the entire OVN deployment becomes unavailable. To mitigate
> +this risk, an HA solution is critical -- by spreading responsibility across
> multiple systems, no single server failure can take down the network.
>
> An HA solution is both critical to the performance and manageability of the
> @@ -51,7 +51,7 @@ OVN controlled tunnel traffic, to raw physical network traffic.
>
> Since the broader internet is managed outside of the OVN network domain, all
> traffic between logical space and the WAN must travel through this gateway.
> -This makes it a critical single point of failure — if the gateway dies,
> +This makes it a critical single point of failure -- if the gateway dies,
> communication with the WAN ceases for all systems in logical space.
>
> To mitigate this risk, multiple gateways should be run in a "High Availability
> @@ -128,15 +128,15 @@ absolute simplest way to achive this is what we'll call "naive-active-backup".
> Naive Active Backup HA Implementation
> ```
>
> -In a naive active-bakup, one of the Gateways is choosen (arbitrarily) as a
> +In a naive active-backup, one of the Gateways is choosen (arbitrarily) as a
> leader. All logical routers (A, B, C in the figure), are scheduled on this
> leader gateway and all traffic flows through it. ovn-northd monitors this
> gateway via OpenFlow hello messages (or some equivalent), and if the gateway
> dies, it recreates the routers on one of the backups.
>
> This approach basically works in most cases and should likely be the starting
> -point for OVN — it's strictly better than no HA solution and is a good
> -foundation for more sophisticated solutions. That said, it's not without it's
> +point for OVN -- it's strictly better than no HA solution and is a good
> +foundation for more sophisticated solutions. That said, it's not without its
> limitations. Specifically, this approach doesn't coordinate with the physical
> network to minimize disruption during failures, and it tightly couples failover
> to ovn-northd (we'll discuss why this is bad in a bit), and wastes resources by
> @@ -167,7 +167,7 @@ ethernet source address of the RARP is that of the logical router it
> corresponds to, and its destination is the broadcast address. This causes the
> RARP to travel to every L2 switch in the broadcast domain, updating forwarding
> tables accordingly. This strategy is recommended in all failover mechanisms
> -discussed in this document — when a router newly boots on a new leader, it
> +discussed in this document -- when a router newly boots on a new leader, it
> should RARP its MAC address.
>
> ### Controller Independent Active-backup
> @@ -188,7 +188,7 @@ Controller Independent Active-Backup Implementation
> ```
>
> The fundamental problem with naive active-backup, is it tightly couples the
> -failover solution to ovn-northd. This can signifcantly increase downtime in
> +failover solution to ovn-northd. This can significantly increase downtime in
> the event of a failover as the (often already busy) ovn-northd controller has
> to recompute state for the new leader. Worse, if ovn-northd goes down, we
> can't perform gateway failover at all. This violates the principle that
> @@ -207,7 +207,7 @@ priority to each node it controls. Nodes use the leadership priority to
> determine which gateway in the cluster is the active leader by using a simple
> metric: the leader is the gateway that is healthy, with the highest priority.
> If that gateway goes down, leadership falls to the next highest priority, and
> -conversley, if a new gateway comes up with a higher priority, it takes over
> +conversely, if a new gateway comes up with a higher priority, it takes over
> leadership.
>
> Thus, in this model, leadership of the HA cluster is determined simply by the
> @@ -221,7 +221,7 @@ of member gateways, a key problem is how do we communicate this information to
> the relevant transport nodes. Luckily, we can do this fairly cheaply using
> tunnel monitoring protocols like BFD.
>
> -The basic idea is pretty straight forward. Each transport node maintains a
> +The basic idea is pretty straightforward. Each transport node maintains a
> tunnel to every gateway in the HA cluster (not just the leader). These
> tunnels are monitored using the BFD protocol to see which are alive. Given
> this information, hypervisors can trivially compute the highest priority live
> @@ -277,7 +277,7 @@ even though its tunnels are still healthy.
> Router Specific Active-Backup
> ```
> Controller independent active-backup is a great advance over naive
> -active-backup, but it still has one glaring problem — it under-utilizes the
> +active-backup, but it still has one glaring problem -- it under-utilizes the
> backup gateways. In ideal scenario, all traffic would split evenly among the
> live set of gateways. Getting all the way there is somewhat tricky, but as a
> step in the direction, one could use the "Router Specific Active-Backup"
> @@ -286,7 +286,7 @@ router basis, with one twist. It chooses a different active Gateway for each
> logical router. Thus, in situations where there are several logical routers,
> all with somewhat balanced load, this algorithm performs better.
>
> -Implementation of this strategy is quite straight forward if built on top of
> +Implementation of this strategy is quite straightforward if built on top of
> basic controller independent active-backup. On a per logical router basis, the
> algorithm is the same, leadership is determined by the liveness of the
> gateways. The key difference here is that the gateways must have a different
> @@ -295,7 +295,7 @@ be computed by ovn-northd just as they had been in the controller independent
> active-backup model.
>
> Once we have these per logical router priorities, they simply need be
> -comminucated to the members of the gateway cluster and the hypervisors. The
> +communicated to the members of the gateway cluster and the hypervisors. The
> hypervisors in particular, need simply have an active-backup bundle action (or
> group action) per logical router listing the gateways in priority order for
> *that router*, rather than having a single bundle action shared for all the
> @@ -327,7 +327,7 @@ undesirable.
> The controller can optionally avoid preemption by cleverly tweaking the
> leadership priorities. For each router, new gateways should be assigned
> priorities that put them second in line or later when they eventually come up.
> -Furthermore, if a gateway goes down for a significant period of time, it's old
> +Furthermore, if a gateway goes down for a significant period of time, its old
> leadership priorities should be revoked and new ones should be assigned as if
> it's a brand new gateway. Note that this should only happen if a gateway has
> been down for a while (several minutes), otherwise a flapping gateway could
> @@ -368,7 +368,7 @@ gateways end up implementing an overly conservative "when in doubt drop all
> traffic" policy, or they implement something like MLAG.
>
> MLAG has multiple gateways work together to pretend to be a single L2 switch
> -with a large LACP bond. In principle, it's the right right solution to the
> +with a large LACP bond. In principle, it's the right solution to the
> problem as it solves the broadcast storm problem, and has been deployed
> successfully in other contexts. That said, it's difficult to get right and not
> recommended.

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev