Thanks for the review. I've addressed the comments and will send out another version of the patch shortly. One comment on the feedback below.
> Under "Controller Independent Active-backup", I am not sure that I buy > the argument here, because currently ovn-northd doesn't care about the > layout of the physical network. The other argument rings true for me of > course: I'm not sure I fully understood this comment. What part of the paragraph do you disagree with specifically? Whether or not ovn-northd cares about the layout of the physical network, it still needs to know which gateway is up, and under naive active-backup react to changes. Ethan > > This can significantly increase downtime in the event of a failover > as the (often already busy) ovn-northd controller has to recompute > state for the new leader. > > Here are some spelling fixes as a patch. This also replaces the fancy > Unicode U+2014 em dashes by the more common (in OVS, anyway) ASCII "--". > > Thanks again for writing this! > > diff --git a/OVN-GW-HA.md b/OVN-GW-HA.md > index ea598b2..e0d5c9f 100644 > --- a/OVN-GW-HA.md > +++ b/OVN-GW-HA.md > @@ -30,8 +30,8 @@ The OVN gateway is responsible for shuffling traffic > between logical space > implementation, the gateway is a single x86 server, or hardware VTEP. For > most > deployments, a single system has enough forwarding capacity to service the > entire virtualized network, however, it introduces a single point of failure. > -If this system dies, the entire OVN deployment becomes unavailable. To > mitgate > -this risk, an HA solution is critical — by spreading responsibilty across > +If this system dies, the entire OVN deployment becomes unavailable. To > mitigate > +this risk, an HA solution is critical -- by spreading responsibility across > multiple systems, no single server failure can take down the network. > > An HA solution is both critical to the performance and manageability of the > @@ -51,7 +51,7 @@ OVN controlled tunnel traffic, to raw physical network > traffic. > > Since the broader internet is managed outside of the OVN network domain, all > traffic between logical space and the WAN must travel through this gateway. > -This makes it a critical single point of failure — if the gateway dies, > +This makes it a critical single point of failure -- if the gateway dies, > communication with the WAN ceases for all systems in logical space. > > To mitigate this risk, multiple gateways should be run in a "High > Availability > @@ -128,15 +128,15 @@ absolute simplest way to achive this is what we'll call > "naive-active-backup". > Naive Active Backup HA Implementation > ``` > > -In a naive active-bakup, one of the Gateways is choosen (arbitrarily) as a > +In a naive active-backup, one of the Gateways is choosen (arbitrarily) as a > leader. All logical routers (A, B, C in the figure), are scheduled on this > leader gateway and all traffic flows through it. ovn-northd monitors this > gateway via OpenFlow hello messages (or some equivalent), and if the gateway > dies, it recreates the routers on one of the backups. > > This approach basically works in most cases and should likely be the starting > -point for OVN — it's strictly better than no HA solution and is a good > -foundation for more sophisticated solutions. That said, it's not without > it's > +point for OVN -- it's strictly better than no HA solution and is a good > +foundation for more sophisticated solutions. That said, it's not without its > limitations. 
Ethan

>
> This can significantly increase downtime in the event of a failover
> as the (often already busy) ovn-northd controller has to recompute
> state for the new leader.
>
> Here are some spelling fixes as a patch. This also replaces the fancy
> Unicode U+2014 em dashes by the more common (in OVS, anyway) ASCII "--".
>
> Thanks again for writing this!
>
> diff --git a/OVN-GW-HA.md b/OVN-GW-HA.md
> index ea598b2..e0d5c9f 100644
> --- a/OVN-GW-HA.md
> +++ b/OVN-GW-HA.md
> @@ -30,8 +30,8 @@ The OVN gateway is responsible for shuffling traffic between logical space
> implementation, the gateway is a single x86 server, or hardware VTEP. For most
> deployments, a single system has enough forwarding capacity to service the
> entire virtualized network, however, it introduces a single point of failure.
> -If this system dies, the entire OVN deployment becomes unavailable. To mitgate
> -this risk, an HA solution is critical — by spreading responsibilty across
> +If this system dies, the entire OVN deployment becomes unavailable. To mitigate
> +this risk, an HA solution is critical -- by spreading responsibility across
> multiple systems, no single server failure can take down the network.
>
> An HA solution is both critical to the performance and manageability of the
> @@ -51,7 +51,7 @@ OVN controlled tunnel traffic, to raw physical network traffic.
>
> Since the broader internet is managed outside of the OVN network domain, all
> traffic between logical space and the WAN must travel through this gateway.
> -This makes it a critical single point of failure — if the gateway dies,
> +This makes it a critical single point of failure -- if the gateway dies,
> communication with the WAN ceases for all systems in logical space.
>
> To mitigate this risk, multiple gateways should be run in a "High Availability
> @@ -128,15 +128,15 @@ absolute simplest way to achive this is what we'll call "naive-active-backup".
> Naive Active Backup HA Implementation
> ```
>
> -In a naive active-bakup, one of the Gateways is choosen (arbitrarily) as a
> +In a naive active-backup, one of the Gateways is choosen (arbitrarily) as a
> leader. All logical routers (A, B, C in the figure), are scheduled on this
> leader gateway and all traffic flows through it. ovn-northd monitors this
> gateway via OpenFlow hello messages (or some equivalent), and if the gateway
> dies, it recreates the routers on one of the backups.
>
> This approach basically works in most cases and should likely be the starting
> -point for OVN — it's strictly better than no HA solution and is a good
> -foundation for more sophisticated solutions. That said, it's not without it's
> +point for OVN -- it's strictly better than no HA solution and is a good
> +foundation for more sophisticated solutions. That said, it's not without its
> limitations. Specifically, this approach doesn't coordinate with the physical
> network to minimize disruption during failures, and it tightly couples failover
> to ovn-northd (we'll discuss why this is bad in a bit), and wastes resources by
> @@ -167,7 +167,7 @@ ethernet source address of the RARP is that of the logical router it
> corresponds to, and its destination is the broadcast address. This causes the
> RARP to travel to every L2 switch in the broadcast domain, updating forwarding
> tables accordingly. This strategy is recommended in all failover mechanisms
> -discussed in this document — when a router newly boots on a new leader, it
> +discussed in this document -- when a router newly boots on a new leader, it
> should RARP its MAC address.
>
> ### Controller Independent Active-backup
> @@ -188,7 +188,7 @@ Controller Independent Active-Backup Implementation
> ```
>
> The fundamental problem with naive active-backup, is it tightly couples the
> -failover solution to ovn-northd. This can signifcantly increase downtime in
> +failover solution to ovn-northd. This can significantly increase downtime in
> the event of a failover as the (often already busy) ovn-northd controller has
> to recompute state for the new leader. Worse, if ovn-northd goes down, we
> can't perform gateway failover at all. This violates the principle that
> @@ -207,7 +207,7 @@ priority to each node it controls. Nodes use the leadership priority to
> determine which gateway in the cluster is the active leader by using a simple
> metric: the leader is the gateway that is healthy, with the highest priority.
> If that gateway goes down, leadership falls to the next highest priority, and
> -conversley, if a new gateway comes up with a higher priority, it takes over
> +conversely, if a new gateway comes up with a higher priority, it takes over
> leadership.
>
> Thus, in this model, leadership of the HA cluster is determined simply by the
> @@ -221,7 +221,7 @@ of member gateways, a key problem is how do we communicate this information to
> the relevant transport nodes. Luckily, we can do this fairly cheaply using
> tunnel monitoring protocols like BFD.
>
> -The basic idea is pretty straight forward. Each transport node maintains a
> +The basic idea is pretty straightforward. Each transport node maintains a
> tunnel to every gateway in the HA cluster (not just the leader). These
> tunnels are monitored using the BFD protocol to see which are alive. Given
> this information, hypervisors can trivially compute the highest priority live
> @@ -277,7 +277,7 @@ even though its tunnels are still healthy.
> Router Specific Active-Backup
> ```
> Controller independent active-backup is a great advance over naive
> -active-backup, but it still has one glaring problem — it under-utilizes the
> +active-backup, but it still has one glaring problem -- it under-utilizes the
> backup gateways. In ideal scenario, all traffic would split evenly among the
> live set of gateways. Getting all the way there is somewhat tricky, but as a
> step in the direction, one could use the "Router Specific Active-Backup"
> @@ -286,7 +286,7 @@ router basis, with one twist. It chooses a different active Gateway for each
> logical router. Thus, in situations where there are several logical routers,
> all with somewhat balanced load, this algorithm performs better.
>
> -Implementation of this strategy is quite straight forward if built on top of
> +Implementation of this strategy is quite straightforward if built on top of
> basic controller independent active-backup. On a per logical router basis, the
> algorithm is the same, leadership is determined by the liveness of the
> gateways. The key difference here is that the gateways must have a different
> @@ -295,7 +295,7 @@ be computed by ovn-northd just as they had been in the controller independent
> active-backup model.
>
> Once we have these per logical router priorities, they simply need be
> -comminucated to the members of the gateway cluster and the hypervisors. The
> +communicated to the members of the gateway cluster and the hypervisors. The
> hypervisors in particular, need simply have an active-backup bundle action (or
> group action) per logical router listing the gateways in priority order for
> *that router*, rather than having a single bundle action shared for all the
> @@ -327,7 +327,7 @@ undesirable.
> The controller can optionally avoid preemption by cleverly tweaking the
> leadership priorities. For each router, new gateways should be assigned
> priorities that put them second in line or later when they eventually come up.
> -Furthermore, if a gateway goes down for a significant period of time, it's old
> +Furthermore, if a gateway goes down for a significant period of time, its old
> leadership priorities should be revoked and new ones should be assigned as if
> it's a brand new gateway. Note that this should only happen if a gateway has
> been down for a while (several minutes), otherwise a flapping gateway could
> @@ -368,7 +368,7 @@ gateways end up implementing an overly conservative "when in doubt drop all
> traffic" policy, or they implement something like MLAG.
>
> MLAG has multiple gateways work together to pretend to be a single L2 switch
> -with a large LACP bond. In principle, it's the right right solution to the
> +with a large LACP bond. In principle, it's the right solution to the
> problem as it solves the broadcast storm problem, and has been deployed
> successfully in other contexts. That said, it's difficult to get right and not
> recommended.

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev