Re: Service Redundancy using BFD

Ankur Dubey Wed, 29 Nov 2017 04:29:44 -0800

Hi Greg,

If C and A-B are statically programmed, VRRP can be useful to indicate to node 
C which node (A or B)  is active for a given service.  But, it does not help in 
scenarios where A-B are doing dynamic routing with node C and the IPs on which 
the services are being run themselves are dynamically advertised or withdrawn 
based on need. There are existing mechanisms in dynamic routing to program the path 
towards active nodes, and BFD can be added to protect a static/dynamic session.
Bottom line: we are not defining how to notify C in this draft.


Failure of the BFD session between A and B may happen if there is a real node 
failure or in case of a split brain, as you are suggesting. Mechanisms to avoid/fix 
split brain scenarios for a set of nodes providing redundancy are beyond the 
scope of this draft. But when a split brain resolves, the methods described in 
the draft will ensure that a given service is active only on one node.

Thanks,
--Ankur

From: Greg Mirsky <gregimir...@gmail.com>
Date: Tuesday, November 28, 2017 at 2:59 PM
To: Sami Boutros <sbout...@vmware.com>
Cc: Ashesh Mishra <mishra.ash...@outlook.com>, Ankur Dubey <adu...@vmware.com>, 
"rtg-bfd@ietf.org" <rtg-bfd@ietf.org>, Reshad Rahman <rrah...@cisco.com>
Subject: Re: Service Redundancy using BFD

Hi Sami,
you've indicated that one of the set of network functions (NFs), A 
and B in the figure below, provides L2/L3 services to NF C. My question 
was how C addresses the designated forwarder (DF) of the A-B set. If it uses 
a virtual address that is associated with the function of the DF, then NF C doesn't 
need to know the identity of the DF (similar to VRRP, isn't it?). If NF C needs 
to know the identity of the DF, then it must use some means to monitor the 
liveness of A and B.
And I have to point out a couple of BFD-related assumptions in the draft:

  *   failure of the BFD session between A and B cannot be interpreted by the 
respective BFD peer as failure of A or B, but only as loss of continuity between 
the forwarding engines. Assuming that the failure is of a node rather than of a 
link may lead to duplicate DFs;
  *   using multi-hop BFD to detect node failure may produce a false negative if 
failure detection is more aggressive than network convergence, e.g. network 
convergence is guaranteed within 100 ms while the BFD interval is 10 ms.
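
The second point can be illustrated with a little arithmetic. The sketch below assumes the standard asynchronous-mode BFD detection time (negotiated interval multiplied by the detect multiplier, as in RFC 5880). The multiplier of 3 is an assumed value that is not given in this thread, and the function names are hypothetical.

```python
def bfd_detection_time_ms(interval_ms, detect_mult):
    """Asynchronous-mode BFD detection time: interval x detect multiplier."""
    return interval_ms * detect_mult


def may_misreport_node_failure(interval_ms, detect_mult, convergence_ms):
    """True if the BFD session can time out before the network reconverges."""
    return bfd_detection_time_ms(interval_ms, detect_mult) < convergence_ms


# Interval and convergence bound from the example above;
# detect_mult=3 is an assumption made only for this illustration.
print(bfd_detection_time_ms(10, 3))            # 30
print(may_misreport_node_failure(10, 3, 100))  # True
```

With these numbers the session would be declared down after roughly 30 ms, while the multi-hop path may take up to 100 ms to reconverge, so a path change alone could be mistaken for a node failure.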
Regards,
Greg

On Tue, Nov 28, 2017 at 2:39 PM, Sami Boutros <sbout...@vmware.com> wrote:
Hi Greg,

A can detect failures of the link to C using any mechanism, not only BFD.

The picture below is for illustration; A and B themselves can be providing 
services (L4 to L7), which could include firewall, NAT, load balancer, etc.

Thanks,

Sami
From: Greg Mirsky <gregimir...@gmail.com>
Date: Tuesday, November 28, 2017 at 2:20 PM
To: Sami Boutros <sbout...@vmware.com>
Cc: Ashesh Mishra <mishra.ash...@outlook.com>, Ankur Dubey <adu...@vmware.com>, 
"rtg-bfd@ietf.org" <rtg-bfd@ietf.org>, Reshad Rahman <rrah...@cisco.com>
Subject: Re: Service Redundancy using BFD

Hi Sami,
would C have BFD sessions to A and B respectively, or would it use an anycast address? 
The more I look at the use case, the more I think of VRRP ;)

Regards,
Greg

On Tue, Nov 28, 2017 at 2:15 PM, Sami Boutros <sbout...@vmware.com> wrote:

Hi Ashesh,

The topology is more like the following:

A <--\
|     \
BFD    C
|     /
B <--/

A and B are nodes providing L2 and L3 services for C, with active/standby (A/S) 
redundancy.

A can be active and B standby; if A goes down, then B starts providing the 
services.

Thanks,

Sami
From: Ashesh Mishra <mishra.ash...@outlook.com>
Date: Tuesday, November 28, 2017 at 1:45 PM
To: Sami Boutros <sbout...@vmware.com>, Ankur Dubey <adu...@vmware.com>, 
"rtg-bfd@ietf.org" <rtg-bfd@ietf.org>
Cc: Reshad Rahman <rrah...@cisco.com>
Subject: Re: Service Redundancy using BFD

Okay. That makes sense now.

So in a scenario where you have a primary overlay service between A and B, and 
a backup overlay service between C and D, the BFD sessions in question will be 
between A and C, and B and D (so that the backup can send diag code to primary)?

A <------- primary service -------> B
|                                   |
BFD                                BFD
|                                   |
C <------- backup service --------> D

--
Ashesh


From: Sami Boutros <sbout...@vmware.com>
Date: Tuesday, November 28, 2017 at 4:21 PM
To: Ashesh Mishra <mishra.ash...@outlook.com>, Ankur Dubey <adu...@vmware.com>, 
"rtg-bfd@ietf.org" <rtg-bfd@ietf.org>
Cc: Reshad Rahman <rrah...@cisco.com>
Subject: Re: Service Redundancy using BFD

Hi Ashesh,

A service is an overlay service running on a routing node. This could be an L2 
or L3 VPN service running on a set of links connected to 2 or more nodes, where 
one node is active for a service at a given point in time, and one node is 
standby.

Now, BFD is running on the underlay links between the 2 nodes, active and 
standby. Once BFD goes down, the standby assumes that the active went down and 
activates the services that it shares with the active. When the old active comes 
back up, the standby signals to it on the BFD session, via this diag code, that 
it did not fail and that it has activated the non-preemptive services, so the 
old active node does not activate those non-preemptive services.
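
The takeover sequence described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the draft's state machine: the `Node` class, its method names, and the `DIAG_PEER_DID_NOT_FAIL` constant are hypothetical, the actual diag code value is not given in this thread, and only non-preemptive services are modelled.

```python
# Hypothetical stand-ins for BFD diagnostic values; the real code point
# for the diag discussed in this thread is not given here.
DIAG_NONE = "none"
DIAG_PEER_DID_NOT_FAIL = "peer-did-not-fail"


class Node:
    def __init__(self, name):
        self.name = name
        self.active = set()  # non-preemptive services active on this node

    def on_bfd_down(self, shared_services):
        """Standby assumes the active went down and activates shared services."""
        self.active |= set(shared_services)

    def diag_on_session_up(self):
        """Diag to send once the session returns: set if we took over."""
        return DIAG_PEER_DID_NOT_FAIL if self.active else DIAG_NONE

    def on_bfd_up(self, peer_diag, wanted_services):
        """Returning node activates services only if the peer did not take over."""
        if peer_diag == DIAG_PEER_DID_NOT_FAIL:
            return  # peer stayed up and holds the services; do not activate
        self.active |= set(wanted_services)


services = {"l3vpn-1"}
a, b = Node("A"), Node("B")
b.on_bfd_down(services)                        # A fails, B takes over
a.on_bfd_up(b.diag_on_session_up(), services)  # A returns, sees B's diag
print(a.active, b.active)
```

In this sketch the service ends up active on B only, which is the outcome the diag code is meant to guarantee for non-preemptive services.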

Thanks,

Sami
From: Ashesh Mishra <mishra.ash...@outlook.com>
Date: Tuesday, November 28, 2017 at 1:14 PM
To: Sami Boutros <sbout...@vmware.com>, Ankur Dubey <adu...@vmware.com>, 
"rtg-bfd@ietf.org" <rtg-bfd@ietf.org>
Cc: Reshad Rahman <rrah...@cisco.com>
Subject: Re: Service Redundancy using BFD

Thanks for the response, Sami. I think our disconnect lies in the definition of 
a service. From a BFD perspective, I expect the service to be established 
across two nodes, at the very least, so that BFD can monitor its liveness. Can 
you elaborate on:

-   What, in the context of this draft, is a service?
-   How does BFD signal for a service whose liveness it is not monitoring?

Thanks,
Ashesh

From: Sami Boutros <sbout...@vmware.com>
Date: Tuesday, November 28, 2017 at 1:23 PM
To: Ashesh Mishra <mishra.ash...@outlook.com>, Ankur Dubey <adu...@vmware.com>, 
"rtg-bfd@ietf.org" <rtg-bfd@ietf.org>
Cc: Reshad Rahman <rrah...@cisco.com>
Subject: Re: Service Redundancy using BFD

Hi Ashesh,

Thanks for your comments.

For your first comment, the draft applies to both single-hop BFD (what you call 
interface BFD) and multi-hop BFD. And yes, "per service" could be "per 
interface" too if this is single-hop BFD; we can clarify that in the draft.

For your second comment, I am not sure I understand. The service will be active 
only on one node. If the service is associated with the whole node, then the 
BFD session is monitoring the node liveness. And when the service is associated 
with an interface, the BFD session will monitor the interface connectivity as 
well. So, a primary service can't be active at both node endpoints hosting the 
BFD session.

Thanks,

Sami
