Re: [bess] Benjamin Kaduk's Discuss on draft-ietf-bess-mvpn-fast-failover-13: (with DISCUSS and COMMENT)

Greg Mirsky Mon, 21 Dec 2020 09:34:26 -0800

Hi Ben, et al.,
I've uploaded version -14 of the draft that includes all the updates we've
discussed. Please let me know whether this version satisfactorily
addresses your DISCUSS and COMMENTS.


Regards,
Greg

A new version of I-D, draft-ietf-bess-mvpn-fast-failover-14.txt
has been successfully submitted by Greg Mirsky and posted to the
IETF repository.

Name:           draft-ietf-bess-mvpn-fast-failover
Revision:       14
Title:          Multicast VPN Fast Upstream Failover
Document date:  2020-12-21
Group:          bess
Pages:          24
URL:
https://www.ietf.org/archive/id/draft-ietf-bess-mvpn-fast-failover-14.txt
Status:
https://datatracker.ietf.org/doc/draft-ietf-bess-mvpn-fast-failover/
Htmlized:
https://datatracker.ietf.org/doc/html/draft-ietf-bess-mvpn-fast-failover
Htmlized:
https://tools.ietf.org/html/draft-ietf-bess-mvpn-fast-failover-14
Diff:
https://www.ietf.org/rfcdiff?url2=draft-ietf-bess-mvpn-fast-failover-14

Abstract:
   This document defines Multicast Virtual Private Network (VPN)
   extensions and procedures that allow fast failover for upstream
   failures by allowing downstream Provider Edges (PEs) to consider the
   status of Provider-Tunnels (P-tunnels) when selecting the upstream PE
   for a VPN multicast flow.  The fast failover is enabled by using RFC
   8562 Bidirectional Forwarding Detection (BFD) for Multipoint Networks
   and the new BGP Attribute - BFD Discriminator.  Also, the document
   introduces a new BGP Community, Standby PE, extending BGP Multicast
   VPN routing so that a C-multicast route can be advertised toward a
   Standby Upstream PE.




Please note that it may take a couple of minutes from the time of submission
until the htmlized version and diff are available at tools.ietf.org.

The IETF Secretariat



On Fri, Dec 18, 2020 at 6:34 PM Greg Mirsky <[email protected]> wrote:

> Hi Ben,
> many thanks for your kind consideration of the proposed updates. Please
> find my follow-up notes in-lined under the GIM2>> tag below.
> Attached is the diff that highlights all the updates including the ones
> that address comments by Martin, Barry, Roman, and Murray.
> I will upload the new version if you agree to the latest proposed updates.
>
> Regards,
> Greg
>
> On Thu, Dec 17, 2020 at 6:02 PM Benjamin Kaduk <[email protected]> wrote:
>
>> Hi Greg,
>>
>> Also inline (though there's not much left to say)
>>
>> On Wed, Dec 16, 2020 at 12:17:55PM -0800, Greg Mirsky wrote:
>> > Hi Ben,
>> > thank you for the review, and your detailed comments and direct
>> questions.
>> > Please find my answers and proposed updates in-lined below under the
>> GIM>>
>> > tag.
>> > Attached, please find the diff highlighting the updates and the new
>> > working version of the draft.
>> >
>> > Regards,
>> > Greg
>> >
>> > On Mon, Dec 14, 2020 at 4:51 PM Benjamin Kaduk via Datatracker <
>> > [email protected]> wrote:
>> >
>> > > Benjamin Kaduk has entered the following ballot position for
>> > > draft-ietf-bess-mvpn-fast-failover-13: Discuss
>> > >
>> > > When responding, please keep the subject line intact and reply to all
>> > > email addresses included in the To and CC lines. (Feel free to cut
>> this
>> > > introductory paragraph, however.)
>> > >
>> > >
>> > > Please refer to
>> https://www.ietf.org/iesg/statement/discuss-criteria.html
>> > > for more information about IESG DISCUSS and COMMENT positions.
>> > >
>> > >
>> > > The document, along with other ballot positions, can be found here:
>> > > https://datatracker.ietf.org/doc/draft-ietf-bess-mvpn-fast-failover/
>> > >
>> > >
>> > >
>> > > ----------------------------------------------------------------------
>> > > DISCUSS:
>> > > ----------------------------------------------------------------------
>> > >
>> > > Let's talk about what the requirements are for consistency across PEs
>> in
>> > > the algorithm for selecting the Primary Upstream PE.  Section 4 notes
>> > > that "all the PEs of that MVPN [are required] to follow the same UMH
>> > > selection procedure", but leaves the option of non-revertive behavior
>> as
>> > > something that "MAY also be supported by an implementation", without
>> > > requirement for consistency across all PEs.  It seems to me that if
>> some
>> > > PEs use non-revertive behavior and others do not, then they will
>> > > disagree as to which PE is the Primary (or active) PE in some cases,
>> > > which seems to conflict with the initial guidance that all PEs needed
>> to
>> > > pick the same one.  Is it perhaps that the PEs need to agree on which
>> PE
>> > > is to be advertised as Primary but not necessarily to actually be
>> using
>> > > that one for traffic?  Or am I missing something?
>> > >
>> > GIM>> Thank you for pointing out this inconsistency. I agree that the
>> text
>> > needs some tightening. Below is the proposed update:
>> > OLD TEXT:
>> >    Such behavior is referred to as
>> >    "revertive" behavior and MUST be supported.  Non-revertive behavior
>> >    refers to the behavior of continuing to select the backup PE as the
>> >    UMH even after the Primary has come up.  This non-revertive behavior
>> >    MAY also be supported by an implementation and would be enabled
>> >    through some configuration.
>> > NEW TEXT:
>> > Such behavior is referred to as
>> >    "revertive" behavior and MUST be supported.  Non-revertive behavior
>> >    refers to the behavior of continuing to select the backup PE as the
>> >    UMH even after the Primary has come up.  This non-revertive behavior
>> >    MAY also be supported by an implementation and would be enabled
>> >    through some configuration.  Selection of the behavior, revertive or
>> >    non-revertive, is an operational issue, but it MUST be consistent on
>> >    all PEs in the given MVPN.
>>
>> Looks good; I'm glad this was just a simple change.
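The difference between the two behaviors can be illustrated with a minimal Python sketch. It is not taken from the draft: the function name, the string labels, and the boolean Up/Down inputs are all hypothetical, and the `revertive` flag stands for the configuration that, per the NEW TEXT above, has to be the same on all PEs in the given MVPN.

```python
from typing import Optional


def select_umh(primary_up: bool, standby_up: bool,
               current: Optional[str], revertive: bool) -> Optional[str]:
    """Return "primary", "standby", or None when no Upstream PE is usable.

    Hypothetical helper: 'current' is the UMH selected before this call.
    """
    # Non-revertive: keep using the backup PE while it remains usable,
    # even after the Primary has come back up.
    if not revertive and current == "standby" and standby_up:
        return "standby"
    # Revertive (or nothing to stay on): prefer the Primary when it is Up.
    if primary_up:
        return "primary"
    if standby_up:
        return "standby"
    return None
```

With both PEs Up and the standby currently selected, a revertive PE returns to the Primary while a non-revertive PE stays on the standby. Two PEs configured differently would therefore disagree, which is the inconsistency the DISCUSS pointed out.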
>>
>> > >
>> > >
>> > > ----------------------------------------------------------------------
>> > > COMMENT:
>> > > ----------------------------------------------------------------------
>> > >
>> > > Section 1
>> > >
>> > >    Section 3 describes local procedures allowing an egress PE (a PE
>> > >    connected to a receiver site) to take into account the status of
>> > >    P-tunnels to determine the Upstream Multicast Hop (UMH) for a given
>> > >    (C-S, C-G).  [...]
>> > >
>> > > Does it also apply to (C-*, C-G)?  (I'll just mention it once, but the
>> > > handling seems to be somewhat inconsistent throughout the document,
>> with
>> > > (C-*,C-G) getting mentioned sometimes but not always, and no pattern
>> > > obvious to me for when it is or is not included.  I think I see some
>> > > instances where (C-*, C-G) does not make sense, so it would probably
>> not
>> > > be a universal replacement.)
>> > >
>> > GIM>> Yes, it cannot be used interchangeably. We've followed the
>> notation
>> > as defined in the last paragraph of Section 3.1 of RFC 6513:
>> >    ... C-group address would be a group address in a
>> >    VPN's address space.  A C-tree is a multicast distribution tree
>> >    constructed and maintained by the PIM C-instances.  A C-flow is a
>> >    stream of multicast packets with a common C-source address and a
>> >    common C-group address.  We will use the notation "(C-S,C-G)" to
>> >    identify specific C-flows.  If a particular C-tree is a shared tree
>> >    (whether unidirectional or bidirectional) rather than a source-
>> >    specific tree, we will sometimes speak of the entire set of flows
>> >    traveling that tree, identifying the set as "(C-*,C-G)".
>> > It is my understanding that the one reference to (C-*,C-G) in the
>> draft is
>> > in Section 3 bullet B:
>> >           The S-PMSI can be advertised only after the
>> >           Upstream PE receives a C-multicast route for (C-S, C-G)/(C-*,
>> >           C-G) to be carried over the advertised S-PMSI.
>> > The reference is not an introduced text but an informational summary of
>> > Section 5.1 RFC 6513. The text preceding that reference is intended to
>> > clarify its status:
>> >   There are three options specified in Section 5.1 of [RFC6513] for a
>> >    downstream PE to select an Upstream PE.
>> > Would you suggest an additional text?
>>
>> I don't see a need for additional text.  Thanks for walking me through it
>> -- I'm pretty sure I had lost track of the fact that the §3 (and only)
>> occurrence of "(C-*, C-G)" was in a part that was supposed to be
>> summarizing RFC 6513 by the time I stumbled upon it.  With the extra
>> context it all seems clear.
>>
>> > >
>> > >    Section 5 describes a "hot leaf standby" mechanism that can be used
>> > >    to improve failover time in MVPN.  The approach combines mechanisms
>> > >    defined in Section 3 and Section 4 has similarities with the
>> solution
>> > >    described in [RFC7431] to improve failover times when PIM routing
>> is
>> > >    used in a network given some topology and metric constraints.
>> > >
>> > > nit: grammar issue around "has similarities with" (maybe needs a
>> leading
>> > > "and"?)
>> > >
>> > GIM>> Thank you. Updated to:
>> > NEW TEXT:
>> >    Section 5 describes a "hot leaf standby" mechanism that can be used
>> >    to improve failover time in MVPN.  The approach combines mechanisms
>> >    defined in Section 3 and Section 4, and has similarities with the
>> >    solution described in [RFC7431] to improve failover times when PIM
>> >    routing is used in a network given some topology and metric
>> >    constraints.
>> > >
>> > >
>> > >    VPNs.  An operator would enable these mechanisms using a method
>> > >    discussed in Section 3 in combination with the redundancy provided
>> by
>> > >    a standby PE connected to the source of the multicast flow, and it
>> is
>> > >    assumed that all PEs in the network would support these mechanisms
>> > >    for the procedures to work.  In the case that a BGP implementation
>> > >
>> > > Is it a matter of "the procedure will not work at all unless all PEs
>> in
>> > > the network support it", or "only the PEs that support it will get the
>> > > benefits of it"?  [The next sentence suggests an answer...]
>> > >
>> > GIM>> The sentence might be too long. Yes, only PEs that support the new
>> > Standby PE community and use any of the UMH monitoring methods would
>> > converge faster than PEs that don't support both features. Would the
>> > re-wording below make the text clear?
>> > NEW TEXT:
>> >    An operator would enable these mechanisms using a method
>> >    discussed in Section 3 combined with the redundancy provided by a
>> >    standby PE connected to the multicast flow source.  PEs that support
>> >    these mechanisms would converge faster and thus provide a more stable
>> >    multicast service.
>>
>> Yes, that looks pretty crisp -- thanks!
>>
>> > >
>> > > Section 3
>> > >
>> > >    Section 9.1.1 of [RFC6513] are applicable when using I-PMSI
>> > >    P-tunnels.  That document is a foundation for this document, and
>> its
>> > >    processes all apply here.  Section 9.1.1 mandates the use of
>> specific
>> > >    procedures for sending intra-AS I-PMSI A-D Routes.
>> > >
>> > > (nit) the second "Section 9.1.1" is also referring to RFC 6513, not
>> this
>> > > document, which would be the default interpretation of a bare section
>> > > reference.
>> >
>> >
>> > > (not-nit) The referenced procedure seems to be about processing, not
>> > > sending, intra-AS I-PMSI A-D routes.  Am I misreading something?
>> > >
>> > GIM>> You are right. The sentence mischaracterizes Section 9.1.1 and
>> has no
>> > informational value. Removed it altogether.
>> >
>> > >
>> > > Section 3.1
>> > >
>> > >    Different factors can be considered to determine the "status" of a
>> > >    P-tunnel and are described in the following sub-sections.  The
>> > >    optional procedures described in this section also handle the case
>> > >    the downstream PEs do not all apply the same rules to define what
>> the
>> > >    status of a P-tunnel is (please see Section 6), and some of them
>> will
>> > >    produce a result that may be different for different downstream
>> PEs.
>> > >
>> > > nit: I think it's better to put a word like "where" in "the case the
>> > > downstream PEs".
>> > >
>> > GIM>> I've tried it like the following:
>> > NEW TEXT:
>> >    The
>> >    optional procedures described in this section also handle the case
>> >    when the downstream PEs do not all apply the same rules to define
>> >    what the status of a P-tunnel is (please see Section 6), and some of
>> >    them will produce a result that may be different for different
>> >    downstream PEs.
>> >
>> > >
>> > > Section 3.1.3
>> > >
>> > >    corresponding P-tunnel MUST be re-evaluated.  If the P-tunnel
>> > >    transitions from Up to Down state, the Upstream PE that is the
>> > >    ingress of the P-tunnel MUST NOT be considered a valid UMH.
>> > >
>> > > (nit?) I'm not sure how much precedent there is for using "valid" in
>> > > this context -- IIUC the previous discussion of this process referred
>> > > only to whether a PE is a candidate for being the UMH.
>> > >
>> > GIM>> I agree, "candidate" is missing here. Proposed update:
>> > NEW TEXT:
>> >    If the P-tunnel
>> >    transitions from Up to Down state, the Upstream PE that is the
>> >    ingress of the P-tunnel MUST NOT be considered as a valid candidate
>> >    UMH.
>> >
>> > >
>> > > Section 3.1.5
>> > >
>> > >    When such a procedure is used, in the context where fast
>> restoration
>> > >    mechanisms are used for the P-tunnels, a configurable timer MUST be
>> > >    set on the downstream PE to wait before updating the UMH, to let
>> the
>> > >    P-tunnel restoration mechanism to execute its actions.  An
>> > >    implementation SHOULD use three seconds as the default value for
>> this
>> > >    timer.
>> > >
>> > > How does this interact with the value of the maximum inter-packet
>> time?
>> > > Suppose that I know to expect at least one packet every ten seconds.
>> Do
>> > > I wait ten seconds after receiving the last packet and then another
>> > > three seconds, before engaging in an UMH change?
>> > >
>> > GIM>> This scenario is similar to the use of an active OAM detecting a
>> > network failure. The role of the timer is to trigger an action if a
>> > certain number of packets have not arrived. Since the maximum inter-packet
>> > time is known, a downstream PE has an expectation of receiving a packet
>> > within a time interval larger than the maximum inter-packet interval. In
>> > practice, the timer could be set to three times the maximum inter-packet
>> > interval, so that it expires if three consecutive packets were not
>> > received. In the case you've described, I think that the timer must be set
>> > larger than 10 seconds, probably 30+ seconds. Would you suggest any
>> > additional text here?
>>
>> I think my uncertainty here is whether the 3 seconds default is intended
>> to
>> be the only waiting period or an additional waiting period after
>> determining that the tunnel is "probably down" but before updating the
>> UMH.
>> (Looking at it again now, the latter is a bit of a strained
>> interpretation.)  That said, your description here suggests that the
>> operative mechanism is to determine that the tunnel is probably down by
>> passing a threshold number of packets that were expected but did not
>> arrive, with some accommodation for jitter in the network if packets are
>> supposed to be arriving very quickly so that "just wait for 3 missed
>> packets" isn't long enough to account for (e.g.) interrupt handling on a
>> forwarder.  So the intended sentiment seems to be something like
>> "Determining that a tunnel is probably down by waiting for enough packets
>> to fail to arrive as expected is a heuristic and operational matter that
>> depends on the maximum inter-packet time.  A timeout of three seconds is a
>> generally suitable default waiting period to ascertain that the tunnel is
>> down, though other values would be needed for atypical conditions."
>
>
>> I would not complain if you kept the original text, though.
>>
> GIM2>> I much appreciate the consideration you gave to the document and
> am glad to use the suggested text.
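The arithmetic in the exchange above can be written down as a small Python sketch. It only restates the rule of thumb from the thread (three missed packets, three seconds as the default); the function name and parameters are hypothetical and nothing here is mandated by the draft.

```python
from typing import Optional


def umh_update_timer(max_inter_packet_s: Optional[float] = None,
                     missed_packets: int = 3,
                     default_s: float = 3.0) -> float:
    """Seconds to wait before concluding that the P-tunnel is down.

    Hypothetical helper: with no knowledge of the flow, fall back to the
    three-second default; otherwise wait long enough for 'missed_packets'
    consecutive expected packets to have failed to arrive.
    """
    if max_inter_packet_s is None:
        return default_s
    return missed_packets * max_inter_packet_s
```

For the flow described above, with at least one packet every ten seconds, `umh_update_timer(10.0)` gives 30 seconds, matching the "probably 30+ seconds" estimate. As the suggested text notes, other values would be needed for atypical conditions.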
>
>>
>> > >
>> > >    In cases where this mechanism is used in conjunction with the
>> method
>> > >    described in Section 5, no prior knowledge of the rate of the
>> > >    multicast streams is required; downstream PEs can compare reception
>> > >    on the two P-tunnels to determine when one of them is down.
>> > >
>> > > This feels a little underspecified; is there a reference or more
>> > > guidance that we could give about turning a stream of received packets
>> > > on one tunnel into a maximum inter-packet time on another tunnel,
>> > > supposedly carrying the same traffic?
>> > >
>> > GIM>> I think that this text refers to 1+1 protection, i.e.,
>> Active-Active
>> > P-tunnels. In that scenario, the determination of the P-tunnel's state
>> can
>> > be done by comparing it to the reception state of the other P-tunnel in
>> the
>> > redundancy group. But there might be corner cases, like a significant
>> delay
>> > in one of the P-tunnels, that may need consideration before recommending
>> this
>> > method. I don't think that it is in the scope of the document. Would
>> > appending the paragraph with "The detailed specification of this
>> mechanism
>> > is outside the scope of this document" be acceptable?
>>
>> I think my confusion stems from the first paragraph of the section seeming
>> to emphasize the dependence on the "maximum inter-packet time", but the
>> best algorithm to use in the 1+1 case seems likely to be different (i.e.,
>> one that makes use of the full knowledge of the incoming packet
>> distribution).  So my suggestion would be something more along the lines
>> of
>> "no prior knowledge of the rate or maximum inter-packet time on the
>> multicast streams is required; downstream PEs can compare actual packet
>> reception statistics on the two P-tunnels to determine when one of them is
>> down".  (Adding a "details are out of scope of the document" to that would
>> be fine, of course.)
>>
> GIM2>> Thanks again for the suggested text.
>
>>
>> > >
>> > > Section 3.1.6
>> > >
>> > >       *  one octet-long field of TLV's Type value (Section 7.3)
>> > >
>> > >       *  one octet-long field of the length of the Value field in
>> octets
>> > >
>> > >       *  variable length Value field.
>> > >
>> > >       The length of a TLV MUST be multiple of four octets.
>> > >
>> > > I assume this is the total length, not the value in the length field?
>> > >
>> > GIM>> Correct. Would the following update make it clearer?
>> > NEW TEXT:
>> >       Figure 2 presents the Optional TLV format, which
>> >       consists of:
>> >
>> >       *  Type - a one-octet-long field that characterizes the
>> >          interpretation of the Value field (Section 7.3)
>> >
>> >       *  Length - a one-octet-long field equal to the length of the
>> >          Value field in octets
>> >
>> >       *  Value - a variable-length field.
>> >
>> >       The length of a TLV as a whole MUST be multiple of four octets.
>>
>> Yes, that's more clear.  (But given Jeff and Alvaro's comments maybe the
>> whole thing will change anyway.)
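The "TLV as a whole" reading can be made concrete with a minimal Python sketch of a parser for the Type/Length/Value layout quoted above. The function name and error messages are hypothetical. The sketch also assumes that the one-octet Type and Length fields count toward the four-octet multiple, which is one interpretation of "as a whole" and not something the thread spells out.

```python
from typing import List, Tuple


def parse_optional_tlvs(data: bytes) -> List[Tuple[int, bytes]]:
    """Split 'data' into (Type, Value) pairs, checking each TLV's total size."""
    tlvs = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < 2:
            raise ValueError("truncated TLV header")
        tlv_type = data[offset]
        length = data[offset + 1]        # length of the Value field only
        total = 2 + length               # Type + Length + Value (assumption)
        if total % 4 != 0:
            raise ValueError("TLV length is not a multiple of four octets")
        end = offset + total
        if end > len(data):
            raise ValueError("TLV Value runs past the end of the data")
        tlvs.append((tlv_type, data[offset + 2:end]))
        offset = end
    return tlvs
```

Under that assumption a TLV with a two-octet Value is accepted (four octets in total), while one with a one-octet Value is rejected.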
>>
>> > >
>> > >    The BFD Discriminator attribute MUST be considered malformed if its
>> > >    length is not a non-zero multiple of four.  If the attribute
>> > >    considered malformed, the UPDATE message SHALL be handled using the
>> > >    approach of Attribute Discard per [RFC7606].
>> > >
>> > > nit: s/attribute considered/attribute is considered/
>> > >
>> > GIM>> Thank you! Updated text to:
>> > NEW TEXT:
>> >    The BFD Discriminator attribute MUST be considered malformed if its
>> >    length is not a non-zero multiple of four.  If the attribute is
>> >    deemed to be malformed, the UPDATE message SHALL be handled using the
>> >    approach of Attribute Discard per [RFC7606].
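A rough Python illustration of the check in the NEW TEXT above follows. It models the path attributes of an UPDATE as a dictionary from an attribute name to its raw value; the dictionary model, the key string, and the function name are hypothetical, and "Attribute Discard" is reduced here to dropping the one attribute while keeping the rest of the UPDATE.

```python
from typing import Dict

# Hypothetical key used only in this sketch; not an IANA code point.
BFD_DISCRIMINATOR = "bfd-discriminator"


def discard_if_malformed(attrs: Dict[str, bytes]) -> Dict[str, bytes]:
    """Return the attributes with a malformed BFD Discriminator removed."""
    value = attrs.get(BFD_DISCRIMINATOR)
    if value is None:
        return attrs
    # Malformed unless the length is a non-zero multiple of four.
    if len(value) == 0 or len(value) % 4 != 0:
        return {k: v for k, v in attrs.items() if k != BFD_DISCRIMINATOR}
    return attrs
```

The other attributes are left untouched, so the route itself is still processed, only without P2MP BFD tracking information.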
>> >
>> > >
>> > > Section 3.1.6.1
>> > >
>> > >    o  MUST periodically transmit BFD Control packets over the x-PMSI
>> > >       P-tunnel after the P-tunnel is considered established.  Note
>> that
>> > >       the methods to declare a P-tunnel has been established are
>> outside
>> > >       the scope of this specification.
>> > >
>> > > Is there a good reference for how to choose the period of
>> transmission?
>> > >
>> > GIM>> Not really. There are many factors that an operator should
>> consider
>> > when configuring the frequency of BFD packets on the MultipointHead
>> system.
>> > One of the aspects to keep in mind is that unlike p2p BFD, there's no
>> > interval negotiation phase in p2mp BFD. As a result, a tail has no
>> > influence over the interval at which the head of the p2mp BFD session
>> > transmits BFD Control messages.
>>
>> Okay.  (Yes, I do remember there was a fair bit of contention in the
>> reviews of the multipoint BFD document relating to the transmission
>> frequency and the potential to overload the network in the absence of
>> feedback ... luckily that was a case where I was able to sit back and
>> watch, and not have to be taking a position.)
>>
>> > >
>> > >    If the tracking of the P-tunnel by using a P2MP BFD session is
>> > >    enabled after the x-PMSI A-D Route has been already advertised, the
>> > >    x-PMSI A-D Route MUST be re-sent with precisely the same attributes
>> > >    as before and the BFD Discriminator attribute included.
>> > >
>> > > Pedantically, it seems like "precisely the same attributes as before"
>> > > is incompatible with adding the BFD Discriminator attribute.  Phrasing
>> > > that discusses "the only change between the previous advertisement and
>> > > the new advertisement" would not suffer from such a potential issue.
>> > > (And similarly for when the BFD Discriminator attribute is to be
>> > > removed, a couple paragraphs later.)
>> > >
>> > GIM>> Great, thank you. Applied in both cases:
>> > NEW TEXT:
>> >    If the tracking of the P-tunnel by using a P2MP BFD session is
>> >    enabled after the x-PMSI A-D Route has been already advertised, the
>> >    x-PMSI A-D Route MUST be re-sent with the only change between the
>> >    previous advertisement and the new advertisement to be the inclusion
>> >    of the BFD Discriminator attribute.
>> > and
>> >    o  x-PMSI A-D Route MUST be re-sent with the only change between the
>> >       previous advertisement and the new advertisement be the exclusion
>> >       of the BFD Discriminator attribute;
>> >
>> > >
>> > > Section 3.1.6.2
>> > >
>> > >    o  MUST use the source IP address of the BFD Control packet, the
>> > >       value of the BFD Discriminator field, and the x-PMSI Tunnel
>> > >       Identifier [RFC6514] the BFD Control packet was received to
>> > >       properly demultiplex BFD sessions.
>> > >
>> > > nit: missing word around "the BFD Control packet was received" (maybe
>> > > "received on/in"?).
>> > >
>> > GIM>> "on" seems the better option. Updated accordingly.
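The demultiplexing rule quoted above amounts to a lookup on a three-part key. A minimal Python sketch, with a hypothetical class and hypothetical field types (the tunnel identifier is treated as opaque bytes), could look like this:

```python
from typing import Dict, Optional, Tuple

# (source IP of the BFD Control packet, BFD Discriminator value,
#  x-PMSI Tunnel Identifier the packet was received on)
SessionKey = Tuple[str, int, bytes]


class BfdDemux:
    """Maps received BFD Control packets to P2MP BFD sessions (sketch)."""

    def __init__(self) -> None:
        self._sessions: Dict[SessionKey, str] = {}

    def register(self, src_ip: str, discriminator: int,
                 tunnel_id: bytes, session: str) -> None:
        self._sessions[(src_ip, discriminator, tunnel_id)] = session

    def lookup(self, src_ip: str, discriminator: int,
               tunnel_id: bytes) -> Optional[str]:
        # All three values must match; a packet with the same source and
        # discriminator arriving on a different P-tunnel finds no session.
        return self._sessions.get((src_ip, discriminator, tunnel_id))
```

Using the tuple as the dictionary key keeps sessions from different Upstream PEs, or on different P-tunnels from the same PE, apart even if their discriminator values collide.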
>> >
>> > >
>> > >    According to [RFC8562], if the downstream PE receives Down or
>> > >    AdminDown in the State field of the BFD Control packet or
>> associated
>> > >    with the BFD session Detection Timer expires, the BFD session is
>> > >
>> > > nit: "the BFD Detection Timer associated with the BFD session expires"
>> > >
>> > GIM>> Thank you for the helpful suggestion. Updated.
>> >
>> > >
>> > >    PE, while others are considered as Standby Upstream PEs.  In such a
>> > >    scenario, when the P-tunnel is considered down, the downstream PE
>> MAY
>> > >    initiate a switchover of the traffic from the Primary Upstream PE
>> to
>> > >    the Standby Upstream PE only if the Standby Upstream PE is deemed
>> > >    available.
>> > >
>> > > I'm not sure that we've defined what it means for an Upstream PE to be
>> > > deemed "available", yet.  I guess it's possible that there is not an
>> > > established P-Tunnel between the (selected) Standby Upstream PE and
>> the
>> > > downstream PE, so just using the Up/Down/not-known-to-be-Down status of
>> > > that P-tunnel is not an option...
>> > >
>> > GIM>> I agree, the wording is sloppy. I think that the intention was to
>> > say "deemed in the Up state". That can be determined using the p2mp BFD
>> > session with the Standby Upstream PE acting as its MultipointHead. The
>> > proposed update is as follows:
>> > NEW TEXT:
>> >    In such a scenario, when the P-tunnel is considered
>> >    down, the downstream PE MAY initiate a switchover of the traffic from
>> >    the Primary Upstream PE to the Standby Upstream PE only if the
>> >    Standby Upstream PE is deemed to be in the Up state.  That MAY be
>> >    determined from the state of a P2MP BFD session with the Standby
>> >    Upstream PE as the MultipointHead.
>> >
>> > >
>> > >    If the downstream PE's P-tunnel is already established when the
>> > >    downstream PE receives the new x-PMSI A-D Route with BFD
>> > >    Discriminator attribute, the downstream PE MUST associate the value
>> > >    of BFD Discriminator field with the P-tunnel and follow procedures
>> > >    listed above in this section if and only if the x-PMSI A-D Route
>> was
>> > >    properly processed as per [RFC6514], and the BFD Discriminator
>> > >    attribute was validated.
>> > >
>> > > We did not discuss any validation of the BFD Discriminator attribute
>> in
>> > > §3.1.6; what procedures would this process entail?
>> > >
>> > GIM>> There's, so far, only one validation condition:
>> >
>> > The length of a TLV as a whole MUST be multiple of four octets.
>> >
>> >
>> > > Section 4
>> > >
>> > >    The procedures described below are limited to the case where the
>> site
>> > >    that contains C-S is connected to two or more PEs, though, to
>> > >    simplify the description, the case of dual-homing is described.
>> The
>> > >
>> > > I suggest giving at least some considerations to how to choose between
>> > > multiple standby Upstream PEs when there are more than one available.
>> > >
>> > GIM>> I understand your idea, but that might be a whole new specification
>> > similar to the selection of UMH from the list of candidates. Would
>> > stating that the selection might use known methods, with the specifics
>> > outside the scope of this document, be acceptable? For example (sorry
>> > for a longer quote):
>> > NEW TEXT:
>> >    The procedures described below are limited to the case where the site
>> >    that contains C-S is connected to two or more PEs, though, to
>> >    simplify the description, the case of dual-homing is described.  In
>> >    the case where more than two PEs are connected to the C-S site,
>> >    selection of the Standby PE can be performed using one of the methods
>> >    of selecting a UMH.  Details of the selection are outside the scope
>> >    of this document.  The procedures require all the PEs of that MVPN to
>> >    follow the same UMH selection procedure, as specified in [RFC6513],
>> >    whether the PE selected based on its IP address, hashing algorithm
>> >    described in section 5.1.3 of [RFC6513], or Installed UMH Route.  The
>> >    procedures assume that if a site of a given MVPN that contains C-S is
>> >    dual-homed to two PEs, then all the other sites of that MVPN would
>> >    have two unicast VPN routes (VPN-IPv4 or VPN-IPv6) to C-S, each with
>> >    its RD.
>>
>> I can understand that selecting from a list of (more than two) candidates
>> might end up being complicated; I think your text here is sufficient to
>> address my concern, by giving some indication of how the extension to the
>> general case could be performed while acknowledging that it is not fully
>> specified yet.
>>
>> > >
>> > >    procedures require all the PEs of that MVPN to follow the same UMH
>> > >    selection procedure, as specified in [RFC6513], whether the PE
>> > >    selected based on its IP address, hashing algorithm described in
>> > >    section 5.1.3 of [RFC6513], or Installed UMH Route.  The procedures
>> > >
>> > > I assume that how the PEs agree on which procedure is in use does not
>> > > involve something being advertised in-band, and is out of scope for
>> this
>> > > document.  But please say so!
>> > >
>> > GIM>> You are right, that is an operational issue and the management
>> > plane's responsibility.
>> > NEW TEXT:
>> > The procedures require all the PEs of that MVPN to
>> >    follow the same UMH selection procedure, as specified in [RFC6513],
>> >    whether the PE selected based on its IP address, the hashing
>> >    algorithm described in section 5.1.3 of [RFC6513], or Installed UMH
>> >    Route.  The consistency of the UMH selection method used among all
>> >    PEs is expected to be provided by the management plane.
>> >
>> > >
>> > >    assume that if a site of a given MVPN that contains C-S is
>> dual-homed
>> > >    to two PEs, then all the other sites of that MVPN would have two
>> > >    unicast VPN routes (VPN-IPv4 or VPN-IPv6) to C-S, each with its RD.
>> > >
>> > > nit: s/its RD/its own RD/
>> > >
>> > GIM>> Ack
>> >
>> > > Also, please confirm that the unicast routes are *to* C-S, vs *from*
>> it.
>> > >
>> > GIM>> Though it might be somewhat counterintuitive, in the context of
>> MVPN
>> > "to" is correct.
>>
>> Thanks.
>>
>> > >
>> > > Section 4.1
>> > >
>> > >    o  the NLRI is constructed as the C-multicast route with an RT that
>> > >       identifies the Primary Upstream PE, except that the RD is the
>> same
>> > >       as if the C-multicast route was built using the Standby Upstream
>> > >       PE as the UMH (it will carry the RD associated to the unicast
>> VPN
>> > >       route advertised by the Standby Upstream PE for S and a Route
>> > >       Target derived from the Standby Upstream PE's UMH route's VRF RT
>> > >       Import EC);
>> > >
>> > > This part is a bit confusing to me, since the first part says that the
>> > > RT identifies the Primary Upstream PE, but the second part says that
>> the
>> > > RT is derived from the Standby Upstream PE's [stuff].  But I'm happy to
>> > > trust you that the [stuff] makes it correct!
>> > >
>> > GIM>> Thank you for putting your trust in our collective thinking.
>> AFAIK,
>> > it works.
>> >
>> > >
>> > > Section 4.2
>> > >
>> > >       when the PE determines (the use of the particular method to
>> detect
>> > >       the failure is outside the scope of this document) that C-S is
>> not
>> > >       reachable through some other PE, the PE SHOULD install VRF PIM
>> > >
>> > > It seems like a forward reference to §4.3 might be helpful.
>> > >
>> > GIM>> Thank you for your suggestion, the reference is added in the
>> working
>> > version.
>> >
>> > >
>> > >    Section 9.3.2 of [RFC6514], describes the procedures of sending a
>> > >    Source-Active A-D Route as a result of receiving the C-multicast
>> > >    route.  These procedures MUST be followed for both the normal and
>> > >    Standby C-multicast routes.
>> > >
>> > > There is no section 9.3.2 in RFC 6514.  There is a 9.2.3 that looks
>> > > perhaps plausible, though the string "Source-Active" does not appear
>> in
>> > > it.
>> > >
>> > GIM>> Great catch, thank you! I believe that the correct section is in
>> RFC
>> > 6513, not RFC 6514. The former opens with:
>> >    The issue described in Section 9.3.1 is resolved through the use of
>> >    Source Active A-D routes.  In the remainder of this section, we provide
>> >    an example of how this works, along with an informal description of
>> >    the procedures.
>> > Would you agree RFC 6513 makes sense?
>>
>> That does look to make a lot more sense than 6514 did!
>>
>> > >
>> > > Section 4.4.2
>> > >
>> > >    Source AS carried in the C-multicast route.  If the match is found,
>> > >    and the C-multicast route carries the Standby PE BGP Community,
>> then
>> > >    the ASBR MUST perform as follows:
>> > >
>> > > (I assume that there is room for local policy to modify this "MUST",
>> > > e.g., if needed to protect against some form of attack ... perhaps it
>> > > even goes without saying.)
>> > >
>> > GIM>> Indeed. Perhaps the following change makes it more accurate:
>> > NEW TEXT:
>> >    If the match is found,
>> >    and the C-multicast route carries the Standby PE BGP Community, then
>> >    the ASBR implementation that supports this specification MUST be
>> >    configurable to perform as follows:
>> >
>> >    o  if the route was received over iBGP and its LOCAL_PREF attribute
>> >       is set to zero, then it MUST be re-advertised in eBGP with a MED
>> >       attribute (MULTI_EXIT_DISC) set to the highest possible value
>> >       (0xffff)
>> >
>> >    o  if the route was received over eBGP and its MED attribute set to
>> >       0xffff, then it MUST be re-advertised in iBGP with a LOCAL_PREF
>> >       attribute set to zero
>> >
>> >    Other ASBR procedures are applied without modification and, when
>> >    applied, MAY modify the above-listed behavior.
>>
>> Works for me :)
>>
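The two re-advertisement rules in the NEW TEXT above can be sketched as a
small decision function. This is only an illustrative sketch, not an
implementation from the draft: the names CMulticastRoute,
standby_readvertisement, and STANDBY_MED are hypothetical, and the 0xffff
value is copied as written in the proposed text.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Value taken verbatim from the proposed text; the constant name is hypothetical.
STANDBY_MED = 0xFFFF


@dataclass(frozen=True)
class CMulticastRoute:
    """Hypothetical, minimal view of a received C-multicast route."""
    received_over: str                    # "ibgp" or "ebgp"
    has_standby_pe_community: bool
    local_pref: Optional[int] = None
    med: Optional[int] = None


def standby_readvertisement(
    route: CMulticastRoute,
) -> Optional[Tuple[str, Dict[str, int]]]:
    """Return (session type, attributes to set) for a Standby route.

    Returns None when neither rule applies, in which case the other,
    unmodified ASBR procedures govern the route.
    """
    if not route.has_standby_pe_community:
        return None
    if route.received_over == "ibgp" and route.local_pref == 0:
        # iBGP with LOCAL_PREF 0: re-advertise in eBGP with the MED value.
        return ("ebgp", {"med": STANDBY_MED})
    if route.received_over == "ebgp" and route.med == STANDBY_MED:
        # eBGP with MED 0xffff: re-advertise in iBGP with LOCAL_PREF 0.
        return ("ibgp", {"local_pref": 0})
    return None
```

Returning None for every other case mirrors the closing sentence of the
proposed text: the remaining ASBR procedures apply without modification.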
>> > >
>> > > Section 5
>> > >
>> > >    o  Upstream PEs use the "hot standby" optional behavior and thus
>> will
>> > >       forward traffic for a given multicast state as soon as they have
>> > >       whether a (primary) BGP C-multicast route or a Standby BGP
>> > >       C-multicast route for that state (or both)
>> > >
>> > > nit: the grammar is a bit weird here, after "as soon as they have";
>> I'm
>> > > not confident that I could make an accurate suggestion for a fix.
>> > >
>> > GIM>> Would it read better with a minor update:
>> > NEW TEXT:
>> >       Upstream PEs use the "hot standby" optional behavior and thus will
>> >       start forwarding traffic for a given multicast state after they
>> >       have whether a (primary) BGP C-multicast route or a Standby BGP
>> >       C-multicast route for that state (or both)
>>
>> I think that "whether" is not needed (though "either" might work in its
>> stead).
>>
> GIM2>> Yes, it was not the best choice. Below is the newest version:
>     o  Upstream PEs use the "hot standby" optional behavior and thus will
>       start forwarding traffic for a given multicast state after they
>       have a (primary) BGP C-multicast route or a Standby BGP
>       C-multicast route for that state (or both)
>>
>>
>> > >
>> > > Section 6
>> > >
>> > > I could almost see the discussion of duplicate packets as being a
>> > > subsection of the security considerations, though I don't mind leaving
>> > > it as-is.
>> > >
>> > GIM>> Thank you for agreeing.
>> >
>> > >
>> > > Section 8
>> > >
>> > > We could perhaps make some pro forma note that the BFD Discriminator
>> > > attribute, like all BGP attributes, typically does not benefit from
>> > > cryptographic integrity protection and thus could be spoofed so as to
>> be
>> > > different than what is actually used by the multipoint BFD head.  That
>> > > said, I'm willing to let this fall under the incorporated-by-reference
>> > > BGP security considerations.
>> > >
>> > GIM>> Thank you.
>> >
>> > >
>> > > Is it worth noting that operating in "hot" standby mode will increase
>> > > the general level of traffic on the VPN and thus susceptibility to
>> DoS?
>> > >
>> > GIM>> We use hot standby in the control plane only. That would add some
>> BGP
>> > traffic, but not as much as 1+1 protection in the data plane would. I
>> think
>> > that the amount of the additional load in the VPN with the "hot standby"
>> > defined in the draft is unlikely to make PEs more vulnerable to DoS. What
>> do
>> > you think?
>>
>> I think I was assuming this was "hot" in the data plane as well as the
>> control plane, when I wrote the comment.  For just the control plane, your
>> assessment seems reasonable.
>>
>> > >
>> > >    This document uses P2MP BFD, as defined in [RFC8562], which, in
>> turn,
>> > >    is based on [RFC5880].  Security considerations relevant to each
>> > >    protocol are discussed in the respective protocol specifications.
>> An
>> > >    implementation that supports this specification MUST use a
>> mechanism
>> > >    to control the maximum number of P2MP BFD sessions that can be
>> active
>> > >    at the same time.
>> > >
>> > > What is the objective that this control is designed to achieve?  I can
>> > > "control the maximum number of sessions" by asserting the maximum
>> number
>> > > to be an absurdly large value, but I don't think that would meet the
>> > > spirit of this requirement (it does meet the letter of the
>> requirement).
>> > >
>> > GIM>> Though this recommendation may look too vague, I think it is
>> > helpful to a developer. I imagine, as we've discussed in regard to the
>> > selection of the interval between BFD Control packets, an operator will
>> > consider the overall load of BFD Control packets across all active BFD
>> > sessions. Do you think that a sentence that connects the number of p2mp
>> BFD
>> > sessions and the rate of BFD Control packets would be helpful in this context?
>>
>> Or even another clause, maybe something like "to limit the overall amount
>> of capacity used by the BFD traffic".  (I think part of what triggered my
>> comment is that this is "MUST use", not "MUST provide" -- the goal of a
>> "MUST provide" is fairly obvious but "MUST use" with no specific bound
>> could be seen as make-work.)
>>
> GIM2>> More thanks for this suggestion. I think that your clause is more
> general and better expresses our concern in regard to the potential impact
> of using p2mp BFD. Below is the new text I propose:
> NEW TEXT:
>    This document uses P2MP BFD, as defined in [RFC8562], which, in turn,
>    is based on [RFC5880].  Security considerations relevant to each
>    protocol are discussed in the respective protocol specifications.  An
>    implementation that supports this specification MUST provide a
>    mechanism to limit the overall amount of capacity used by the BFD
>    traffic (as the combination of the number of active P2MP BFD sessions
>    and the rate of BFD Control packets to process).
>
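One way to read the "combination of the number of active P2MP BFD sessions
and the rate of BFD Control packets" is as an aggregate packets-per-second
budget. The following is a minimal sketch under that assumption; the class
name BfdCapacityLimiter, its methods, and the budget parameter are all
hypothetical and are not defined by the draft or by RFC 8562.

```python
from typing import Dict


class BfdCapacityLimiter:
    """Admit P2MP BFD sessions only while the aggregate rate fits a budget."""

    def __init__(self, max_packets_per_second: float) -> None:
        self.max_pps = max_packets_per_second
        # Session discriminator -> BFD Control packet interval in microseconds.
        self.sessions: Dict[int, int] = {}

    def load_pps(self, exclude: int = -1) -> float:
        """Aggregate BFD Control packet rate, optionally skipping one session."""
        return sum(
            1_000_000 / interval_us
            for disc, interval_us in self.sessions.items()
            if disc != exclude
        )

    def admit(self, discriminator: int, interval_us: int) -> bool:
        """Add or update a session if the resulting load stays within budget."""
        if interval_us <= 0:
            raise ValueError("interval must be positive")
        # Exclude any existing entry so an interval update is not counted twice.
        new_load = self.load_pps(exclude=discriminator) + 1_000_000 / interval_us
        if new_load > self.max_pps:
            return False
        self.sessions[discriminator] = interval_us
        return True

    def remove(self, discriminator: int) -> None:
        """Forget a session that is no longer active."""
        self.sessions.pop(discriminator, None)
```

With such a budget, an operator can trade many slow sessions against a few
fast ones, which is the coupling between session count and packet rate that
the new text describes.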
>>
>> > >
>> > >    The methods described in Section 3.1 may produce false-negative
>> state
>> > >    changes that can be the trigger for an unnecessary convergence in
>> the
>> > >    control plane, ultimately negatively impacting the multicast
>> service
>> > >    provided by the VPN.  An operator is expected to consider the
>> network
>> > >    environment and use available controls of the mechanism used to
>> > >    determine the status of a P-tunnel.
>> > >
>> > > We mentioned earlier (e.g., in §3.1) that similar negative effects can
>> > > occur when resiliency mechanisms at different layers interact; that
>> > > might be worth repeating here.
>> > >
>> > GIM>> One such reference is in Section 3.1.2:
>> >    In many cases, it is not practical to use both protection
>> >    methods at the same time because uncorrelated timers might cause
>> >    unnecessary switchovers and destabilize the network.
>> > Thus we referred to Section 3.1 as the encompassing reference to all
>> > possible scenarios. Would you agree with that?
>>
>> Wow, that looks like a real short-term memory failure on my part (§3.1 was
>> mentioned just in the previous sentence!).  Sorry for the noise; this is
>> fine as-is.
>>
>> Thanks again,
>>
>> Ben
>>
>

_______________________________________________
BESS mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/bess
