Ketan Talaulikar has entered the following ballot position for
draft-ietf-bess-evpn-unequal-lb-33: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to 
https://www.ietf.org/about/groups/iesg/statements/handling-ballot-positions/ 
for more information about how to handle DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-bess-evpn-unequal-lb/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

Thanks to the WG and the authors for their work on this document that brings in
weighted load-balancing in EVPN networks.

I have a few points that I would like to discuss.

661        *  Each egress PE MUST advertise an EVPN Link Bandwidth Extended
662           Community along with the ES route to signal the PE–CE link
663           bandwidth associated with the ES.

<discuss-1> What if one of the ePEs does not send this EC or if it is invalid?
What does the receiver do? Is the BW Capability ignored and everything falls
back to the default DF election algorithm?

674        As a result, a given PE MAY appear multiple times in the DF candidate
675        list.  Consequently, the value N used in the (V mod N) operation
676        defined in [RFC7432] MUST be interpreted as the total number of
677        ordinals in the weighted candidate list, rather than the total number
678        of distinct egress PEs in the ES.
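
For illustration only, the quoted (V mod N) rule over a weighted candidate list
can be sketched as below. This is a minimal sketch, not the draft's algorithm:
the function names, the `repeats` input (how many ordinals each PE gets), and
the choice to place a PE's repeated entries consecutively in ascending IP order
are all assumptions made for the example.

```python
import ipaddress


def weighted_candidate_list(repeats):
    """Build a DF candidate list in which each PE appears repeats[pe] times.

    `repeats` maps a PE IP address string to a positive integer (hypothetical
    input; deriving it from the advertised bandwidth is not shown here).
    Ordering repeated entries consecutively by ascending IP is an assumption.
    """
    ordered = sorted(repeats, key=ipaddress.ip_address)
    return [pe for pe in ordered for _ in range(repeats[pe])]


def elect_df(candidates, ethernet_tag):
    """Apply (V mod N), where N is the number of ordinals, not distinct PEs."""
    n = len(candidates)
    return candidates[ethernet_tag % n]


# Two distinct PEs, but N is 3 because 192.0.2.1 holds two ordinals.
candidates = weighted_candidate_list({"192.0.2.1": 2, "192.0.2.2": 1})
```

With this input, `elect_df(candidates, v)` selects 192.0.2.1 for two out of
every three consecutive values of `v`, which is the weighting effect the quoted
text describes.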

<discuss-2> Since the default DF election is being modified, would this document
also not update RFC7432? I am thinking that this document is either tagged as
"updates" RFC7432, RFC8584, RFC9785 (but also
draft-ietf-bess-evpn-per-mcast-flow-df-election?) or none of those. If this is
considered an "extension" or "enhancement" of the DF election rather than a
"bugfix", then the "updates" tag is not necessary IMHO. Please see my
comments on the existing text in section 6.3, which gives me the impression that
this is an extension. My point is that unnecessary "updates" tags on RFCs make
it harder for implementers/operators/readers to differentiate real "fixes" from
"enhancements/extensions". I am seeking consistency here and will leave the
options for the authors/WG to consider.

840     7.1.  Real-time Available Bandwidth

842        PE-CE link bandwidth availability may sometimes vary in real-time
843        disproportionately across PE-CE links within a multi-homed ES due to
844        various factors such as flow based hashing combined with fat flows
845        and unbalanced hashing.  Reacting to real-time available bandwidth is
846        at this time outside the scope of this document.

848        Operators SHOULD be aware, however, that too frequent or dynamic re-
849        adjustment of advertised bandwidth values may lead to instability due
850        to repeated weighted path-list recomputation and DF election changes.
851        Appropriate guards, such as dampening or hysteresis mechanisms,
852        SHOULD be considered when dynamic bandwidth advertisement is used.

<discuss-3> Up to this point the document talked about link bandwidth and not
available/free bandwidth. This section gives the impression that the value
signaled could be something other than the fixed link bandwidth (i.e., fixed
except in scenarios where LAG members go up/down). Why does the document not say
that the values signaled MUST NOT vary based on link usage, since doing that
would be very problematic? It is not sufficient to say that this is outside the
scope. Also, the first "SHOULD" is actually a "should", and the second SHOULD
can give the impression that this is dynamic when that is really not the case,
except when LAG members go up/down. On the whole, this entire section is
problematic from the standpoint of routing stability. Likely I am
misunderstanding the intent and, if so, please clarify.

854     7.2.  Weighted Load-balancing to Multi-homed Subnets

856        EVPN Link bandwidth extended community may also be used to achieve
857        unequal load-balancing of prefix routed traffic by including this
858        extended community in EVPN Route Type 5.  When included in EVPN RT-5,
859        its value is to be interpreted as egress PE's relative weight for the
860        prefix included in this RT-5.  Ingress PE will then compute the
861        forwarding path-list for the prefix route using weighted paths
862        received from each egress PE.  EVPN Link bandwidth extended community
863        MUST be encoded with "Value-Units = 0x01" to signal a generalized
864        weight associated with the advertising PE.

<discuss-4> The MUST here is not clear to me. Is the intent that for RT5 only
Value-Units = 1 MUST be used? If so, why? Also, why is it buried down here
instead of being called out prominently in section 4.1? Or is it that if
weights are used, then the Value-Units MUST be 1? If so, isn't this covered in
section 4 already? Am I missing something?

890     7.5.  EVPN Link Bandwidth Extended Community in Non-EVPN Networks

892        While this document does not preclude future applicability to non-
893        EVPN networks, it considers usage and handling of EVPN Link Bandwidth
894        Extended Community specified in this document with non-EVPN routes
895        out of scope.

<discuss-5> I would like to discuss why the use of an EVPN EC is being left
"open" for other BGP address families, especially when a generic Link
Bandwidth EC already exists in BGP to provide similar functionality.
Should this document not explicitly limit the EVPN Link Bandwidth EC to EVPN
only? If so, this needs to be clarified upfront where the EC is defined and
this section can then be removed.

897     7.6.  Preference for EVPN Link Bandwidth in EVPN Networks

899        It is possible that a non-EVPN Link Bandwidth extended community such
900        as [BGP-LINK-BW] is leaked from an IP or IPVPN route into an EVPN
901        RT-5 towards an EVPN network.  If an EVPN PE receives an EVPN route
902        with both the EVPN Link Bandwidth extended community specified in
903        this document and a non-EVPN Link Bandwidth extended community such
904        as the one specified in [BGP-LINK-BW], it MUST as default behavior,
905        prefer the EVPN Link Bandwidth extended community and handle it as
906        per procedures specified in this document.  In other words, any non-
907        EVPN Link Bandwidth extended community is to be ignored if an EVPN
908        route is received with the EVPN Link Bandwidth extended community
909        specified in this document.

<discuss-6> What if some routes to a destination have both and some have only
the Link Bandwidth EC? Would a mix of the two ECs for different paths for the
same destination route be acceptable?

914     7.7.  Interworking with Non-EVPN networks

916        In EVPN routing interworking use cases with IPVPN and IPv4/IPv6
917        routing, it is not beneficial to preserve the EVPN Link Bandwidth
918        extended community from EVPN routes to non-EVPN routes as the next-
919        hop is rewritten when a prefix learnt via EVPN RT-5 is advertised
920        into IPVPN or IP routing networks.  Interworking procedures,
921        including preservation, cummulation or translation of EVPN Link
922        Bandwidth extended community to address current or future use cases
923        are however considered beyond the scope of this document.  Readers
924        are encouraged to refer to [EVPN-IPVPN] for interworking
925        specification.

<discuss-7> There is no discussion in draft-ietf-bess-evpn-ipvpn-interworking
related to handling the propagation of this EC. On the contrary, that draft
explicitly prohibits the propagation of all EVPN-specific ECs. I agree with
what is specified by the interworking document and I wonder why this document
does not normatively prohibit propagation of the EVPN-specific Link Bandwidth
EC into any other address-family. Also, I would have expected this
specification to instead cover how the conversion is done between this and the
BGP Link Bandwidth ECs - if not in this document, then where else does the WG
plan to do it? Having introduced two ECs for practically the same thing (and I
am not debating how we got to this stage), isn't the onus on this document to
cover this aspect? Then, regarding the cumulation aspect, as the NH changes
across the gateway PE but also for inter-AS option B, the document says it is
out of scope. But where else would the WG cover that? Now, there is also
draft-ietf-bess-ebgp-dmz, which covers this aspect but for the BGP LBW EC. Can
that also cover the EVPN LBW EC?

976     10.  IANA Considerations

978     10.1.  Bandwidth Weighted DF Election Capability

980        [RFC8584] defines a new extended community for PEs within a
981        redundancy group to signal and agree on uniform DF Election Type and
982        Capabilities for each ES.  This document requests IANA to allocate a
983        bit in the "DF Election capabilities" registry setup by [RFC8584]
984        with the following suggested bit number:

986        Bit 4: BW (Bandwidth Weighted DF Election)

<discuss-8> The first sentence is not suitable for IANA considerations (as
suggested in my comments, please move it into section 6.1). The registry group
is not specified here (nor in section 10.2); it would be the BGP Extended
Communities registry group.


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

Please also find below some comments inline in the idnits output of v33 of this
document. Please look for the tag <EoRv33> at the end to ensure you have
received the full review.

2       BESS WorkGroup                                          N. Malhotra, Ed.
3       Internet-Draft                                                A. Sajassi
4       Updates: RFC8584 (if approved)                             Cisco Systems

<minor> Please put only "8584" instead of "RFC8584" in the "updates" tag

35         homing PE set.  The document updates RFC 8584 to enable weighted load

<minor> s/RFC 8584/RFC8584

142        distributed across all egress PEs.  However, this assumption can be
143        restrictive in operational environments, particularly when adding or
144        removing member links in a multi-homed Link Aggregation Group (LAG),
145        and can be violated in the presence of individual PE–CE link
146        failures.

<minor> Perhaps an informational reference to IEEE_802.1AX_2014 is required for
LAG?

284        respective access bandwidths.  Specifically, the fraction of unicast
285        and Broadcast, Unknown Unicast, and Multicast (BUM) traffic serviced
286        by egress PEx SHOULD be:

288        Lx / (L1 + L2 + ... + Ln)
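
As a worked instance of the quoted fraction, the sketch below computes
Lx / (L1 + L2 + ... + Ln) for a hypothetical three-PE Ethernet Segment. The PE
names and bandwidth values are made up for the example, and exact fractions are
used only to keep the arithmetic easy to check.

```python
from fractions import Fraction


def traffic_shares(link_bandwidths):
    """Return each egress PE's share Lx / (L1 + L2 + ... + Ln).

    `link_bandwidths` maps a PE name to its PE-CE link bandwidth, expressed
    as an integer in any common unit (hypothetical input for this example).
    """
    total = sum(link_bandwidths.values())
    return {pe: Fraction(bw, total) for pe, bw in link_bandwidths.items()}


# Hypothetical ES with links of 10, 10 and 20 (same unit on every PE).
shares = traffic_shares({"PE1": 10, "PE2": 10, "PE3": 20})
```

Here PE1 and PE2 each service one quarter of the traffic and PE3 services one
half, and the shares always sum to 1.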

<major> This is an example and I don't think the use of normative SHOULD is
appropriate here. Also, this is covered normatively in section 5.2 as well? So
perhaps consider:

Specifically, the fraction of unicast and Broadcast, Unknown Unicast, and
Multicast (BUM) traffic serviced by egress PEx is:

291        connected to a multi-homed Ethernet Segment.  However, the
292        requirement described in this section is not limited to physical
293        Ethernet Segments.  It equally applies to virtual Ethernet Segments
294        (vES) and to multi-homed subnets advertised using EVPN IP Prefix
295        routes.

<major> Please add normative reference to RFC9136. I am also wondering if this
document is changing/updating anything in RFC9136? Likely not, but just
checking.

319        *  ES: Ethernet Segment

321        *  ESI: Ethernet Segment ID

323        *  vES: Virtual Ethernet Segment

325        *  EVI: Ethernet virtual Instance, this is a mac-vrf

<major> Please provide references (in most cases RFC7432?) for all the EVPN
terms above (as also IMET, DF, etc. below).

331        *  RT-5: EVPN Route Type 5 as defined in [RFC7432]

<major> RT5 is in RFC9136

381     4.  EVPN Link Bandwidth Extended Community

383        This document defines a new EVPN Link Bandwidth Extended Community to
384        support the solution described herein.

<major> Even if it is obvious, I was not able to find an RFC that restricts the
use of the EVPN EC sub-types only for EVPN address-family. Could you please add
a suitable reference that says that and if not then state that this type is
specific to the EVPN AFI/SAFI and not applicable to others? This is related to
one of the discuss points.

389        *  IANA has assigned Sub-Type value 0x10 for the EVPN Link Bandwidth
390           Extended Community.

<minor> Perhaps?
The Sub-Type value 0x10 is allocated for the EVPN Link Bandwidth Extended
Community.

433        of Mbps.  Support for generalized weight values is OPTIONAL.  No
434        other Value-Units code points are defined at this time.

<minor> Can we please add a reference to section 10.3 in the last sentence so
the reader becomes aware of the registry being created?

441        *  Value-Units Consistency: When an EVPN Link Bandwidth Extended
442           Community is received with a route, a PE MUST verify that the
443           Value-Units field is consistent across all paths associated with

<minor> Is consistent or, more specifically, is identical? There are other
uses of the word consistent/consistently where it perhaps means "equal" or "same".

452        *  Multiplicity: A PE MUST ensure that at most one instance of the
453           EVPN Link Bandwidth Extended Community is received per path.  If
454           more than one instance is present, the extended community MUST be
455           ignored for all paths associated with the route.
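
The receiver-side half of the quoted Multiplicity rule can be illustrated with
the short sketch below. It is only a sketch of the quoted text: the function
name and the shape of the `paths` input are hypothetical, and nothing here
addresses the sender-side requirement.

```python
def filter_multiplicity(paths):
    """Apply the receiver-side multiplicity rule to the paths of one route.

    `paths` maps a path identifier (a hypothetical key, e.g. the egress PE)
    to the list of EVPN Link Bandwidth ECs carried on that path.
    If any path carries more than one instance, the EC is ignored for all
    paths of the route, modelled here as emptying every list.
    """
    if any(len(ecs) > 1 for ecs in paths.values()):
        return {path: [] for path in paths}
    return {path: list(ecs) for path, ecs in paths.items()}
```

For example, `filter_multiplicity({"PE1": [100], "PE2": [200, 300]})` empties
the list for both PE1 and PE2, because PE2's path carries two instances.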

<major> How can a PE ensure what it receives? It can only ensure what it
sends. Can you please rephrase the first sentence so the MUST applies to the
sender? The MUST in the second sentence can then be rephrased so that it
applies to the receiver.

482        *  Unexpected Route Types: This document specifies the use of the
483           EVPN Link Bandwidth Extended Community only with per-ES RT-1, RT-
484           4, and RT-5 routes.  If the extended community is received with
485           any other EVPN route type, including per-[ES, EVI] RT-1 or RT-2
486           routes, it MUST be ignored, and a syslog message [RFC5424] SHOULD
487           be generated indicating the reason.

<major> Why not leave the possibility of future route types being able to
explicitly specify and use this EC?

497        generalized weight.  New EVPN link bandwidth extended community
498        defined in this document is used for this purpose.

<minor> s/New EVPN/The EVPN

504     5.2.  Ingress PE Behavior

506        An ingress PE MUST ensure that the EVPN Link Bandwidth Extended
507        Community is received from all egress PEs associated with a given ES,
508        and MUST verify that the received Value-Units are consistent across
509        all such egress PEs.  If the EVPN Link Bandwidth Extended Community
510        is missing from one or more egress PEs, or if inconsistent Value-
511        Units are detected, the ingress PE MUST ignore the EVPN Link
512        Bandwidth Extended Community for that ES and MUST revert to regular
513        ECMP forwarding toward that ES.  When the EVPN Link Bandwidth
514        Extended Community is ignored, the ingress PE SHOULD generate a
515        syslog [RFC5424] notification.

<major> Please remove the entire paragraph above since it duplicates the
text in section 4.1.1

521        for the ES.  These normalized weights SHOULD then be used to
522        construct a weighted forwarding path-list for load balancing, instead
523        of using an ECMP-based path-list.  The computation of egress PE

<minor> Perhaps s/an ECMP-based path-list/an equal weighted path-list ?

544        For a MAC+IP Advertisement Route (EVPN Route Type 2) received for ES-
545        y, the ingress PE MAY compute a MAC and IP forwarding path-list

<major> s/the ingress PE MAY compute/the ingress PE computes ... the normative
part is already stated previously in this section. The MAY is conflicting with
the previous SHOULD.

568        For a remote MAC+IP host route associated with ES-10, the resulting
569        forwarding path-list MAY therefore be computed as:

<major> s/path-list MAY therefore be computed/path-list is, therefore, computed

581        The above computation algorithm is provided for illustration only.
582        Weighted path-list computation based on the EVPN Link Bandwidth
583        Extended Community is a local implementation choice.  If the received

<major> Please remove the above sentence since it has already been stated
previously in the same section.

584        bandwidth values do not yield a suitable HCF that allows programming
585        reasonable integer weights in hardware, an implementation MAY apply
586        alternative approximation or rounding methods to derive implementable
587        weight values.

<minor> The above sentence is better placed right after the previous text in
this section about how weights are determined and that they are local
implementation matters.

589        Weighted path-list computation MUST be performed for an ES only if
590        the EVPN Link Bandwidth Extended Community is received from all
591        egress PEs advertising reachability to that ES via Ethernet A-D per-
592        ES Route Type 1.  If the EVPN Link Bandwidth Extended Community is
593        not received from one or more such egress PEs, the ingress PE MUST
594        compute the forwarding path-list using regular ECMP semantics.  A
595        default weight MUST NOT be assumed for an egress PE that does not
596        advertise link bandwidth, as the computed weights are strictly
597        relative.
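
The HCF-based weights and the ECMP fallback quoted above can be sketched as
follows. This is a minimal sketch under stated assumptions, not the draft's
procedure: the function name, the `(value_units, value)` tuple shape, and the
use of positive integer values are all hypothetical, and the quoted text itself
notes that the actual computation is a local implementation choice.

```python
from functools import reduce
from math import gcd


def path_weights(advertised):
    """Return an integer forwarding weight per egress PE for one ES.

    `advertised` maps an egress PE to a (value_units, value) tuple taken
    from its EVPN Link Bandwidth EC, or to None when no EC was received.
    Values are assumed to be positive integers (an assumption of this
    sketch). Any missing EC or mismatch in Value-Units yields regular ECMP
    (weight 1 everywhere); no default weight is assumed for a PE that
    does not advertise link bandwidth.
    """
    if not advertised:
        return {}
    entries = list(advertised.values())
    missing = any(entry is None for entry in entries)
    units = {entry[0] for entry in entries if entry is not None}
    if missing or len(units) > 1:
        return {pe: 1 for pe in advertised}
    # Divide every value by the highest common factor (HCF).
    hcf = reduce(gcd, (entry[1] for entry in entries))
    return {pe: entry[1] // hcf for pe, entry in advertised.items()}


# 0x01 is the generalized-weight code point named in the quoted text.
weights = path_weights({"PE1": (0x01, 10000), "PE2": (0x01, 20000)})
```

With the example input the weights come out as 1 for PE1 and 2 for PE2. If PE2
were mapped to `None`, both PEs would get weight 1, which corresponds to the
regular ECMP behaviour in the quoted paragraph.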

<major> The 2nd last statement is yet another repetition. The last sentence is
new - please consider putting it either in section 4.1.1 or where weights are
discussed earlier in this section.

599        If a per-ES Route Type 1 is not advertised, or is withdrawn, by an
600        egress PE as specified in [RFC7432], that egress PE MUST be removed
601        from the forwarding path-list for the corresponding [EVI, ES], and
602        the weighted path-list MUST be recomputed accordingly.

604        If a per-[ES, EVI] Route Type 1 is not advertised by an egress PE as
605        specified in [RFC7432], that egress PE MUST NOT be included in the
606        forwarding path-list for the corresponding [EVI, ES].  In this case,
607        the weighted path-list MUST be computed using only the weights
608        received from egress PEs that advertised the per-[ES, EVI] Route Type
609        1.

<major> The first sentences in the above 2 paragraphs are restating in a
normative manner something that was already specified in RFC7432. This is
wrong. Perhaps the intention here was to offer a reminder to the reader, and if
so, please rephrase accordingly. Then the last sentences are obvious, but
perhaps can be stated more generically that any change in the path-list results
in the recomputation of the ratios of weights for each existing path (or
something like that?).

622     6.1.  The BW Capability in the DF Election Extended Community

624        This document requests IANA to allocate a new bit in the DF Election
625        Capabilities registry defined by [RFC8584]:

<major> Please make requests to IANA only in IANA consideration sections. This
is already done in section 10.1 so the above sentence needs to be rephrased with
a TBD bit value. Later it will get replaced by the actual one upon RFC
publication. Further, the following sentence that is in 10.1 has no place in
IANA considerations and is better moved as the first sentence in this section.

"[RFC8584] defines a new extended community for PEs within a redundancy group to
signal and agree on uniform DF Election Type and Capabilities for each ES."

639        The BW Capability MAY be advertised with the following DF Types:

641        *  Type 0: Default DF Election algorithm, as specified in [RFC7432]

643        *  Type 1: Highest Random Weight (HRW) algorithm, as specified in
644           [RFC8584]

646        *  Type 2: Preference-based DF Election algorithm, as specified in
647           [RFC9785]

649        *  Type 4: HRW per-multicast-flow DF Election algorithm, as specified
650           in [EVPN-PER-MCAST-FLOW-DF]

<major> Perhaps explicitly mention that future documents introducing new DF
types are expected to specify their working with the BW Capability, as
applicable?

688     6.3.  BW Capability and HRW DF Election algorithm (Type 1 and 4)

690        [RFC8584] introduces Highest Random Weight (HRW) algorithm (DF Type
691        1) for DF election in order to solve potential DF election skew
692        depending on Ethernet tag space distribution.  [EVPN-PER-MCAST-FLOW-
693        DF] further extends HRW algorithm for per-multicast flow based hash
694        computations (DF Type 4).  This section describes extensions to HRW
695        Algorithm for EVPN DF Election specified in [RFC8584] and in [EVPN-
696        PER-MCAST-FLOW-DF] in order to achieve DF election distribution that
697        is weighted by link bandwidth.

<major> This paragraph gives the correct impression that what this document
does is extend, and not "update", all those other RFCs. Please reconsider
the "updates" tag, particularly since it lists only RFC8584. Based on the
current logic, draft-ietf-bess-evpn-per-mcast-flow-df-election would also get
added to the list of "updates" RFCs?

729        Note that the bandwidth increment must always be an integer,

<major> Is that a must or a MUST?

799     6.4.  BW Capability and Preference DF Election algorithm

801        This section applies to ES'es where all the PEs in the ES agree use
802        the BW Capability with DF Type 2.  The BW Capability modifies the
803        Preference DF Election procedure [RFC9785], by adding the LBW value
804        as a tie-breaker as follows:

<major> So, does this document also "update" RFC9785?

873        and per-[ES, EVI] RT-1 from egress PEs.  In such a case, only the
874        weights received via per-ES RT-1 from the egress PEs included in the
875        MAC path-list are to be considered for weighted path-list
876        computation.

<major> Would "only ... path-list MUST be considered ..." be more suitable
given the implications on interoperability?

878     7.4.  EVPN IRB Multi-homing With Non-EVPN routing

880        EVPN-LAG based multi-homing on an IRB gateway may also be deployed

<major> Perhaps informative reference to RFC9135 is required here?

940        *  When a generalized weight is used, the operator MUST ensure
941           consistent interpretation of the advertised value across all
942           egress PEs associated with the Ethernet Segment.  This requirement
943           applies even when the egress PEs span multiple routing domains or
944           Autonomous Systems.

<major> The above seems odd when the document does not define any specification
for this feature across domains or ASes.

988     10.2.  EVPN Link Bandwidth Extended Community

990        This document defines a new EVPN Link Bandwidth extended community to
991        signal local ES link bandwidth to ingress PEs.  This extended
992        community is defined of type 0x06 (EVPN Extended Community Sub-
993        Types).  IANA has assigned a sub-type value of 0x10 for the EVPN Link
994        bandwidth extended community, of type 0x06 (EVPN Extended Community
995        Sub-Types).  EVPN Link Bandwidth extended community is defined as
996        transitive.

<major> Only the 3rd sentence in the above paragraph is suitable for IANA
considerations, as the rest is a description of the extension that is already
covered in section 4.

1096    Appendix A.  BGP-Link-Bandwidth-Extended-Community

1098       Link bandwidth extended community described in [BGP-LINK-BW] for
1099       layer 3 VPNs was considered for re-use here.  This Link bandwidth
1100       extended community is however defined in [BGP-LINK-BW] as optional
1101       non-transitive.  Since it is not possible to change deployed behavior
1102       of extended community defined in [BGP-LINK-BW], it was decided to
1103       define a new one.  In inter-AS scenarios within an EVPN network, EVPN
1104       link-bandwidth needs to be signaled to eBGP neighbors.  When signaled
1105       across AS boundary, this extended community can be used to achieve
1106       optimal load-balancing towards egress PEs in a different AS.  This is
1107       applicable both when next-hop is changed or unchanged across AS
1108       boundaries.

<major> If you look at the latest version of draft-ietf-idr-link-bandwidth, which
is now in the RFC Editor queue, the above appendix is not correct, as there are
now both transitive and non-transitive types. Please consider deleting this
appendix or rewriting it for accuracy so as to explain how we ended up with two
ECs for the same purpose. I would suggest removing it to keep things
simple.

<EoRv33>


