Hi Benjamin, Thank you very much for the thorough review! Version 11 addresses those comments that were not covered in version 10. Please see my comments in-line with [jorge].
Thx
Jorge

From: Benjamin Kaduk via Datatracker <nore...@ietf.org>
Date: Thursday, October 28, 2021 at 4:53 AM
To: The IESG <i...@ietf.org>
Cc: draft-ietf-bess-evpn-optimized...@ietf.org <draft-ietf-bess-evpn-optimized...@ietf.org>, bess-cha...@ietf.org <bess-cha...@ietf.org>, bess@ietf.org <bess@ietf.org>, Bocci, Matthew (Nokia - GB) <matthew.bo...@nokia.com>, Bocci, Matthew (Nokia - GB) <matthew.bo...@nokia.com>
Subject: Benjamin Kaduk's No Objection on draft-ietf-bess-evpn-optimized-ir-09: (with COMMENT)

Benjamin Kaduk has entered the following ballot position for draft-ietf-bess-evpn-optimized-ir-09: No Objection

When responding, please keep the subject line intact and reply to all email addresses included in the To and CC lines. (Feel free to cut this introductory paragraph, however.)

Please refer to https://www.ietf.org/blog/handling-iesg-ballot-positions/ for more information about how to handle DISCUSS and COMMENT positions.

The document, along with other ballot positions, can be found here: https://datatracker.ietf.org/doc/draft-ietf-bess-evpn-optimized-ir/

----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

Thanks to Derek Atkins for the secdir review.

Thanks as well to John Scudder for his detailed review; I support his discuss position and have omitted a few of my comments that he has already covered. (There are probably a few more that I could have omitted, but I did not do an exhaustive check. Feel free to just point to his ballot thread instead of repeating the explanation to me.)

[jorge] Derek and John’s comments are highly appreciated, indeed. I *think* we addressed all John’s comments in version 09.

I think it would be very helpful to clearly state early on what the difference between the "selective" and "non-selective" setups is. The first description I see is not until §6.2 (I comment below where it appears as well).
[jorge] added in the introduction section.

Section 2

   - AR-IP: IP address owned by the AR-REPLICATOR and used to differentiate the ingress traffic that must follow the AR procedures.

From context I infer that the AR-IP is advertised along with the Replicator-AR RT-3 route. Since we talk about other defined values as being advertised along with such RT-3 routes, should we also say that this IP is advertised along with the corresponding RT-3 route?

[jorge] added this:

   - AR-IP: IP address owned by the AR-REPLICATOR and used to differentiate the incoming traffic that must follow the AR procedures. The AR-IP is also used in the Tunnel Identifier and Next-Hop fields of the Replicator-AR route.

   - AR-VNI: VNI advertised by the AR-REPLICATOR along with the Replicator-AR route. It is used to identify the ingress packets that must follow AR procedures ONLY in the Single-IP AR-REPLICATOR case.

This phrasing seems ambiguous: please distinguish whether this is used only in the single-IP AR-REPLICATOR case or it identifies packets that sometimes follow AR procedures (in the single-IP AR-REPLICATOR case) and sometimes do not.

[jorge] new text:

   - AR-VNI: VNI advertised by the AR-REPLICATOR along with the Replicator-AR route. It is used to identify the incoming packets that must follow AR procedures ONLY in the Single-IP AR-REPLICATOR case Section 8.

   - PTA: PMSI Tunnel Attribute

PMSI is not marked as "well-known" at https://www.rfc-editor.org/materials/abbrev.expansion.txt and should be expanded on first use or otherwise defined.

[jorge] now defined in section 2

   - EVI: EVPN Instance. An EVPN instance spanning the Provider Edge (PE) devices participating in that EVPN

This seems rather circular. Can we define "EVPN Instance" without reference to "EVPN instance"?

[jorge] new text:

   - EVI: EVPN Instance. A group of Provider Edge (PE) devices participating in the same EVPN service, as specified in [RFC7432].

Section 3

   c.
   The solution is compatible with [RFC7432] and [RFC8365] and has no impact on the EVPN procedures for BM traffic. In particular,

I do not think that "no impact on the EVPN procedures" is what was intended -- it obviously has impact on the procedures, since it is implemented differently. Perhaps it has no impact on the CE, but that's not what this text seems to say.

[jorge] good one, I changed “EVPN” to “CE”

Section 4

I agree with the directorate reviewer that splitting the RT-3 NLRI layout and the PTA general format into separate figures is quite worthwhile. I would also suggest naming the first one as the NLRI of the RT-3 route type, rather than leaving that implicit.

[jorge] done, thx

   The Inclusive Multicast Ethernet Tag route (RT-3) and its PMSI Tunnel Attribute's (PTA) general format used in [RFC7432] are shown below:

I suggest referencing RFC 6514 as the source of the PTA format.

[jorge] done.

   The Flags field is 8 bits long. This document defines the use of 4 bits of this Flags field:

That's half of the flag bits! Why is it better to allocate so many flags than to move more structure into the tunnel identifier portion of the PTA? I guess RFC 7902 does provision for extended tunnel attribute flags, but the question of whether these all belong as flags still seems valid.

[jorge] that’s fair, however, those flags haven’t been used prior to this spec and indeed, RFC7902 extends the flags for future applications. Before implementing this, we analyzed ways of conveying the required information, and the conclusion was that this was the most efficient way.

   - Regular-IR route: in this route, Originating Router's IP Address, Tunnel Type (0x06), MPLS Label and Tunnel Identifier MUST be used as described in [RFC7432] when Ingress Replication is in use. The NVE/PE that advertises the route will set the Next-Hop to an IP address that we denominate IR-IP in this document. When advertised by an AR-LEAF node, the Regular-IR route SHOULD be advertised with type T= AR-LEAF.
When would I violate this SHOULD (and what other behaviors would be usable)?

[jorge] it was changed to a MUST in version 10, after John’s review.

   o Originating Router's IP Address MUST be set to an IP address of the PE that should be common to all the EVIs on the PE (usually this is the PE's loopback address). The Tunnel Identifier and

Is it really the usual case that a PE has only one loopback address (so that the definite article "the" applies)? This seems particularly poignant since we assume that AR-REPLICATORs will have multiple addresses available for use, to distinguish inbound IR and AR traffic.

[jorge] I think it is not unusual to have multiple loopbacks.

   Next-Hop SHOULD be set to the same IP address as the Originating Router's IP address when the NVE/PE originates the route. [...]

Should we say anything about what they are set to when the NVE/PE does not originate the route?

[jorge] addressed in version 10, after John’s review.

   It is only used for selective AR and its fields are set as follows:

The antecedent for "its fields" seems to be "the Leaf A-D route (RT-11)"; I suggest using the precise terminology that the fields of the "route type specific portion of the route" are what are described. Precise use of terminology makes the documents much more approachable to unfamiliar readers that rely on textual search to correlate the relevant parts of the various documents in question.

[jorge] that’s fair. I changed it to: “The Leaf Auto-Discovery route is only used for selective AR and the fields of such route are set as follows:”

   - Replicator-AR route: this route is used by the AR-REPLICATOR to advertise its AR capabilities, with the fields set as follows:

   o Originating Router's IP Address MUST be set to an IP address of the PE that should be common to all the EVIs on the PE (usually this is the PE's loopback address).
   The Tunnel Identifier and

I note that the guidance in RFC 7432 for constructing what we in this document refer to as the "Regular-IR route" also has text about "the PE's loopback address" being useful for what we would call the IR-IP, but this address here is the AR-IP and (if we keep reading) SHOULD be different than the IR-IP. I think we need to say something about whether PEs are really expected to only have one ("the") loopback address vs multiple, and if there is only one how to decide whether to use it as AR-IP or IR-IP. To use language ("the PE's loopback address") that implies there is only one, while strongly suggesting that it be used for two different purposes and also strongly suggesting that those two different purposes have different addresses, seems to be internally inconsistent.

[jorge] yes, good point. This was addressed in version 10.

   o The AR-LEAF constructs an IP-address-specific route-target as indicated in [I-D.ietf-bess-evpn-bum-procedure-updates], by placing the IP address carried in the Next-Hop field of the received Replicator-AR route in the Global Administrator field of the Community, with the Local Administrator field of this Community set to 0. [...]

The analogous text in draft-ietf-bess-evpn-bum-procedure-updates also mentions "setting the Extended Communities attribute of the Leaf A-D route to that Community"; would that be useful to include here as well?

[jorge] it’s probably good, I added it, thanks.

   o The Leaf A-D route MUST include the PMSI Tunnel attribute with the Tunnel Type set to AR, type set to AR-LEAF and the Tunnel Identifier set to the IP of the advertising AR-LEAF. The PMSI Tunnel attribute MUST carry a downstream-assigned MPLS label or VNI that is used by the AR-REPLICATOR to send traffic to the AR-LEAF.

This seems to be the only place where we specify the actual format/contents (i.e., including Tunnel Identifier contents) of the "AR" PTA tunnel type.
I would have expected something more declarative of a declaration, that the IANA registration could point to.

[jorge] added a reference to the IANA section.

Section 5.1

It's a bit unfortunate that there's so much overlap between the list of "considerations" and the "rules" that an implementation must be compatible with, but it may be too risky to try to coalesce them at this time.

   b. An AR-REPLICATOR MUST advertise a Replicator-AR route and MAY advertise a Regular-IR route. The AR-REPLICATOR MUST NOT generate a Regular-IR route if it does not have local attachment circuits (AC). If the Regular-IR route is advertised, the AR Type field is set to zero.

This seems to merit some more substantial discussion, since the value of zero in the AR type field is otherwise avoided in this document. That is, we have specific values for "leaf" and "acting as replicator", but the value zero is normally "does not support optimized-ir". Except here it's also used for "replicator advertising as non-replicator role"; it's probably appropriate to not abuse "leaf" for this case, but using zero seems to in some sense be a different abuse. Would the '11' value have been usable to indicate this distinction?

[jorge] based on Alvaro’s review, we added a sentence saying that value ‘11’ on reception would be interpreted as ‘00’, i.e. the route was advertised by an RNVE. So it could have been used, however the intent is that the Regular-IR follows RFC7432 as much as possible, hence we didn’t see the need to use anything else. This is the way it has been implemented and we don’t see any issues with it. Let us know if you see any issues.

Section 6

   The solution is called "selective" because a given AR-REPLICATOR MUST replicate the BM traffic to only the AR-LEAF that requested the replication (as opposed to all the AR-LEAF nodes) and MAY replicate the BM traffic to the RNVEs. [...]

I'm not sure I understand the motivation behind MAY, here.
If we don't replicate the BM traffic to RNVEs isn't that data loss?

[jorge] you’re right - changed it to MUST. I think the reason it was a MAY is because usually you’d expect no RNVEs in the selective mode. New text:

   “The solution is called "selective" because a given AR-REPLICATOR MUST replicate the BM traffic to only the AR-LEAFs that requested the replication (as opposed to all the AR-LEAF nodes) and MUST replicate the BM traffic to the RNVEs (if there are any).”

Section 6.1

   o When a node defined and operating as Selective AR-REPLICATOR receives a packet on an overlay tunnel, it will do a tunnel destination IP lookup and if the destination IP is the AR-REPLICATOR AR-IP Address, the node MUST replicate the packet to: [...]

   + overlay tunnels to the remote Selective AR-REPLICATORs if the tunnel source IP is an IR-IP of its own AR-LEAF-set (in any other case, the AR-REPLICATOR MUST NOT replicate the BM traffic to remote AR-REPLICATORs), where the tunnel destination IP is the AR-IP of the remote Selective AR-REPLICATOR. The tunnel destination IP AR-IP will be a

It seems like it would require less cognitive burden on the reader if we disambiguated "tunnel source IP" as it relates to the incoming tunnel on which the packet in question was received vs the outgoing tunnel to which it is being replicated. ("tunnel destination IP" is arguably already disambiguated by the lead-in text that talks about doing a lookup based on the tunnel the packet was received on.) Given that the "rules" that appear later to specifically say that it checks both destination and source of the underlay IP header, it seems reasonable to say something similar here when listing the "considerations".

[jorge] fair point, changed the text to:

   + overlay tunnels to the remote Selective AR-REPLICATORs if the tunnel source IP address (of the encapsulated packet that arrived on the overlay tunnel) is an IR-IP of its own AR-LEAF-set.
   In any other case, the AR-REPLICATOR MUST NOT replicate the BM traffic to remote AR-REPLICATORs. When doing this replication, the tunnel destination IP address is the AR-IP of the remote Selective AR-REPLICATOR. The tunnel destination IP AR-IP will be an indication for the remote Selective AR-REPLICATOR that the packet needs further replication to its AR-LEAFs.

   + The Selective AR-REPLICATOR-set is composed of the overlay tunnels to all the AR-REPLICATORs that send a Replicator-AR route with L=1. The AR-IP addresses are used as tunnel destination IP.

I'm not sure why the "that send a Replicator-AR route with L=1" clause is needed -- if there are AR-REPLICATORS that send with L=0 then aren't we required to fall back to the non-selective procedures?

[jorge] you are correct, however I still think it is good to repeat; a reader may think the non-selective AR-REPLICATORs may be part of the set.

   - In any case, non-BM overlay tunnels are excluded from flood-lists and, also, source squelching is always done in order to ensure the traffic is not sent back to the originating source. If the encapsulation is MPLSoGRE (or MPLSoUDP) and the BD label is not the bottom of the stack, the AR-REPLICATOR MUST copy the rest of the labels when forwarding them to the egress overlay tunnels.

I'm not sure that I understand which labels "the rest of the labels" are in this context.

[jorge] it refers to any other labels that are allowed by RFC7432 and 8365, e.g. ESI label, or future labels. I modified the text to:

   If the encapsulation is MPLSoGRE or MPLSoUDP and the received BD label (or label that the AR-REPLICATOR advertised in the Replicator-AR route) is not the bottom of the stack, the AR-REPLICATOR MUST copy all the labels below the BD label and propagate them when forwarding the packet to the egress overlay tunnels.

Section 6.2

   In the example of Figure 1, we consider NVE1/NVE2/NVE3 as Selective AR-LEAFs. NVE1 selects PE1 as its Selective AR-REPLICATOR.
   If that is so, NVE1 will send all its BM traffic for BD-1 to PE1. If other AR-LEAF/REPLICATORs send BM traffic, NVE1 will receive that traffic from PE1. These are the differences in the behavior of a Selective AR-LEAF compared to a non-selective AR-LEAF:

I think this might be the first time we concretely say what makes the "selective" procedures earn that name (the combined selectivity of all BM traffic, in both directions, between leaf and replicator, as opposed to the non-selective case where leafs pick only the replicator that they send to, and must receive from everywhere). This seems like something that would be useful to have much earlier in the document, e.g., in the introduction. (It's also somewhat different than the sense in which RFC 6513 contrasts selective and inclusive tunnels, though I expect it's probably too late to try to change the terminology used here.)

[jorge] I hope the new introduction section addresses this comment. Please let me know otherwise.

Section 8

   - An AR-REPLICATOR will perform IR or AR forwarding mode for the incoming Overlay packets based on an ingress VNI lookup, as opposed to the tunnel IP DA lookup. Note that, when replicating to remote AR-REPLICATOR nodes, the use of the IR-VNI or AR-VNI advertised by the egress node will determine the IR or AR forwarding mode at the subsequent AR-REPLICATOR.

Does this implicitly put a requirement on all AR-REPLICATOR implementations to support the VNI-based scheme, since they might be called upon to forward to another replicator using it?

[jorge] I don’t think it does. In theory you could mix single-IP and multiple-IP AR-REPLICATORs. The text does not preclude this.

Section 9.1

   In order to be compatible with the IP SA split-horizon check, the AR-REPLICATOR MAY keep the original received tunnel IP SA when replicating packets to a remote AR-LEAF or RNVE. This will allow AR-LEAF nodes to apply Split-horizon check procedures for BM packets, before sending them to the local Ethernet-Segment.
   Even if the AR-LEAF's IP SA is preserved when replicating to AR-LEAFs or RNVEs, the AR-REPLICATOR MUST always use its IR-IP as IP SA when replicating to other AR-REPLICATORs.

It seems unfortunate that an AR-LEAF node needs to have knowledge of the configuration in use at remote AR-REPLICATORs in order to know if the split-horizon check will be effective. Is there no way to always require certain replicator behavior and give the leafs reliable knowledge in split-horizon scenarios?

[jorge] there are other extensions in newer documents e.g. draft-ietf-bess-extended-evpn-optimized-ir, that deal with multi-homing in a more deterministic way. We thought this document just needed to allow the preservation of the IP SA as an option. Hope it is okay to leave it as is.

Section 9.2

   Ethernet Segments associated to one or more AR-REPLICATOR nodes SHOULD follow "Local-Bias" procedures for EVPN all-active multi-homing, as follows:

Is it really the "ethernet segments" that would follow local-bias procedures, or the EVPN nodes attached to them?

[jorge] changed to “AR-REPLICATOR nodes attached to the same all-active Ethernet Segment SHOULD follow "Local-Bias" procedures, as follows:”

Is putting SHOULD-level guidance to this effect in effect updating the core EVPN specification to privilege one way of handling multi-homing over others? (Maybe not, since the requirements only come into play when AR-REPLICATORs are involved and we disclaim applicability to cases where AR-REPLICATOR and AR-LEAF are on the same ethernet segment ... we might consider saying that as some part of why that case is out of scope.)

[jorge] We analyzed the mixed cases and saw that some required extended procedures not described here.
The reason is that when you replicate from an AR-REPLICATOR to another AR-REPLICATOR, you use the IR-IP as the IP source address, hence if the destination AR-REPLICATOR needs to apply local bias filtering for an ethernet segment that is shared with a leaf, that is not possible since we lost the source IP that identified the ingress leaf. Added this:

   The mixed case, that is, an AR-LEAF node and an AR-REPLICATOR node are attached to the same Ethernet Segment, would require extended procedures and it is out of scope.

Also, if we know of procedures other than local-bias that will still be effective, we might mention them as some justification for why this is only a SHOULD and not a MUST.

[jorge] I think you are right and the normative language needs to change to MUST. I changed it in version 11.

Section 10

Since we use the Leaf A-D route from [bum-procedure-update], we might want to pull in its security considerations as well.

[jorge] added, thx

I feel like there may be some more considerations to mention that are specific to the multi-homing case, but I don't think I understand that scenario well enough to be able to state them, myself.

[jorge] good point. I added:

   This document allows the AR-REPLICATOR to preserve the tunnel IP Source Address of the AR-LEAF (as an option) when forwarding BM packets from an overlay tunnel to another overlay tunnel. Preserving the AR-LEAF IP Source Address makes the "Local Bias" filtering procedures possible for AR-LEAF nodes that are attached to the same Ethernet Segment. If the AR-REPLICATOR does not preserve the AR-LEAF IP Source Address, AR-LEAF nodes attached to all-active Ethernet Segments will cause packet duplication on the multi-homed CE.

We might also mention that AR-REPLICATORs are, by design, using more bandwidth than stock RFC7432 PEs would, and if they exceed their local bandwidth that will cause service disruption.
[jorge] Sure, added:

   The AR-REPLICATOR nodes are, by design, using more bandwidth than [RFC7432] PEs or [RFC8365] NVEs would use. Certain network events or unexpected low performance may exceed the AR-REPLICATOR local bandwidth and cause service disruption.

The text that's here already does do a pretty good job of capturing the important topics for the common case, though -- thanks!

   An implementation following the procedures in this document should not create BM loops, since the AR-REPLICATOR will always forward the BM traffic using the correct tunnel IP Destination Address that indicates the remote nodes how to forward the traffic. This is true in both, the Non-Selective and Selective modes defined in this document.

(In the vein of my earlier comment,) what about the case when the tunnel destination is expecting to use VNI to determine how to forward the traffic?

[jorge] good point. New text:

   This document introduces the ability for the AR-REPLICATOR to forward traffic received on an overlay tunnel to another overlay tunnel. The reader may interpret that this introduces the risk of BM loops. That is, an AR-LEAF receiving a BM encapsulated packet that the AR-LEAF originated in the first place, due to one or two AR-REPLICATORs "looping" the BM traffic back to the AR-LEAF. The procedures in this document prevent these BM loops, since the AR-REPLICATOR will always forward the BM traffic using the correct tunnel IP Destination Address (or correct VNI in case of single-IP AR-REPLICATORs) that instructs the remote nodes how to forward the traffic. This is true in both the Non-Selective and Selective modes defined in this document. However, a wrong implementation of the procedures in this document may lead to those unexpected BM loops.
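
For readers following the thread, the mode decision discussed above (tunnel IP Destination Address in the multi-IP case, VNI in the single-IP case of Section 8) can be sketched roughly as follows. This is only an illustration, not text or code from the draft: the class, field and function names are hypothetical, the single-IP case is modelled by leaving ar_ip unset, and the sketch assumes the replicator also advertises a Regular-IR route.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReplicatorConfig:
    """Hypothetical per-BD settings of one AR-REPLICATOR (illustrative names)."""
    ir_ip: str                    # IR-IP: address used for regular ingress replication
    ir_vni: int                   # VNI advertised with the Regular-IR route
    ar_ip: Optional[str] = None   # AR-IP; None models a single-IP AR-REPLICATOR
    ar_vni: Optional[int] = None  # AR-VNI; only used in the single-IP case


def forwarding_mode(cfg: ReplicatorConfig, tunnel_dst_ip: str, vni: int) -> str:
    """Return "AR", "IR" or "NONE" for a packet received on an overlay tunnel."""
    if cfg.ar_ip is not None:
        # Multi-IP case: the tunnel destination IP selects the mode.
        if tunnel_dst_ip == cfg.ar_ip:
            return "AR"
        if tunnel_dst_ip == cfg.ir_ip:
            return "IR"
        return "NONE"
    # Single-IP case: one destination IP for both modes, so the VNI decides.
    if tunnel_dst_ip != cfg.ir_ip:
        return "NONE"
    if vni == cfg.ar_vni:
        return "AR"
    if vni == cfg.ir_vni:
        return "IR"
    return "NONE"


# Example addresses are documentation prefixes, chosen arbitrarily.
multi = ReplicatorConfig(ir_ip="192.0.2.1", ir_vni=100, ar_ip="192.0.2.2")
single = ReplicatorConfig(ir_ip="192.0.2.1", ir_vni=100, ar_vni=200)
print(forwarding_mode(multi, "192.0.2.2", 100))
print(forwarding_mode(single, "192.0.2.1", 200))
```

The point of the sketch is that a sender replicating to a remote AR-REPLICATOR picks the destination IP (or VNI) that the remote node advertised for the intended mode, so the remote node never has to guess whether further replication is expected.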
Section 14.2

Having SHOULD-level guidance to use the "local bias" procedures detailed in RFC 8365 might require that document to be promoted to a normative reference; see https://www.ietf.org/about/groups/iesg/statements/normative-informative-references/

[jorge] done.

NITS

Section 1

   Section 3 lists the requirements of the combined optimized-IR solution, whereas Section 5 and Section 6 describe the Assisted-Replication (AR) solution, and Section 7 the Pruned-Flood-Lists (PFL) solution.

I suggest mentioning that sections 5 and 6 differ in that they cover the selective and non-selective cases.

[jorge] done, thanks.

Section 2

   - Regular-IR: Refers to Regular Ingress Replication, where the source NVE/PE sends a copy to each remote NVE/PE part of the BD.

s/part of/that is part of/

[jorge] fixed

Section 4

   - Regular-IR route: in this route, Originating Router's IP Address, Tunnel Type (0x06), MPLS Label and Tunnel Identifier MUST be used as described in [RFC7432] when Ingress Replication is in use. The NVE/PE that advertises the route will set the Next-Hop to an IP address that we denominate IR-IP in this document. When advertised by an AR-LEAF node, the Regular-IR route SHOULD be advertised with type T= AR-LEAF.

Hmm, down near the end of page 9 we say that AR-enabled nodes MUST signal the proper AR type (1 or 2) according to its administrative choice -- how is that MUST compatible with the SHOULD here?

[jorge] yes, it was changed in version 10

Also, if we're going to write out T = 01 (AR-REPLICATOR) just a few lines later, we should write out T = 10 (AR-LEAF) here.
[jorge] done, thx

   o The AR-LEAF constructs an IP-address-specific route-target as indicated in [I-D.ietf-bess-evpn-bum-procedure-updates], by placing the IP address carried in the Next-Hop field of the

Pedantically, "as indicated in [bum-procedure-update]" would involve "placing the IP address carried in the Next Hop of the received I/S-PMSI A-D route in the Global Administrator field of the Community", which is obviously not going to be applicable in this case. So "analogously to" might be more appropriate than "as indicated".

[jorge] done, thx

   received Replicator-AR route in the Global Administrator field of the Community, with the Local Administrator field of this Community set to 0. Note that the same IP-address-specific import route-target is auto-configured by the AR-REPLICATOR that sent the Replicator-AR, in order to control the acceptance of the Leaf A-D routes.

This "Note that ... is auto-configured by" phrasing suggests to me that there is some more detailed text elsewhere laying out a requirement to do this (and any needed procedures, though I suspect there are no real procedures to document). However, later on §6.1 refers back to §4 (here) for "the AR-REPLICATOR auto-configures its IP-address-specific import route-target as described in section Section 4." Maybe we could write this in a way that's more clearly a specification and binding on the AR-REPLICATOR?

[jorge] done, thx

   o The Leaf A-D route MUST include the PMSI Tunnel attribute with the Tunnel Type set to AR, type set to AR-LEAF and the Tunnel

"type" here seems to refer to the new T field in the PTA flags, and should probably be referenced using consistent terminology.

[jorge] done, thx

   Each AR-enabled node MUST understand and process the AR type field in the PTA (Flags field) of the routes, and MUST signal the

(same point about consistent terminology for T/AR-type)

[jorge] done, thx

   corresponding type (1 or 2) according to its administrative choice.
I suggest writing "01" and "10" to match the previous treatment of the two-bit field.

[jorge] changed as per a previous review.

Section 5.1

   - When an AR-REPLICATOR receives a BM packet on an AC, it will forward the BM packet to its flooding list (including local ACs and remote NVE/PEs), skipping the non-BM overlay tunnels.

I assume that it goes without saying that the AR-REPLICATOR does not flood the packet back to the AC it came in on. (The "rules" later in the section do specify source squelching.)

[jorge] yes, but that’s implicit?

Section 5.2

   b. In this non-selective AR solution, the AR-LEAF MUST advertise a single Regular-IR inclusive multicast route as in [RFC7432]. The AR-LEAF SHOULD set the AR Type field to AR-LEAF. Note that although this flag does not make any difference for the egress nodes when creating an EVPN destination to the AR-LEAF, it is

egress, or ingress?

[jorge] changed to ‘remote’ to clarify
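
As a reading aid for the AR type (T) values discussed in this thread, here is a minimal sketch of how a receiver might interpret the two-bit field: 01 is AR-REPLICATOR, 10 is AR-LEAF, 00 is a node not signalling an AR role, and 11 is treated on reception as 00. The function and table names are hypothetical, and the position of the T bits inside the PTA Flags octet is deliberately not modelled, since it is not stated in this thread.

```python
# Two-bit AR type values as written out in the thread; names are illustrative.
AR_TYPE_NAMES = {
    0b00: "RNVE",           # no AR role signalled (also used for the Regular-IR
                            # route of an AR-REPLICATOR, per Section 5.1 b.)
    0b01: "AR-REPLICATOR",
    0b10: "AR-LEAF",
}


def interpret_ar_type(t: int) -> str:
    """Map a received two-bit T value to the role the receiver should assume."""
    if not 0 <= t <= 0b11:
        raise ValueError("T is a two-bit field; expected a value from 0 to 3")
    # Value 11 is not assigned a role and is interpreted as 00 on reception.
    return AR_TYPE_NAMES.get(t, AR_TYPE_NAMES[0b00])


for value in (0b00, 0b01, 0b10, 0b11):
    print(format(value, "02b"), interpret_ar_type(value))
```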