Benjamin Kaduk has entered the following ballot position for draft-ietf-bfd-vxlan-09: Discuss
When responding, please keep the subject line intact and reply to all email addresses included in the To and CC lines. (Feel free to cut this introductory paragraph, however.)

Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html for more information about IESG DISCUSS and COMMENT positions.

The document, along with other ballot positions, can be found here: https://datatracker.ietf.org/doc/draft-ietf-bfd-vxlan/

----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

I have a few points that I think merit IESG discussion.

(1) I see that several directorate reviewers expressed unease at the destination (IP and) MAC address assignment procedure for the inner VXLAN headers, and appreciate that there was extensive on-list discussion (more than I could follow). That said, I failed to find a clear statement of why the current text is believed to be safe, and in fact my reading of the current text is that the described procedure is *not* safe. Pointers to key parts of the WG discussion would be more than welcome!

To take something of a high-level view of my concerns, if we think of the VXLAN as being a tunnel between VTEPs that carry encapsulated tenant traffic, then what we're trying to do is roughly like BFD between VTEPs, but we want to get fault-detection over as broad a coverage as we can (the "outermost part of the tunnel"), so we want to have the option of per-VNI BFD instead of just endpoint-to-endpoint (VTEP-to-VTEP). However, we end up having to do this by trying to insert a thin filter into the tenant's address space (i.e., the inner VXLAN header) and pick out the specific stream of BFD traffic that we're introducing.
This is, in some sense, a namespace grab in what is conceptually the tenant's namespace, and we have to be careful that what we do is either guaranteed to not impact the tenant or well-documented and compartmentalized (akin to the "well-known URIs").

I've made comments at several places in the document that are more directly tied to specific pieces of text, but in general, if we assume that the tenant can add/remove new addresses at will within their VXLAN abstraction, then any attempt to preconfigure by mutual agreement the BFD addresses to use at the VTEPs or to use the VTEP's normal (outer) address as the sentinel value seems subject to the tenant coming in and subsequently trying to use that address, leading to (some of) the tenant's traffic getting silently filtered and interpreted by the VTEP. If we were using domain names as identifiers, we could allocate something under .arpa or similar, but I think our options are more limited when numerical addresses are used.

The option suggested by the rtg-dir reviewer of always using the management VNI does not suffer from this namespacing issue, though I recognize that it does reduce the scope over which fault-detection is available, for the cases when different VNIs' traffic is routed or handled differently.

(2) Section 6 says:

   The selection of the VNI number of the Management VNI MUST be controlled through management plane. An implementation MAY use VNI number 1 as the default value for the Management VNI. All VXLAN packets received on the Management VNI MUST be processed locally and MUST NOT be forwarded to a tenant.

It seems like the management VNI concept is something that would apply to the entire VXLAN deployment and not just to the BFD-using portions; is this already defined somewhere (in which case we should reference it), or is it new with this document? In the latter case wouldn't it be an update to the core VXLAN spec?
(I note that there are some procedural hoops to jump through for an IETF-stream document to update an ISE-stream document...)

----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

Section 1

   In the case where a Multicast Service Node (MSN) (as described in Section 3.3 of [RFC8293]) resides behind a Network Virtualization Endpoint (NVE), the mechanisms described in this document apply and can, therefore, be used to test the connectivity from the source NVE to the MSN.

I'm not sure that I'm parsing "resides behind" properly. Is the idea that the multicast traffic starts off at a tenant-system source, hits an NVE gateway to enter the VXLAN, traverses the VXLAN a bit before getting to the MSN, and is replicated from the MSN to various NVE termini? I think I'd be less confused if this was described as "participates in the VXLAN" or "is part of the virtualized environment", as the current "behind" wording makes me think of a firewall-like topology where the NVE behind which the MSN resides will be decapsulating traffic.

   This document describes the use of Bidirectional Forwarding Detection (BFD) protocol to enable monitoring continuity of the path between VXLAN VTEPs, performing as Network Virtualization Endpoints, and/or availability of a replicator multicast service node.

All the commas here potentially make the parsing ambiguous; assuming that the "performing as Network Virtualization Endpoints" is just describing the VXLAN VTEPs, I'd suggest dropping the first comma and instead joining those clauses with "that are".

Section 3

   between the same pair of VTEPs. BFD packets intended for a VTEP MUST NOT be forwarded to a VM as a VM may drop BFD packets leading to a false negative.
   This method is applicable whether the VTEP is a

[This "MUST NOT" is a very strict requirement, so we have to be sure that it's achievable without disruption to tenant traffic, per the Discuss point]

   At the same time, a service layer BFD session may be used between the tenants of VTEPs IP1 and IP2 to provide end-to-end fault management. In such case, for VTEPs BFD Control packets of that session are indistinguishable from data packets.

nit(?): I suggest s/indistinguishable from/regular/ -- the tenants' BFD sessions are just regular data to the VXLAN infrastructure, though IIUC a VTEP could, if so inclined, peek inside and "distinguish" them from non-BFD tenant data based on heuristics and packet format.

   0:0:0:0:0:FFFF:7F00:0/104 range for IPv6). There could be a firewall configured on VTEP to block loopback addresses if set as the destination IP in the inner IP header. It is RECOMMENDED to allow addresses from the loopback range through a firewall only if it is used as the destination IP address in the inner IP header, and the destination UDP port is set to 3784 [RFC5881].

I think we should reword this to make it clear that the default behavior is still "block all incoming traffic with loopback destination" and that the exception is tightly scoped to the encapsulated VXLAN traffic discussed in this document and the specific destination port *and when BFD has been configured for the VTEP*. I note that well-known ports are not reserved ports, and we have no guarantee that only a BFD implementation would be listening on port 3784. I think the rewording would include some phrasing like "RECOMMENDED that the only firewall exception to allow incoming traffic with destination address from the loopback range is when [...]", and of course, mention the need to have BFD configured.

Section 4

   VXLAN packet.
   The choice of Destination MAC and Destination IP addresses for the inner Ethernet frame MUST ensure that the BFD Control packet is not forwarded to a tenant but is processed locally at the remote VTEP. [...]

This has to be 100% reliable, and I think we need to provide some example mechanism that has that property even if we don't mandate that it be the only allowed mechanism.

   Destination MAC: This MUST NOT be of one of tenant's MAC addresses. The destination MAC address MAY be the address

But the tenant can start using new MAC addresses at any time! How is BFD-over-VXLAN going to dynamically detect and avoid that?

   associated with the destination VTEP. The MAC address MAY be configured, or it MAY be learned via a control plane protocol. The details of how the MAC address is obtained are outside the scope of this document.

This all talks about the MAC address being relatively static configuration, but per above, I don't think that's safe in the face of a MUST-level requirement to avoid conflicting with tenant MAC addresses.

   IP header:

   Destination IP: IP address MUST NOT be of one of tenant's IP addresses. The IP address SHOULD be selected from the range 127/8 for IPv4, for IPv6 - from the range 0:0:0:0:0:FFFF:7F00:0/104. Alternatively, the destination IP address MAY be set to VTEP's IP address.

As for MAC addresses, can't the tenant start using new ones at any time? Loopback is mostly safe in that the tenant generally shouldn't expect incoming traffic to that destination address ... but what if the tenant is also using a BFD scheme that expects incoming (single-hop) packets to loopback as an exception to RFC 1122?

nit: please use a parallel grammatical construction for describing the IPv4 and IPv6 recommended behavior.

   TTL or Hop Limit: MUST be set to 1 to ensure that the BFD packet is not routed within the Layer 3 underlay network.
   This addresses the scenario when the inner IP destination address is of VXLAN gateway and there is a router in underlay which removes the VXLAN header, then it is possible to route the packet as VXLAN gateway address is routable address.

nit: the grammar here is a bit wonky; I think the following preserves the meaning with better grammar:

% TTL or Hop Limit: MUST be set to 1 to ensure that the BFD
% packet is not routed within the Layer 3 underlay network. This
% addresses the scenario where the inner IP destination address is
% that of a VXLAN gateway and there is a router in the underlay
% that removes the VXLAN header; in such cases it is possible for
% the packet to be routed, as the VXLAN gateway's address is a
% routable address.

Section 5

   Once a packet is received, VTEP MUST validate the packet. If the Destination MAC of the inner Ethernet frame matches one of the MAC addresses associated with the VTEP the packet MUST be processed further. If the Destination MAC of the inner Ethernet frame doesn't

What prevents the scenario where the MAC address associated with the VTEP is also in use by the tenant?

   match any of VTEP's MAC addresses, then the processing of the received VXLAN packet MUST follow the procedures described in Section 4.1 [RFC7348]. If the BFD session is using the Management VNI (Section 6), BFD Control packets with unknown MAC address MUST NOT be forwarded to VMs.

nit: either "an unknown" or "MAC addresses"

   The UDP destination port and the TTL of the inner IP packet MUST be validated to determine if the received packet can be processed by BFD.

Can you give a pointer to or description of what this validation consists of?

Section 5.1

   case of VXLAN, the VNI number identifies that logical link. If BFD packet is received with non-zero Your Discriminator, then BFD session MUST be demultiplexed only with Your Discriminator as the key.
nits: "If a BFD packet", "then the BFD session"

Section 6

   In most cases, a single BFD session is sufficient for the given VTEP to monitor the reachability of a remote VTEP, regardless of the number of VNIs. When the single BFD session is used to monitor the reachability of the remote VTEP, an implementation SHOULD choose any of the VNIs. An implementation MAY support the use of the Management

nit: I feel like this is trying to say that the choice is arbitrary and it doesn't matter which one is picked, but "SHOULD choose any of" is more of a recommendation to make a choice than guidance on how to make that choice, as written.

Section 9

I think we need to discuss the risk/potential consequences of a VTEP failing to properly filter BFD traffic and incorrectly passing it through to the tenant. Relatedly, I'd also consider discussing the case of a mixed deployment where one peer attempts to speak BFD-VXLAN to a peer that does not implement that mechanism.

   The document requires setting the inner IP TTL to 1, which could be used as a DDoS attack vector. Thus the implementation MUST have

An attack vector on what part of the system?
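
To make the Section 3 firewall comment more concrete, here is a rough sketch of the narrowly scoped exception I have in mind. It is only an illustration and not text from the draft: the function name, its parameters, and the idea of passing "decapsulated from VXLAN" and "BFD configured" as booleans are all hypothetical. Only the two loopback ranges and UDP port 3784 come from the quoted text.

```python
import ipaddress

# Destination UDP port for single-hop BFD Control packets [RFC5881].
BFD_SINGLE_HOP_PORT = 3784

# Loopback ranges named in the draft: 127/8 for IPv4 and
# 0:0:0:0:0:FFFF:7F00:0/104 for IPv6.
LOOPBACK_RANGES = (
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("::ffff:7f00:0/104"),
)


def loopback_exception_applies(dst_ip, dst_udp_port,
                               is_vxlan_inner, bfd_configured):
    """Return True only when a loopback-destined packet may pass.

    Hypothetical helper: the default for loopback destinations stays
    "block", and every condition below must hold for the exception.
    """
    addr = ipaddress.ip_address(dst_ip)
    in_loopback = any(addr in net for net in LOOPBACK_RANGES)
    return (
        in_loopback
        and is_vxlan_inner        # inner IP header of a VXLAN packet
        and bfd_configured        # BFD has been configured on this VTEP
        and dst_udp_port == BFD_SINGLE_HOP_PORT
    )
```

A packet with a loopback destination that fails any one of these conditions would still be dropped by the default rule. Note that the sketch does not address the port-ownership concern above: matching on port 3784 does not by itself show that a BFD implementation is the listener.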