Hi Jeff,

I think I can only touch on a few points before telechat-time rolls around,
and will finish off afterwards.

On Wed, Dec 18, 2019 at 03:24:48PM -0500, Jeffrey Haas wrote:
> Benjamin,
>
> On Mon, Dec 16, 2019 at 03:43:13PM -0800, Benjamin Kaduk via Datatracker
> wrote:
> > Benjamin Kaduk has entered the following ballot position for
> > ----------------------------------------------------------------------
> > DISCUSS:
> > ----------------------------------------------------------------------
> >
> > I have a few points that I think merit IESG discussion.
> >
> > (1) I see that several directorate reviewers expressed unease at the
> > destination (IP and) MAC address assignment procedure for the inner
> > VXLAN headers, and appreciate that there was extensive on-list
> > discussion (more than I could follow). That said, I failed to find a
> > clear statement of why the current text is believed to be safe, and in
> > fact my reading of the current text is that the described procedure is
> > *not* safe. Pointers to key parts of the WG discussion would be more
> > than welcome!
>
> One high level point that likely didn't survive the rather verbose comment
> chain is there are two implementations of this draft. Some of the
> considerations covered in the guidance here is "please don't break shipping
> code".
>
> While this is IETF, and shipping code isn't always a blocking point to
> document changes, I'd suggest that as a consideration.

It did indeed not survive (at least my pass through) the comment chain, so
thank you for calling it out. It is indeed a consideration, and I expect
some actual discussion on the call tomorrow.

> > To take something of a high-level view of my concerns, if we think of
> > the VXLAN as being a tunnel between VTEPs that carry encapsulated tenant
> > traffic, then what we're trying to do is roughly like BFD between VTEPs,
> > but we want to get fault-detection over as broad a coverage as we can
> > (the "outermost part of the tunnel"), so we want to have the option of
> > per-VNI BFD instead of just endpoint-to-endpoint (VTEP-to-VTEP).
>
> You've summarized this clearly. Joel Halpern, in particular, raised this
> point multiple times. Effectively, "what are we testing?" And the response
> not clearly converging on exactly one of the two possibilities.
>
> As is noted in the various IESG discussion, each of the two test points
> raise slightly different considerations.
>
> > However, we end up having to do this by trying to insert a thin filter
> > into the tenant's address space (i.e., the inner VXLAN header) and pick
> > out the specific stream of BFD traffic that we're introducing. This is,
> > in some sense, a namespace grab in what is conceptually the tenant's
> > namespace, and we have to be careful that what we do is either
> > guaranteed to not impact the tenant or well-documented and
> > compartmentalized (akin to the "well-known URIs").
>
> Possibly, and it's certainly a consideration. However, I think I'm less
> convinced of it being quite the level of violation that seems to be
> reflected in the rest of the IESG comments in the various other threads.
> I'll respond to that detail a bit below.
>
> > I've made comments at several places in the document that are more
> > directly tied to specific pieces of text, but in general, if we assume
> > that the tenant can add/remove new addresses at will within their VXLAN
> > abstraction, then any attempt to preconfigure by mutual agreement the BFD
> > addresses to use at the VTEPs or to use the VTEP's normal (outer)
> > address as the sentinel value seems subject to the tenant coming in and
> > subsequently trying to use that address, leading to (some of) the
> > tenant's traffic getting silently filtered and interpreted by the VTEP.
> > If we were using domain names as identifiers, we could allocate
> > something under .arpa or similar, but I think our options are more
> > limited when numerical addresses are used.
> >
> > The option suggested by the rtg-dir reviewer of always using the
> > management VNI does not suffer from this namespacing issue, though I
> > recognize that it does reduce the scope over which fault-detection is
> > available, for the cases when different VNIs' traffic are routed or
> > handled differently.
>
> This is a clean summary of the considerations. At least a portion of the WG
> seems to be comfortable with "test to the management VNI". However, another
> (smaller, I believe) portion were wanting to test one layer further in.

It is reassuring that I at least managed to summarize the situation
tolerably. Is it fair to say that testing "one layer further in" is a
superset of what "test to the management VNI" can do?

> > (2) Section 6 says:
> >
> >    The selection
> >    of the VNI number of the Management VNI MUST be controlled through
> >    management plane. An implementation MAY use VNI number 1 as the
> >    default value for the Management VNI. All VXLAN packets received on
> >    the Management VNI MUST be processed locally and MUST NOT be
> >    forwarded to a tenant.
> >
> > It seems like the management VNI concept is something that would apply
> > to the entire VXLAN deployment and not just to the BFD-using portions;
> > is this already defined somewhere (in which case we should reference
> > it), or is it new with this document? In the latter case wouldn't it be
> > an update to the core VXLAN spec? (I note that there are some
> > procedural hoops to jump through for an IETF-stream document to update
> > an ISE-stream document...)
>
> The relevant portion of the archive will have the Subject: line text
> including:
> "Trapping BFD Control packet at VTEP"
>
> A portion of the discussion relating to the magic number of the management
> VNI suggested '1', instead of '0'.
>
> At least some implementations already use '0':
> https://mailarchive.ietf.org/arch/msg/rtg-bfd/6WfSATmfoPv4AD6RmD-Xb7zz4CE
>
> The argument to not use '0' starts roughly here:
> https://mailarchive.ietf.org/arch/msg/rtg-bfd/z8E_a5k_r4pLLs5YfNsL_Xm9_Us
>
> You're correct, IMO, that there's no standard practice and the above seems
> to support this. I believe this leaves the document authors in the position
> of being requested to make a recommendation for the default value of this
> field and knowing that the default would be invalid on some platforms.
>
> The alternative is requiring implementations to always configure this value.
>
> I suggest the IESG determine whether it wants a default value here or not.
> If not, the text should be adjusted to require configuration. If yes, the
> IESG should consider whether the nvo3 group should produce some document
> that covers current operational practices.

That does sound like something we should try to talk about on the telechat
as well; thanks for raising it so clearly.

> > ----------------------------------------------------------------------
> > COMMENT:
> > ----------------------------------------------------------------------
> >    0:0:0:0:0:FFFF:7F00:0/104 range for IPv6). There could be a firewall
> >    configured on VTEP to block loopback addresses if set as the
> >    destination IP in the inner IP header. It is RECOMMENDED to allow
> >    addresses from the loopback range through a firewall only if it is
> >    used as the destination IP address in the inner IP header, and the
> >    destination UDP port is set to 3784 [RFC5881].
> >
> > I think we should reword this to make it clear that the default behavior
> > is still "block all incoming traffic with loopback destination" and that
> > the exception is tightly scoped to the encapsulated VXLAN traffic
> > discussed in this document and the specific destination port *and when
> > BFD has been configured for the VTEP*.
> > I note that well-known ports are
> > not reserved ports, and we have no guarantee that only a BFD
> > implementation would be listening on port 3784.
>
> I don't think this consideration is necessarily critical.

I think I'm in agreement about its criticality, and will see if I can come
up with some actual text ... later.

-Ben

> BFD implementations residing in the related instance communicating to other
> instances across the vxlan environment would be using RFC 5881 or RFC 5883
> style BFD. Since this isn't a tunneled BFD, the IP endpoints of the BFD
> control traffic will be unicast addresses rather than the reserved
> "loopback" ranges; i.e. 127/8 ::FFFF:127.0.0.0/104. In order for those
> ranges to be problematic, it'd be necessary for the client to be able to
> manually encapsulate a vxlan packet - a security issue of its own.
>
> A related point in this discussion is "we're hijacking an address managed by
> the local tenant". While true, it's in the above ranges and thus somewhat
> under the auspice of the host OS to assert control. I'm aware of some
> unusual applications that make use of configured addresses in those ranges
> for on-box communications, but they're also on the unusual end of things.
>
> What sort of text would you want to cover the case that when BFD is run
> up-to-the-tenant mode in this circumstance that an address MUST be reserved
> for the BFD over vxlan application and that this address SHALL NOT be
> available to the tenant for its own use?
>
> >    VXLAN packet. The choice of Destination MAC and Destination IP
> >    addresses for the inner Ethernet frame MUST ensure that the BFD
> >    Control packet is not forwarded to a tenant but is processed locally
> >    at the remote VTEP. [...]
> >
> > This has to be 100% reliable, and I think we need to provide some
> > example mechanism that has that property even if we don't mandate that
> > it be the only allowed mechanism.
>
> The consideration here, I believe, is that there's currently too much
> latitude by implementations as to what MAC addresses they use here.
> Restrict one case, you may break some implementation.
>
> The missing element is how a pair of implementations of BFD for vxlan
> discover the necessary information? As far as BFD is concerned, "tell me!"
> This seems like work that belongs in nvo3.
>
> >    Destination MAC: This MUST NOT be of one of tenant's MAC
> >    addresses. The destination MAC address MAY be the address
> >
> > But the tenant can start using new MAC addresses at any time! How is
> > BFD-over-VXLAN going to dynamically detect and avoid that?
>
> See above. Either it's coordinated with the ability to prevent the tenant
> from using it or the underlying vxlan environment needs to provide some
> mechanism to discover what's been provisioned.
>
> >    associated with the destination VTEP. The MAC address MAY be
> >    configured, or it MAY be learned via a control plane protocol.
> >    The details of how the MAC address is obtained are outside the
> >    scope of this document.
> >
> > This all talks about the MAC address being relatively static
> > configuration, but per above, I don't think that's safe in the face of a
> > MUST-level requirement to avoid conflicting with tenant MAC addresses.
>
> But is it BFD's responsibility to figure this out? This is what the
> document is suggesting - a higher level with access to the implementation
> specifics should be supplying the BFD provisioning information. Or manual
> provisioning in the absence thereof.
>
> -- Jeff
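
For illustration only, the tightly scoped loopback exception discussed in
this thread (block loopback destinations by default; permit them only for
VXLAN-encapsulated traffic to UDP port 3784, and only when BFD has been
configured on the VTEP) might be sketched as below. This is a minimal Python
sketch under those stated assumptions, not text from the draft or from any
implementation; the function name loopback_rule_permits and its parameters
are hypothetical.

```python
import ipaddress

BFD_SINGLE_HOP_PORT = 3784  # destination UDP port from RFC 5881

# Loopback ranges named in the thread: 127/8 and its IPv4-mapped IPv6 form.
V4_LOOPBACK = ipaddress.ip_network("127.0.0.0/8")
V6_MAPPED_LOOPBACK = ipaddress.ip_network("::ffff:127.0.0.0/104")


def in_loopback_range(addr: str) -> bool:
    """Return True if addr falls in one of the two loopback ranges above."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return ip in V4_LOOPBACK
    return ip in V6_MAPPED_LOOPBACK


def loopback_rule_permits(inner_dst_ip: str, dst_udp_port: int,
                          vxlan_encapsulated: bool,
                          bfd_configured: bool) -> bool:
    """Hypothetical filter: does the loopback rule let this packet through?

    Packets whose destination is outside the loopback ranges are not
    covered by this rule, so it does not block them (other policy may).
    Loopback destinations are blocked unless every condition of the
    scoped exception holds.
    """
    if not in_loopback_range(inner_dst_ip):
        return True
    return (vxlan_encapsulated
            and bfd_configured
            and dst_udp_port == BFD_SINGLE_HOP_PORT)
```

Note that this only models the filtering decision; as observed above, a
well-known port is not a reserved port, so a port match alone does not
prove that the receiver is a BFD implementation.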