Christian,

Thanks for your review. Some of my comments will overlap those from Alan.

On Fri, Jun 07, 2024 at 09:54:57PM -0700, Christian Huitema via Datatracker wrote:
> The authentication sequence number is a 32 bit field. Such numbers can roll
> over, either after a long duration session or due to a packet injection
> attack.

As Alan points out, normal rollover is something we're usually unbothered by
in the existing authentication algorithms. The point you have here is far
more about the underlying issue for the null authentication procedures:

> There is some text about that in the description of the NULL authentication.
> It says:
>
> If bfd.AuthSeqKnown is 1, and the received Sequence Number field is
> not equal to bfd.RcvAuthSeq + 1 (in a circular number space), then
> the loss count is incremented by one and bfd.RcvAuthSeq is set to the
> received Sequence Number.
>
> That does not look quite right. Suppose that due to out of order delivery, the
> packets are received in order 1-3-2-4. Upon reception of packet 3, the
> algorithm counts one loss and set the next expected value to 4. After packet
> 2, another loss and expected value to 3. After packet 4, another loss and
> expected value to 5. So, three losses when none actually occurred.

Agreed. We do mention this here:

: Implementations MAY provide mechanisms wherein all expected packets received
: across an expected interval but delivered out of order are not considered
: lost packets.

We did discuss how to avoid some of these out of order issues in the context
of active attacks against BFD sessions with NULL authentication. The
conclusion from that thread is that we simply CANNOT leverage the sequence
numbers for purposes of "do we pass the authentication checks".

As you note here:

> In RFC 5880, the specification of Meticulous Keyed MD5 addresses both number
> rollover and out of order delivery. The same text is repeated for meticulous
> MD5 and meticulous SHA1:
>
> ...
> if the
> sequence number lies outside of the range of bfd.RcvAuthSeq+1 to
> bfd.RcvAuthSeq+(3*Detect Mult) inclusive (when treated as an
> unsigned 32-bit circular number space) the received packet MUST be
> discarded.
>
> That means that if a the packet 1-2-3-4 are delivered out of order as 1-4-2-3,
> then packets 2 and 3 are going to be ignored and 1 loss will be detected.
> That's probably not right.

For our authentication purposes, without the presence of some sort of
computed digest across the packet, NULL authentication means that an active
attacker can knock the session over simply by injecting packets that push
the sequence numbers ahead. Without the sequence number windowing check,
it's trivial. With the sequence number window, it is still possible: the
attacker simply hits the right window and advances it out of the range of
the real client too fast. Trying to further mitigate that using a pacing
check becomes problematic in most implementations and isn't natural for
current BFD implementations.

The conclusion was thus that we can't use NULL for authentication. We're
only interested in it for the stability check.

However, as you note, in the event of regular packet reordering, we start
bumping the counters too hard. What can you do about this? Nothing.

We *could* mitigate active attempts to bump the counters, and perhaps it's
reasonable to restore the sequence window check to try to remove some noise
from the system. However, outside of that work... if you're having this
issue occur with ANY authentication mechanism regularly, it means you have a
problem.

The best this mechanism can do is help provide you notice of issues.
Outright dropped packets tell you there is loss. Regular out of order
delivery tells you there is reordering. Past this, you can notify the client
of issues and that's the entire value proposition here.

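To make the two behaviors above concrete, here is a minimal Python sketch. It is an illustration only, not text from the draft or from RFC 5880: the function names are hypothetical, and it assumes that the first packet and any in-order packet also update bfd.RcvAuthSeq, which the quoted text does not spell out. It shows the naive NULL loss counter over-counting on reordering, the RFC 5880 style window check discarding late packets, and an injected packet at the top of the window locking out the real sender.

```python
MOD = 1 << 32  # sequence numbers live in an unsigned 32-bit circular space


def naive_loss_count(arrivals):
    """Loss counter following the quoted NULL-auth text: any packet that is
    not bfd.RcvAuthSeq + 1 bumps the loss count and resets bfd.RcvAuthSeq."""
    known, rcv, losses = False, 0, 0
    for seq in arrivals:
        if known and seq != (rcv + 1) % MOD:
            losses += 1
        # Assumption: first and in-order packets also update bfd.RcvAuthSeq.
        known, rcv = True, seq
    return losses


def in_window(rcv, seq, detect_mult):
    """True if seq lies in rcv+1 .. rcv+(3*detect_mult), inclusive, when
    treated as an unsigned 32-bit circular number space."""
    delta = (seq - rcv) % MOD
    return 1 <= delta <= 3 * detect_mult


def windowed_receive(arrivals, detect_mult):
    """Apply the window check to each arrival; return (accepted, discarded)."""
    known, rcv = False, 0
    accepted, discarded = [], []
    for seq in arrivals:
        if known and not in_window(rcv, seq, detect_mult):
            discarded.append(seq)
            continue
        known, rcv = True, seq
        accepted.append(seq)
    return accepted, discarded


# Reordering 1-3-2-4: three losses counted although nothing was lost.
print(naive_loss_count([1, 3, 2, 4]))            # 3

# Reordering 1-4-2-3 with Detect Mult 3: packets 2 and 3 are discarded.
print(windowed_receive([1, 4, 2, 3], 3))         # ([1, 4], [2, 3])

# Normal rollover is harmless in the circular space.
print(in_window(0xFFFFFFFF, 0, 3))               # True

# Injection: with no digest, a forged 109 is accepted at the top of the
# window, and the real sender's 101, 102, 103 now fall outside it.
print(windowed_receive([100, 109, 101, 102, 103], 3))
# ([100, 109], [101, 102, 103])
```

In the last case the real sender would need nine more packets to get back into the window, which is longer than the detection time, so the session drops. That is the reason the window alone is not a defense without a digest.
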
> For packet loss detection, the state of the art is described in RFC 8985, the
> RACK-TLP Loss Detection Algorithm for TCP, and in RFC 9002 which describes how
> to use RACK-TLP for packet loss discovery in QUIC. I don't understand why the
> BFD Stability draft does not reference the RACK-TLP algorithm and RFC8985.

Minimally, it's the usual IETF delusion that everyone is aware of every
possible technology that has been written to an Internet-Draft. :-)

That is the point of the broader reviews.

> The "ad hoc" BFD loss detection algorithm description ends up repeating a lot
> of points commonly discussed in the specification of RACK: detecting packet
> losses by monitoring holes in sequence numbers, the interaction between this
> detection and out of order delivery, or role and computation of timers. The
> BFD stability draft would probably be more robust if it merely adapted the
> algorithms defined in RFC8995.

Valuable feedback. Sending this back as "see this RFC, see if you can
benefit from it" is a reasonable request.

> Copying algorithms from RFC8985 or RFC9002 would probably also diminish the
> reliance on timers, allow use of adaptative timers instead of relying on
> timers set by management procedure, and allow reporting of RTT statistics in
> the Yang module.

Sorry, you don't get to tweak the timers. The core use of BFD remains.
We're simply trying to piggyback loss detection on top of what we already
have as an add-on.

> The security considerations acknowledge that using NULL authentication will
> allow attackers to mimic out of order delivery and thus cause spurious loss
> detection. Indeed, this NULL Authentication procedure makes me cringe. Why are
> we even defining that today, in 2024?

Clearly you're unaware of the fact that trying to do expensive cryptography
on low CPU power devices that have other things to do than compute digests
is an attack. But that puts you in regular company with every security
reviewer for BFD since its inception.

The vast majority of BFD deployment remains without use of any
authentication whatsoever. The use of MD5 and SHA1 happens in environments
where active attacks are of concern, but with significant negative impact to
the scale of the BFD sessions, their detection times, and the scale of work
the line cards end up doing.

This draft highlights the fact that when you have missing packets detected
by the sequence numbers, you can report it. If that's on one of the existing
meticulous modes, your concern is addressed. If you want similar behavior
for unauthenticated sessions, NULL gives you at least an option. It comes
with the same security guarantees you get with no authentication. Many
people are fine with that.

> Naive features like that can easily turn into a CVE if the attackers used
> them as steps in an attack chain, causing spurious error detection to
> convince routers that a path is not usable anymore, and thus maybe causing
> connectivity to a target to fail for the short time necessary for an attack.
> It would be much safer to just require the MD5 or SHA1 variants, and not
> define NULL authentication at all.

See above.

> But MD5 and SHA1 authentication can also be attacked, because they do not
> protect against replay attacks. If the sessions last long enough for the
> sequence number to roll over, a patient attacker could record old packets and
> replay them after the rollover. If the repeated packets pass the "range" test,
> the bfd.RcvAuthSeq value will be modified, causing "good" packets below the
> new range to be ignored. Repeat that a couple of time, and the BFD process
> will be effectively disabled. I think that the solution is to not allow
> rollover, or maybe require use of new descriptors if too many packets have
> been exchanged. In any case, the security section should discuss the issue.

Not really.
But this is the usual giant leap security directorate reviews make without
having been through the prior RFC 5880 reviews:

: bfd.LocalDiscr
:
: The local discriminator for this BFD session, used to uniquely
: identify it. It MUST be unique across all BFD sessions on this
: system, and nonzero. It SHOULD be set to a random (but still
: unique) value to improve security. The value is otherwise outside
: the scope of this specification.

Thus, for authenticated systems using MD5/SHA1, the digest includes
appropriate nonces. Given the rollover time, it's not a reasonable attack.

If you want to provide a valid but silly attack, echoing a packet that was
just sent two packets ago can produce such detection errors in authenticated
systems.

A general observation on security in BFD is that once you're to the point of
actively trying to attack BFD, you're already in a place in the network
where, if your goal is to bring BFD down, you're better off just spoofing
ARP or doing other things that are far easier attacks.

> Of course, the security of MD5 and SHA1 is also suspect. There is no
> indication yet that SHA1 with the HMAC construct used in RFC5880 is broken,
> but the general rule is no avoid SHA1 if we can -- and certainly avoid MD5. I
> don't know the deployment requirements, but it would be nice if BFD allowed
> stronger options.

Cryptography at speed on small packets is an attack on the line cards. They
have better things to do. This means we're not looking for perfect security,
we're looking for appropriate speed bumps.

That said, please also see:
https://datatracker.ietf.org/doc/html/draft-ietf-bfd-optimizing-authentication-13

This work was chartered to allow for stronger authentication mechanisms such
as SHA-2 to be used, but somewhat sparingly.

> Surely, some deployments will want to use different authentication
> methods in the future.
> Should there be some guidance on the development of
> these future extensions, such as requiring that they provide a sequence number
> similar to the "meticulous" variants?

https://datatracker.ietf.org/doc/html/rfc7492, section 6:

: The security risks brought by SHA-1 and MD5 have been well
: understood. However, when using a stronger digest algorithm, e.g.,
: SHA-2, the imposed computing overhead will seriously affect the
: performance of BFD implementation.

It then goes on to suggest additional ciphers.

Fundamentally, there's room for stronger ciphers, but the tradeoff will
always apply. The optimizing procedures in the draft cited above let us
change the equation by making BFD require strong authentication less of the
time.

> At this point, let me indulge in a personal aparte. Why do we even have a
> specific BFD protocol?

It seems we're back to the need to have the "why bfd 101" for IETF again.

You speculate, but don't quite hit the mark. Fundamentally, common routing
protocol and other control plane and data plane requirements need a simple
bidirectional connectivity check. IETF protocols would embed flavors of this
in each of the protocols, but often with seconds-plus granularity timers to
address the scaling capabilities of the control plane software.

Rather than try to make every IETF control plane protocol provide faster
timers, BFD provides generic plumbing that can be used by various clients to
provide that check. Similarly, dataplane failure mechanisms such as fast
reroute can leverage BFD as a triggering mechanism.

It's a dull protocol, but it's been quite successful.

> rollover, more precisely handle out of order packets, maybe add timestamps. As

We've explicitly put timestamp work outside the scope of BFD.

Over the years we've considered adding new things into BFD, but each of
those new things often degraded the core use case. Or, as my usual challenge
to new proposals went: What is so interesting you want to do it every 10ms?

Instead, we've encouraged other applications that are interested in
leveraging the session setup semantics to adopt the state machinery if they
find it interesting. But please don't try to stuff it into BFD.

> BFD is used to monitor data plane quality, using the same transport protocol
> as applications would make sense.

As our experts in OAM, like Greg Mirsky, would remind us: We're only using
BFD for connectivity checks. Other semantics for outright quality are mostly
out of scope because they require additional machinery.

This draft doesn't try to address quality. It simply propagates up something
the protocol is already capable of noting: expected stuff is lost.

> These functions are all well established in transport protocols. I
> understand that TCP+TLS would not be practical, but we are not in 2010
> anymore and we have more options. For example, it would be trivial to define
> "BFD over QUIC", while providing modern security, congestion control, timer
> management and loss detection. A BSD over QUIC application could update the
> Yang module just like BFD does, while providing much better security!

While I'm working on other work targeting QUIC, I think targeting BFD for it
would be ludicrous.

That said, let's focus on the more reasonable portions of your review.

-- Jeff