Thanks for the comments, Alan.

I understand that the main use of BFD is between directly connected nodes, but then RFC 5880 says:

   BFD can provide failure detection on any kind of path between
   systems, including direct physical links, virtual circuits, tunnels,
   MPLS Label Switched Paths (LSPs), multihop routed paths, and
   unidirectional links (so long as there is some return path, of
   course).

In my experience, the main causes for out of order delivery are link layer retransmission of some packets, which is frequent in Wi-Fi, and promotion of short packets to the head of the queue, which does happens with some models of home or small business routers -- it was very much happening during IETF 103 in Bangkok. Other reasons may include ECMP and transient routing issues. I would expect it to me noticeable in wireless links, tunnels, and multihop routed paths.

I agree that in normal conditions it would take a long time for the 32 bit counter to roll over. My concerned was mostly forced rollover due to injection of packets without authentication, or replay of old authenticated packets. But then, if it almost never happens it means it is not tested, and it is best to enforce that it does not happen.

-- Christian Huitema

On 6/8/2024 5:40 AM, Alan DeKok wrote:
   (removing secdir)

   The security analysis is perhaps simplified a bit by understanding the 
limited use-case for BFD.  From the introduction to RFC 5880:

    The goal of Bidirectional Forwarding Detection (BFD) is to provide
    low-overhead, short-duration detection of failures in the path
    between adjacent forwarding engines, including the interfaces, data
    link(s) ...

   i.e. The parties to a BFD conversation are intended to be connected by a wire.  This 
means that issues like "out of order packet delivery" are generally not 
applicable.

   The attacks on BFD are therefore limited to either (a) forcibly taking the 
connection down, or (b) erroneously keeping the connection up.

   No amount of cryptography will protect from an attack which intends to take the BFD 
session down.  An on-path attacker simply has to either drop the BFD packets, or mangle 
them so that they are discarded by the receiver.  As such, any cryptography in BFD is 
limited to preventing spoofed packets.  The "Secure Sequence numbers" draft is 
intended to do exactly that.  It uses a CSPRNG to generate infinite sequences of random 
numbers which an attacker cannot predict.  Receivers can verify that packets contain the 
expected sequences, within a small window.   
https://datatracker.ietf.org/doc/draft-ietf-bfd-secure-sequence-numbers/

   For sequence number roll-over issues with the other "authentication" types, 
the default timers (1 packet/s), result in a rollover every 136 years, which seems 
acceptable.  Typical BFD timer intervals are 300ms, which leads to 45 years between 
rollover.  Switching from MD5/SHA1 to the above draft would also address the roll-over 
issue.

   The parties to the BFD conversation are typically also line cards with 
extremely limited CPU power.  So independent of the issues noted above that 
crypto doesn't help with certain kinds of attacks, the systems generally don't 
have enough CPU power to run a full cryptographic protocol like TLS or QUIC.

   That being said, the Security Considerations section should discuss the 
items mentioned in your review, and explain why the issues noted are generally 
acceptable for BFD.

On Jun 8, 2024, at 12:54 AM, Christian Huitema via Datatracker 
<nore...@ietf.org> wrote:

Reviewer: Christian Huitema
Review result: Has Issues

I have reviewed this document as part of the security directorate's ongoing
effort to review all IETF documents being processed by the IESG. These comments
were written primarily for the benefit of the security area directors. Document
editors and WG chairs should treat these comments just like any other last call
comments.

Bidirectional Forwarding Detection (BFD, RFC 5880) is used to monitor "data
plane" paths by sending BFD packets between "forwarding engines" and detecting
failures when they are not correctly received. For example, its Echo function
will trigger ping-like replies to check that a path is valid in both
directions. The BFD Stability draft describes extensions to BFD to not just
detecting failures when a sufficient number of BFD packets or echoes are not
received, but to also quickly detect and report packet losses. When losses are
detected, they are reported in the Yang module ietf-bfd-stability, which is
defined in the draft.

There is no packet sequence number in the "Generic BFD Control Packet Format"
defined in section 4.1 of RFC5880. The BFD stability draft uses instead the
Sequence Number defined in the authentication extension to BFD. RFC5880 defines
five types of authentication: simple password, keyed MD5, "meticulously" keyed
MD5, keyed SHA1, and "meticulously" keyed SHA1. Of those, only the MD5 and SHA1
variants include a 32 bit sequence number, and only the "meticulous" variants
require that the sequence number should be incremented for each packet. The
draft requests that if authentication is used, then either the MD5 or SHA1
"meticulous" variant be selected. The draft also defines a NULL form of
authentication, in which the authentication extension only carries a sequence
number. The loss detection can then use the sequence number in the
authentication extension.

The authentication sequence number is a 32 bit field. Such numbers can roll
over, either after a long duration session or due to a packet injection attack.
There is some text about that in the description of the NULL authentication. It
says:

   If bfd.AuthSeqKnown is 1, and the received Sequence Number field is
   not equal to bfd.RcvAuthSeq + 1 (in a circular number space), then
   the loss count is incremented by one and bfd.RcvAuthSeq is set to the
   received Sequence Number.

That does not look quite right. Suppose that due to out of order delivery, the
packets are received in order 1-3-2-4. Upon reception of packet 3, the
algorithm counts one loss and set the next expected value to 4. After packet 2,
another loss and expected value to 3. After packet 4, another loss and expected
value to 5. So, three losses when none actually occurred.

In RFC 5880, the specification of Meticulous Keyed MD5 addresses both number
rollover and out of order delivery. The same text is repeated for meticulous
MD5 and meticulous SHA1:

      ... if the
      sequence number lies outside of the range of bfd.RcvAuthSeq+1 to
      bfd.RcvAuthSeq+(3*Detect Mult) inclusive (when treated as an
      unsigned 32-bit circular number space) the received packet MUST be
      discarded.

That means that if a the packet 1-2-3-4 are delivered out of order as 1-4-2-3,
then packets 2 and 3 are going to be ignored and 1 loss will be detected.
That's probably not right.

For packet loss detection, the state of the art is described in RFC 8985, the
RACK-TLP Loss Detection Algorithm for TCP, and in RFC 9002 which describes how
to use RACK-TLP for packet loss discovery in QUIC. I don't understand why the
BFD Stability draft does not reference the RACK-TLP algorithm and RFC8985. The
"ad hoc" BFD loss detection algorithm description ends up repeating a lot of
points commonly discussed in the specification of RACK: detecting packet losses
by monitoring holes in sequence numbers, the interaction between this detection
and out of order delivery, or role and computation of timers. The BFD stability
draft would probably be more robust if it merely adapted the algorithms defined
in RFC8995.

Copying algorithms from RFC8985 or RFC9002 would probably also diminish the
reliance on timers, allow use of adaptative timers instead of relying on timers
set by management procedure, and allow reporting of RTT statistics in the Yang
module.

The security considerations acknowledge that using NULL authentication will
allow attackers to mimic out of order delivery and thus cause spurious loss
detection. Indeed, this NULL Authentication procedure makes me cringe. Why are
we even defining that today, in 2024? Naive features like that can easily turn
into a CVE if the attackers used them as steps in an attack chain, causing
spurious error detection to convince routers that a path is not usable anymore,
and thus maybe causing connectivity to  a target to fail for the short time
necessary for an attack. It would be much safer to just require the MD5 or SHA1
variants, and not define NULL authentication at all.

But MD5 and SHA1 authentication can also be attacked, because they do not
protect against replay attacks. If the sessions last long enough for the
sequence number to roll over, a patient attacker could record old packets and
replay them after the rollover. If the repeated packets pass the "range" test,
the bfd.RcvAuthSeq value will be modified, causing "good" packets below the new
range to be ignored. Repeat that a couple of time, and the BFD process will be
effectively disabled. I think that the solution is to not allow rollover, or
maybe require use of new descriptors if too many packets have been exchanged.
In any case, the security section should discuss the issue.

Of course, the security of MD5 and SHA1 is also suspect. There is no indication
yet that SHA1 with the HMAC construct used in RFC5880 is broken, but the
general rule is no avoid SHA1 if we can -- and certainly avoid MD5. I don't
know the deployment requirements, but it would be nice if BFD allowed stronger
options. Surely, some deployments will want to use different authentication
methods in the future. Should there be some guidance on the development of
these future extensions, such as requiring that they provide a sequence number
similar to the "meticulous" variants?

At this point, let me indulge in a personal aparte. Why do we even have a
specific BFD protocol? I suppose that it started with the idea of using
something light weight but better than old fashioned ping. But if I read the
various RFCs, I see the definition of session setup, the requirement for
congestion control, the definition of timers, and now the addition of packet
loss detection. There may be more coming if we want to address sequence number
rollover, more precisely handle out of order packets, maybe add timestamps. As
BFD is used to monitor data plane quality, using the same transport protocol as
applications would make sense. These functions are all well established in
transport protocols. I understand that TCP+TLS would not be practical, but we
are not in 2010 anymore and we have more options. For example, it would be
trivial to define "BFD over QUIC", while providing modern security, congestion
control, timer management and loss detection. A BSD over QUIC application could
update the Yang module just like BFD does, while providing much better security!





Reply via email to