On Jun 8, 2024, at 12:54 AM, Christian Huitema via Datatracker
<nore...@ietf.org> wrote:
Reviewer: Christian Huitema
Review result: Has Issues
I have reviewed this document as part of the security directorate's ongoing
effort to review all IETF documents being processed by the IESG. These comments
were written primarily for the benefit of the security area directors. Document
editors and WG chairs should treat these comments just like any other last call
comments.
Bidirectional Forwarding Detection (BFD, RFC 5880) is used to monitor "data
plane" paths by sending BFD packets between "forwarding engines" and detecting
failures when they are not correctly received. For example, its Echo function
will trigger ping-like replies to check that a path is valid in both
directions. The BFD Stability draft describes extensions to BFD to not just
detecting failures when a sufficient number of BFD packets or echoes are not
received, but to also quickly detect and report packet losses. When losses are
detected, they are reported in the Yang module ietf-bfd-stability, which is
defined in the draft.
There is no packet sequence number in the "Generic BFD Control Packet Format"
defined in section 4.1 of RFC5880. The BFD stability draft uses instead the
Sequence Number defined in the authentication extension to BFD. RFC5880 defines
five types of authentication: simple password, keyed MD5, "meticulously" keyed
MD5, keyed SHA1, and "meticulously" keyed SHA1. Of those, only the MD5 and SHA1
variants include a 32 bit sequence number, and only the "meticulous" variants
require that the sequence number should be incremented for each packet. The
draft requests that if authentication is used, then either the MD5 or SHA1
"meticulous" variant be selected. The draft also defines a NULL form of
authentication, in which the authentication extension only carries a sequence
number. The loss detection can then use the sequence number in the
authentication extension.
The authentication sequence number is a 32 bit field. Such numbers can roll
over, either after a long duration session or due to a packet injection attack.
There is some text about that in the description of the NULL authentication. It
says:
If bfd.AuthSeqKnown is 1, and the received Sequence Number field is
not equal to bfd.RcvAuthSeq + 1 (in a circular number space), then
the loss count is incremented by one and bfd.RcvAuthSeq is set to the
received Sequence Number.
That does not look quite right. Suppose that due to out of order delivery, the
packets are received in order 1-3-2-4. Upon reception of packet 3, the
algorithm counts one loss and set the next expected value to 4. After packet 2,
another loss and expected value to 3. After packet 4, another loss and expected
value to 5. So, three losses when none actually occurred.
In RFC 5880, the specification of Meticulous Keyed MD5 addresses both number
rollover and out of order delivery. The same text is repeated for meticulous
MD5 and meticulous SHA1:
... if the
sequence number lies outside of the range of bfd.RcvAuthSeq+1 to
bfd.RcvAuthSeq+(3*Detect Mult) inclusive (when treated as an
unsigned 32-bit circular number space) the received packet MUST be
discarded.
That means that if a the packet 1-2-3-4 are delivered out of order as 1-4-2-3,
then packets 2 and 3 are going to be ignored and 1 loss will be detected.
That's probably not right.
For packet loss detection, the state of the art is described in RFC 8985, the
RACK-TLP Loss Detection Algorithm for TCP, and in RFC 9002 which describes how
to use RACK-TLP for packet loss discovery in QUIC. I don't understand why the
BFD Stability draft does not reference the RACK-TLP algorithm and RFC8985. The
"ad hoc" BFD loss detection algorithm description ends up repeating a lot of
points commonly discussed in the specification of RACK: detecting packet losses
by monitoring holes in sequence numbers, the interaction between this detection
and out of order delivery, or role and computation of timers. The BFD stability
draft would probably be more robust if it merely adapted the algorithms defined
in RFC8995.
Copying algorithms from RFC8985 or RFC9002 would probably also diminish the
reliance on timers, allow use of adaptative timers instead of relying on timers
set by management procedure, and allow reporting of RTT statistics in the Yang
module.
The security considerations acknowledge that using NULL authentication will
allow attackers to mimic out of order delivery and thus cause spurious loss
detection. Indeed, this NULL Authentication procedure makes me cringe. Why are
we even defining that today, in 2024? Naive features like that can easily turn
into a CVE if the attackers used them as steps in an attack chain, causing
spurious error detection to convince routers that a path is not usable anymore,
and thus maybe causing connectivity to a target to fail for the short time
necessary for an attack. It would be much safer to just require the MD5 or SHA1
variants, and not define NULL authentication at all.
But MD5 and SHA1 authentication can also be attacked, because they do not
protect against replay attacks. If the sessions last long enough for the
sequence number to roll over, a patient attacker could record old packets and
replay them after the rollover. If the repeated packets pass the "range" test,
the bfd.RcvAuthSeq value will be modified, causing "good" packets below the new
range to be ignored. Repeat that a couple of time, and the BFD process will be
effectively disabled. I think that the solution is to not allow rollover, or
maybe require use of new descriptors if too many packets have been exchanged.
In any case, the security section should discuss the issue.
Of course, the security of MD5 and SHA1 is also suspect. There is no indication
yet that SHA1 with the HMAC construct used in RFC5880 is broken, but the
general rule is no avoid SHA1 if we can -- and certainly avoid MD5. I don't
know the deployment requirements, but it would be nice if BFD allowed stronger
options. Surely, some deployments will want to use different authentication
methods in the future. Should there be some guidance on the development of
these future extensions, such as requiring that they provide a sequence number
similar to the "meticulous" variants?
At this point, let me indulge in a personal aparte. Why do we even have a
specific BFD protocol? I suppose that it started with the idea of using
something light weight but better than old fashioned ping. But if I read the
various RFCs, I see the definition of session setup, the requirement for
congestion control, the definition of timers, and now the addition of packet
loss detection. There may be more coming if we want to address sequence number
rollover, more precisely handle out of order packets, maybe add timestamps. As
BFD is used to monitor data plane quality, using the same transport protocol as
applications would make sense. These functions are all well established in
transport protocols. I understand that TCP+TLS would not be practical, but we
are not in 2010 anymore and we have more options. For example, it would be
trivial to define "BFD over QUIC", while providing modern security, congestion
control, timer management and loss detection. A BSD over QUIC application could
update the Yang module just like BFD does, while providing much better security!