Re: AD Evaluation Review of draft-ietf-bfd-stability-18

Ashesh Mishra Sun, 08 Jun 2025 22:46:46 -0700

Hi Ketan,

A little late to the game, but perhaps I can reinforce and add
color to some of the responses already provided by Jeff and Mahesh.


   - *<major> The title/name of "BFD Stability" is misleading to me. It gives an
   impression of how stable is the BFD session, as in - is it flapping a lot
   or is staying up and stable for a long interval? Why not call this BFD
   Packet Loss Monitoring ... or something like that which is a simple term
   and yet perhaps gives the true picture of what this feature is about?*
      - In the most egregious form of instability (the kind that base
      RFC 5880 is designed for), the session is brought down after BFD
      packets are missed for a full Detection Time. This case is obvious
      and has a very visible impact on the network being monitored by the
      BFD session. In my experience, there are more transient or benign
      forms of instability, where the duration of missed packets is shorter
      than the Detection Time. These evade diagnosis but are nevertheless
      valuable indicators of systemic issues or a deteriorating network that
      may warrant preventative action. This draft specifically addresses
      this kind of non-critical instability, hence the title BFD-Stability
      (it measures the degree of stability of a BFD session).
      - The primary intent of the measurement is not to detect BFD packet
      loss, although that is the means used for the process. The goal is to
      determine whether a BFD session itself is degraded. Packet loss
      measurement is simply a relatively non-intrusive way of making that
      determination without adding instability to the system.
   - *<major> Packets may be reordered and arrive with different delays. Let us
   say that the packet that was supposed to arrive in interval I were delayed
   to arrive in interval I+1. i.e., we get one extra packet in the interval
   I+1. This does not indicate a packet loss in interval I, but the procedure
   above seems to log it as a packet loss?*
      - As Jeff mentioned in his response, the procedure assumes there is
      no reordering. An implementation may adapt the method to allow for
      reordering, but that requires a more sophisticated implementation that
      tracks not only the sequence numbers but also the timestamps at which
      the packets arrived. The originally proposed draft addressed a richer
      breadth of measurements but was too complex to implement without
      making more dramatic changes to the BFD payload and the
      implementations. It also carried the added risk of overwhelming the
      CPU and causing additional instability (it caused the implementation
      to become a manifestation of the uncertainty principle).
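
To make the no-reordering assumption concrete, here is a minimal sketch of a
gap-based loss counter. It is not taken from the draft or from any
implementation: the class name, the method names, and the treatment of late
packets are all assumptions made for illustration. It only relies on the
32-bit authentication sequence number incrementing by one per packet.

```python
class LossCounter:
    """Hypothetical gap-based counter; assumes packets are never reordered."""

    MOD = 2 ** 32  # sequence numbers are treated as 32-bit and wrap around

    def __init__(self):
        self.expected = None  # next sequence number we expect to see
        self.lost = 0         # running count of packets presumed lost

    def receive(self, seq):
        if self.expected is None:
            # The first packet seen only bootstraps the logic.
            self.expected = (seq + 1) % self.MOD
            return
        gap = (seq - self.expected) % self.MOD
        if gap < self.MOD // 2:
            # seq is at or ahead of what we expected: every skipped
            # number is presumed lost.
            self.lost += gap
            self.expected = (seq + 1) % self.MOD
        # Otherwise the packet is older than expected. This sketch ignores
        # it, so a packet that was merely late stays counted as lost.


in_order = LossCounter()
for seq in [1, 2, 3, 5, 6]:      # packet 4 never arrives
    in_order.receive(seq)

reordered = LossCounter()
for seq in [1, 2, 4, 3, 5]:      # packet 3 arrives late, nothing is lost
    reordered.receive(seq)
```

Both counters end with one loss recorded, even though the second stream lost
nothing. Telling the two cases apart is where the extra state (recent gaps and
arrival times) comes in.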
   - *<major> Is the loss counter reset when the BFD session goes down? Is there
   a notion of time period that is tracked/reported here? Is there a notion of
   a percentage of BFD packets lost that is being reported? How useful is it
   to simply report the lost packet count without any of these other contexts?
   Looking at the model, the history of this data for the previous uptime is
   also not being tracked. Have these aspects been considered by the WG?*
      - *Is the loss counter reset when a BFD session goes down?* There is
      no inherent value in resetting the counter. There is, however, value
      in analyzing the data collected by polling this counter at various
      intervals. That is front-end work on top of the underlying engine, and
      hence not prescribed here. For reference, I found value in maintaining
      a bucket for each window in which a session was UP and logging the
      sequence number of each lost packet. I'm sure there are other ways of
      processing the data, and the choice of how to track the information
      efficiently is ultimately left to each implementation. This also gives
      each implementation the opportunity to stay consistent with the
      broader monitoring and telemetry framework of its system.
      - *Is there a notion of a percentage of BFD packets lost that is
      being reported?* This is also a front-end task. There are a number of
      ways of interpreting the data collected by the engine and using that
      data to arrive at different conclusions.
      - *How useful is it to simply report the lost packet count without
      any of these other contexts?* It depends on the quality of the network.
      There are networks where you hardly ever lose a single frame, so a
      lost packet count increment is highly irregular, and hence a valuable
      metric (think of it as less capable but also less intrusive than
      running proper loss measurement protocols). Other networks are quite
      lossy (satellite networks and other radio networks, for instance), so
      the total count of lost BFD packets is less relevant than the trend of
      the losses.
      - *Looking at the model, the history of this data for the previous
      uptime is also not being tracked.* This is another one in the
      front-end bucket. An implementation may choose to track it, but that
      has no impact on the underlying engine being used for the
      measurements.
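
As a rough illustration of the front-end bookkeeping described above (one
bucket per window in which the session was UP, with the sequence number of
each lost packet logged), something along these lines would do. Every name
here is hypothetical, and the loss percentage is just one of many ways a
front end might summarize the data. None of it is prescribed by the draft.

```python
from dataclasses import dataclass, field


@dataclass
class UpWindow:
    """One interval during which the session was UP (hypothetical)."""
    received: int = 0
    lost_seqs: list = field(default_factory=list)

    def loss_percent(self):
        # Share of expected packets in this window that never arrived.
        total = self.received + len(self.lost_seqs)
        return 100.0 * len(self.lost_seqs) / total if total else 0.0


class SessionHistory:
    """Keeps one bucket per UP window; the engine's counter is never reset."""

    def __init__(self):
        self.windows = []
        self.current = None

    def session_up(self):
        self.current = UpWindow()
        self.windows.append(self.current)

    def session_down(self):
        self.current = None

    def record_received(self):
        if self.current is not None:
            self.current.received += 1

    def record_lost(self, seq):
        # Losses reported while the session is down are not bucketed.
        if self.current is not None:
            self.current.lost_seqs.append(seq)


history = SessionHistory()
history.session_up()
for _ in range(9):
    history.record_received()
history.record_lost(42)
history.session_down()
history.session_up()
history.record_received()
```

Keeping the buckets outside the measurement engine means history for previous
uptimes, percentages, and trends can all be derived later without changing
what the engine counts.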

Regards,
Ash

On Thu, May 15, 2025 at 4:05 AM Ketan Talaulikar <[email protected]>
wrote:

> Hello Authors/WG,
>
> Thanks for the work put into this document. It has been in the works for a
> long time in an on/off mode. There is some more work needed before it can
> be taken up for IESG evaluation.
>
> I would like to share my review of the v18 of this document.
>
> General Comment/Suggestion:
> This is about the contents of this document and its relationship with
> draft-ietf-bfd-optimizing-authentication and
> draft-ietf-bfd-secure-sequence-numbers. I believe this document does not
> depend on those other two, at least not normatively as indicated today.
> This proposal is self sufficient with the new null auth type and the two
> existing BFD auth types that use meticulous incrementing sequence numbers.
> As such, for smooth progression of this work, I would strongly recommend
> removing all references to those drafts and the ISAAC-based auth types or
> the Optimized Auth from this document. The
> draft-ietf-bfd-secure-sequence-numbers that actually specifies the two
> ISAAC-based auth types can instead refer to the draft-ietf-bfd-stability to
> indicate that those new auth types are suitable for use for measuring BFD
> packet loss. This way, this document becomes independent of the other two
> for its further processing.
>
> Please find below my comments in the idnits output of v18 and look for
> <EoRv18> at the very end of the review. If you don't see that, then likely
> the email has been truncated by your email client and you should look at
> the BFD WG email archive for the full version.
>
> Thanks,
> Ketan
>
>
> 14                             BFD Stability
> 15                      draft-ietf-bfd-stability-18
>
> 17 Abstract
>
> 19   This document describes extensions to the Bidirectional Forwarding
> 20   Detection (BFD) protocol to measure BFD stability.  Specifically, it
> 21   describes a mechanism for detection of BFD packet loss.
>
> <major> The title/name of "BFD Stability" is misleading to me. It gives an
> impression of how stable is the BFD session, as in - is it flapping a lot
> or is
> staying up and stable for a long interval? Why not call this BFD Packet
> Loss
> Monitoring ... or something like that which is a simple term and yet
> perhaps
> gives the true picture of what this feature is about?
>
> 98   This document does not propose any BFD extension to measure data
> 99   traffic loss or delay on a link or tunnel and the scope is limited to
> 100   BFD packets.
>
> <major> Please provide some text for justification for the experimental
> status - something on similar lines as the other two documents will work
> just as well.
>
> 120   The reader is expected to be familiar with the BFD [RFC5880],
> 121   Optimizing BFD Authentication
> 122   [I-D.ietf-bfd-optimizing-authentication] and Meticulous Keyed ISAAC
> 123   for BFD Authentication [I-D.ietf-bfd-secure-sequence-numbers].
>
> <major> I see no reason for the above two references or dependencies in
> this
> document. They seem unnecessary to me. What is the normative (must have)
> dependency that I am missing? And why is even an informative reference
> really
> necessary?
>
> 139   In a faulty datapath scenario, an operator can use BFD health
> 140   information to trigger delay and loss measurement OAM protocol
> 141   (Connectivity Fault Management (CFM) or Loss Measurement (LM)-Delay
> 142   Measurement (DM)) to further isolate the issue.
>
> <minor> Please provide informative references for the CFM and DM
> technologies
>
> 150 5.  NULL Auth Type
>
> <question> Why is a null auth type, or even a sequence number necessary
> for BFD
> packet loss calculation? Is it not OK to expect that the other endpoint is
> going to send X number of packets every interval? And if we don't get
> those X
> packets at every interval, then we have a packet loss? Perhaps I am missing
> something obvious and if so, it would be good to capture the rationale that
> really needs these sequence numbers for this measurement.
>
> 179   Auth Key ID: The authentication key ID in use for this packet.  Must
> 180   be set to zero and ignored on receipt.
>
> <minor> s/must/MUST
>
> 216 6.1.  Loss Measurement
>
> 218   Loss measurement counts the number of BFD control packets missed at
> 219   the receiver during any Detection Time period.  The loss is detected
> 220   by comparing the Sequence Number field in successive BFD control
> 221   packets.  The Sequence Number in each successive control packet
> 222   generated on a BFD session by the transmitter is incremented by one.
> 223   This loss count can then be exposed using the YANG module defined in
> 224   the subsequent section.
>
> <major> Packets may be reordered and arrive with different delays. Let us
> say that the
> packet that was supposed to arrive in interval I were delayed to arrive in
> interval
> I+1. i.e., we get one extra packet in the interval I+1. This does not
> indicate
> a packet loss in interval I, but the procedure above seems to log it as a
> packet loss?
>
> 226   The first BFD authentication section with a non-zero sequence number,
> 227   in a valid BFD control packet, processed by the receiver is used for
> 228   bootstrapping the logic.
>
> <major> Is the loss counter reset when the BFD session goes down? Is there
> a
> notion of time period that is tracked/reported here? Is there a notion of a
> percentage of BFD packets lost that is being reported? How useful is it to
> simply report the lost packet count without any of these other contexts?
> Looking at the model, the history of this data for the previous uptime is
> also
> not being tracked. Have these aspects been considered by the WG?
>
> 239   Implementations MAY provide mechanisms wherein all expected packets
> 240   received across an expected interval but delivered out of order are
> 241   not considered lost packets.
>
> <major> Why is this not a MUST? How is it ok to do incorrect and inaccurate
> reporting of BFD packet loss? Please see my previous comment.
>
> 243 7.  Stability YANG Module
>
> <question> I am not an IETF YANG expert. I would like to check if there are
> any issues with an experimental RFC augmenting a standards track YANG
> model.
>
> 599 9.  Security Consideration
>
> 601 9.1.  YANG Security Considerations
>
> <minor> Please reorder the sections. I know some of the authors are YANG
> champs, but let us not put the cart before the horse :-)
>
> 626   addition, and as stated in Out of Order Packets (Section 6.2), on
> 627   links such as LAG or ECMP, there is a possibility of packets being
> 628   delivered out of order.  A strict comparison of increasing sequence
> 629   numbers may result in classifying those out of order packets as
> 630   packet loss.
>
> <minor> Does this text blob not belong to the Null Auth or a separate BFD
> Packet loss monitoring sub-section?
>
> 652   When the NULL Authentication type is used for BFD Stability purposes,
> 653   maliciously injected packets that do not reset the BFD session can
> 654   resemble high packet loss.  Sessions such as, multi-hop routed paths,
> 655   tunnels without authentication, or MPLS LSP, therefore, have security
> 656   guarantees that are identical to situations where BFD is run without
> 657   authentication.
>
> <minor> How about someone could manipulate the sequence numbers and give a
> wrong idea of packet loss? Possibly raise false alarms?
>
> <EoRv18>
>
>
