Thank you for the detailed review.
I uploaded version 14 of the draft.
See #Ahmed for responses to the comments.
Ahmed
On 4/17/24 5:04 AM, Gunter Van de Velde via Datatracker wrote:
Gunter Van de Velde has entered the following ballot position for
draft-ietf-rtgwg-segment-routing-ti-lfa-13: Discuss
When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)
Please refer to https://www.ietf.org/about/groups/iesg/statements/handling-ballot-positions/
for more information about how to handle DISCUSS and COMMENT positions.
The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-rtgwg-segment-routing-ti-lfa/
----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------
# Gunter Van de Velde, RTG AD, comments for
draft-ietf-rtgwg-segment-routing-ti-lfa-13
Please find below two blocking DISCUSS points (easy to address), and a series of
non-blocking COMMENTs and some nits.
Many thanks for the RTGDIR reviews from Stewart Bryant,
Andy Smith and Ben Niven-Jenkins during the 7 years development
period of the TI-LFA specification. Also many thanks for the shepherd
write-up by Stewart Bryant to provide a brief overview of the
progress of the draft through the WG and the current state of art.
Thank you to the authors of this document. I really appreciate the
effort and believe it captures the TI-LFA normative procedures well.
Reviewing it with fresh eyes, I've made several comments that could
help further improve the quality. I hope these insights will be
valuable for the authors and the Working Group as you continue
to refine the document.
DISCUSS:
========
DISCUSS#1
In section '9. TI-LFA and SR algorithms' I found the text written from an sr-mpls
perspective. SRv6 has different considerations.
637 and Q-Space as well as the post-convergence path. An implementation
638 MUST only use Node-SIDs bound to the FlexAlgo and/or Adj-SIDs that
639 are unprotected to build the repair list.
The above seems written from an sr-mpls perspective. For SRv6 the Adj-SID is bound
to a Locator and consequently bound to an algorithm. As a result, the observed limitation
of sr-mpls does not really apply for SRv6. For SRv6 an implementation can
use protected Adj-SIDs in the repair path without breaking algorithm-aware
topology requirements. Consider allowing protected SRv6 Adj-SIDs for TI-LFA.
#Ahmed: version 14 modified the last sentence to indicate that SRv6
adj-SIDs can be used
In addition consider some blob of text about Adj-SIDs and locators in
"section 8.2. SRv6 dataplane considerations" could be beneficial.
With sr-mpls there is no correlation to the segment routing algorithm, however
when using SRv6 dataplane Adj-SID Locator is correlated to an algorithm.
#Ahmed: Section 8.2 refers to [RFC8754] and [RFC8986], which detail SRv6.
IMO any additional text explaining the SRv6 dataplane would be redundant and
may cause more confusion. At the same time, the reader is referred to
documents that provide all details about SRv6.
DISCUSS#2
Sections 11 and 12 do not introduce any supplementary artifacts to the normative
procedures outlined for TI-LFA. The information within sections 11 and 12 is provided
in extensive detail. Should the Working Group (WG) prefer to maintain this
level of specificity, it is advisable to consider relocating the detailed
content to an appendix unless there is a strong reason to keep it in the main
body of the document.
#Ahmed: moved to Appendix A and B
----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------
High level comments:
====================
* TI-LFA is based upon Segment Routing, however the document seems to have
mostly sr-mpls dataplane type language. The SRv6 dataplane is only mentioned
for the first time on line 493, almost halfway through the document. Maybe consider
mentioning support for the SRv6 dataplane earlier.
#Ahmed: From the point of view of the scope of this document, there is a
small difference between SR-MPLS and SRv6 (some of which you pointed out
(thanks a lot)). That is why none of them was explicitly mentioned early
on. At the same time, they were both mentioned in the same sentence. If
I were to explicitly mention SRv6 early on, then I have to do the same
for SR-MPLS
* 6 people on front page. Did all authors edit text in the draft?
#Ahmed: All authors made significant contributions to this draft. It would
not do them justice to drop any of them.
* Operational impact may want to explicitly mention that there is no interop
complexity because TI-LFA is a node-local operation.
* The document makes use of the term 'we' and other anthropomorphisms. Maybe
not the best approach in a formal document. Who is 'we'? Editor, authors, WG,
IETF community, operators, etc.? Policies have no awareness or emotions.
Detailed review COMMENTS ([minor] and [major])
==============================================
(Line numbers are rendered using idnits rendering)
19 This document presents Topology Independent Loop-free Alternate Fast
20 Re-route (TI-LFA), aimed at providing protection of node and
21 adjacency segments within the Segment Routing (SR) framework. This
[minor]
s/Re-route/Reroute/
#Ahmed: Fixed
[major]
The description provides insight that TI-LFA provides protection of node and adj
segments. It does not specify what 'protection' is all about or that
'protection' is constrained to single link|node failures, i.e. rfc5286 has
explicit text in the abstract about single failure applicability.
24 (DLFA). It extends these concepts to provide guaranteed coverage in
25 any two connected networks using a link-state IGP. A key aspect of
#Ahmed: The abstract is too short to provide more details. The specific
protection description is provided in the paragraph starting with "For
each Destination" in Page 5
[major]
In this sentence 'two connected networks' is referenced, while earlier in the
paragraph there is indication of 'protection of node and adjacency segments'.
How do two connected networks correlate with the segments?
#Ahmed: A 2-connected network is a network that does not become
partitioned as a result of a single failure. The concept of a segment is
detailed in the references. I am not really sure that I understand your
concern.
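For illustration only (this is not part of the draft or of the exchange above), the property described in the response can be sketched in Python. The sketch assumes an undirected topology given as a node set and a list of links, and it treats "single failure" as the loss of either one link or one node. The function names and the data representation are hypothetical.

```python
from collections import deque


def is_connected(nodes, links):
    """Return True if all nodes are mutually reachable over the given links."""
    nodes = set(nodes)
    if not nodes:
        return True
    adjacency = {n: set() for n in nodes}
    for a, b in links:
        # Ignore links that touch a node outside the considered node set.
        if a in nodes and b in nodes:
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == nodes


def survives_single_failures(nodes, links):
    """Return True if no single link or node failure partitions the network."""
    nodes, links = set(nodes), list(links)
    if not is_connected(nodes, links):
        return False
    # Single link failures: drop one link at a time.
    for i in range(len(links)):
        if not is_connected(nodes, links[:i] + links[i + 1:]):
            return False
    # Single node failures: drop one node (and, implicitly, its links) at a time.
    for failed in nodes:
        if not is_connected(nodes - {failed}, links):
            return False
    return True
```

A four-node ring passes this check, while a simple chain of three nodes does not, because losing either link isolates an end node.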
25 any two connected networks using a link-state IGP. A key aspect of
26 TI-LFA is the FRR path selection approach establishing protection
27 over the expected post-convergence paths from the point of local
28 repair, reducing the operational need to control the tie-breaks among
29 various FRR options.
[minor]
suggested rewrite to make the text better readable:
A principal attribute of TI-LFA is the FRR path selection methodology, which
establishes protection over the anticipated post-convergence paths from the
point of local repair. This approach diminishes the operational necessity
to manage the tie-breaks among various FRR alternatives.
#Ahmed: IMO the text is clear.
[minor]
why is the path selection better? can a hint be given why it is better
beyond a statement proclaiming it is better?
#Ahmed: Second paragraph in Appendix A (which used to be section 10 in
version 13 and moved to Appendix A based on your advice) in version 14
of the draft explains why it is better
138 * TI-LFA: Topology Independant LFA.
[minor]
s/Independant/Independent/
#Ahmed: Fixed
144 Segment Routing aims at supporting services with tight SLA guarantees
145 [RFC8402]. By relying on SR this document provides a local repair
[major]
The term SLA does not appear even once in RFC8402. How can the claim of
tight SLA be justified with RFC8402? Can a better pointer to the claim be
inserted?
#Ahmed: I removed the sentence
[minor]
s/Segment Routing/Segment Routing (SR)/
145 [RFC8402]. By relying on SR this document provides a local repair
146 mechanism for standard link-state IGP shortest path capable of
147 restoring end-to-end connectivity in the case of a sudden directly
148 connected failure of a network component. Non-SR mechanisms for
[minor]
readability rewrite:
This document outlines a local repair mechanism that leverages Segment
Routing (SR) to restore end-to-end connectivity in the event of an
abrupt failure involving a directly connected network component.
This mechanism is designed for standard link-state Interior Gateway
Protocol (IGP) shortest path scenarios.
#Ahmed: thanks for the text suggestion. I replaced the original text
with that suggestion
153 The term topology independent (TI) refers to the ability to provide a
154 loop free backup path irrespective of the topologies used in the
155 network. This provides a major improvement compared to LFA [RFC5286]
156 and remote LFA [RFC7490] which cannot provide a complete protection
157 coverage in some topologies as described in [RFC6571].
[minor]
I think what the text is trying to say is:
The term topology independent (TI) describes the capability of
providing a loop-free backup path that is effective across all network
topologies. This represents a significant enhancement over Loop-Free
Alternate (LFA) [RFC5286] and Remote LFA as outlined in
[RFC7490], both of which do not offer comprehensive protection coverage
in certain topological configurations as detailed in [RFC6571]. TI-LFA
ensures the availability of a backup path if a post-convergence path
exists, regardless of the network topology.
#Ahmed: Thanks again for the text suggestion. I replaced the original
text with that suggestion
167 TI-LFA is a local operation applied by the PLR when it detects
168 failure of one of its local links. As such, it does not affect:
[minor]
It would be welcome to explicitly spell out that TI-LFA is protection against
a single local link failure
#Ahmed: The paragraph starting with "For each destination" in Page 5
mentions that
[minor]
It was mentioned that TI-LFA provides protection against link and node failure.
In this section the abrupt failure of a link is mentioned to trigger FRR. How is
node protection with TI-LFA achieved, and how is the PLR triggered when the
neighboring node is no longer operational? It is elaborated upon later in this
section, but maybe a brief hint could be provided here too?
#Ahmed: As you mentioned, it is already provided. IMO (and probably the
opinion of others) it will be redundant to re-provide description here.
167 TI-LFA is a local operation applied by the PLR when it detects
168 failure of one of its local links. As such, it does not affect:
170 * Micro-loops that appear - or do not appear – as part of the
171 distributed IGP convergence [RFC5715] on the paths to the
172 destination that do not pass thru TI-LFA paths:
174 - As explained in [RFC5714], such micro-loops may result in the
175 traffic not reaching the PLR and therefore not following TI-LFA
176 paths.
178 * Micro-loops that appear – or do not appear - when the failed link
179 is repaired.
[minor]
This does not parse very well. I tried reading this paragraph a few times
and believe what is mentioned could be rewritten as follows:
"TI-LFA operates locally at the Point of Local Repair (PLR) upon detecting
a failure in one of its direct links. Consequently, this local operation
does not influence:
* Micro-loops that may or may not form during the distributed Interior
Gateway Protocol (IGP) convergence as delineated in RFC 5715.
- These micro-loops occur on routes directed towards the destination that
do not traverse TI-LFA-configured paths. According to [RFC5714], the formation
of such micro-loops can prevent traffic from reaching the PLR, thereby
bypassing the TI-LFA paths established for rerouting.
* Micro-loops that may or may not develop when the previously failed link
is restored to functionality.
This specification highlights that while TI-LFA effectively addresses specific
link failures, it does not extend its impact to managing micro-loops
associated with broader IGP convergence issues or subsequent link repairs."
#Ahmed: thanks again for the text. I replaced existing text with the
suggested one
181 TI-LFA paths are loop-free. What’s more, they follow the post-
182 convergence paths, and, therefore, not subject to micro-loops due to
183 difference in the IGP convergence times of the nodes thru which they
184 pass.
[minor]
This is a rather informal writing style. What about the following:
TI-LFA paths are inherently loop-free and align with post-convergence routes.
Consequently, they are not susceptible to micro-loops that may arise due to
variations in the IGP convergence times across different nodes through
which these paths traverse. This ensures a stable and predictable routing
environment, minimizing disruptions typically associated with asynchronous
network behavior.
#Ahmed: thanks again for the text. I replaced existing text with the
suggested one
186 TI-LFA paths are applied from the moment the PLR detects failure of a
187 local link and until IGP convergence at the PLR is completed.
[minor]
readability rewrite:
TI-LFA paths are activated from the instant the PLR detects a failure in a
local link and remain in effect until the Interior Gateway Protocol (IGP)
convergence at the PLR is fully achieved.
#Ahmed: thanks again for the text. I replaced existing text with the
suggested one
190 micro-loops, especially if these paths have been computed using the
191 methods described in Section Section 6.2, Section 6.3, or Section 6.4
192 of the draft. One of the possible ways to prevent such micro-loops
[minor]
Instead of simply referencing the sections 6.2, 6.3 and 6.4, maybe line up the
conditions in which this occurs combined with the section references. This could
be something in the style 'if the FRR path is not using a direct neighbor
then... etc etc etc'
#Ahmed: IMO this will be redundant text. The reference to the relevant
sections avoids redundancy
206 For each destination in the network, TI-LFA pre-installs a backup
[minor]
what does destination exactly mean? is that a /32 or /128 node? or is it
router-ids? any other abstraction intended?
#Ahmed: Added the phrase "as specified by the IGP"
224 By using SR, TI-LFA does not require the establishment of TLDP
225 sessions (Targeted Label Distribution Protocol) with remote nodes in
226 order to take advantage of the applicability of remote LFAs (RLFA)
227 [RFC7490][RFC7916] or remote LFAs with directed forwarding
228 (DLFA)[RFC5714]. All the Segment Identifiers (SIDs) are available in
229 the link state database (LSDB) of the IGP. As a result, preferring
230 LFAs over RLFAs or DLFAs, as well as minimizing the number of RLFA or
231 DLFA repair nodes is not required anymore.
[minor]
possible rewrite for readability and simplicity:
"
By utilizing Segment Routing (SR), TI-LFA eliminates the need to establish
Targeted Label Distribution Protocol (TLDP) sessions with remote nodes for
leveraging the benefits of Remote Loop-Free Alternates (RLFA) [RFC7490][RFC7916]
or Directed Loop-Free Alternates (DLFA) [RFC5714]. All the Segment Identifiers
(SIDs) required are present within the Link State Database (LSDB) of the
Interior Gateway Protocol (IGP). Consequently, there is no longer a necessity
to prefer LFAs over RLFAs or DLFAs, nor is there a need to minimize the number
of RLFA or DLFA repair nodes.
"
#Ahmed: Thanks for the text suggestion. I replaced the original text
with the suggested one
233 By using SR, there is no need to create state in the network in order
234 to enforce an explicit FRR path. This relieves the nodes themselves
235 from having to maintain extra state, and it relieves the operator
236 from having to deploy an extra protocol or extra protocol sessions
237 just to enhance the protection coverage.
[minor]
what about this blob of text:
"
Utilizing SR makes the requirement unnecessary to establish additional
state within the network for enforcing explicit Fast Reroute (FRR) paths.
This alleviation spares the nodes from maintaining supplementary state and
frees the operator from the necessity to implement additional protocols or
protocol sessions solely to augment protection coverage.
"
#Ahmed: Thanks for the text suggestion. I replaced the original text
with the suggested one
239 Although not a Ti-LFA requirement or constraint, TI-LFA also brings
s/Ti-LFA/TI-LFA/
#Ahmed: Fixed
242 reduces the need of locally configured policies that drive the backup
[minor]
Unsure what 'drive' means here. Would it be better to say
'describe the backup...'?
#Ahmed: I used the word "influence"
243 path selection ([RFC7916]). The easiest way to express the expected
244 post-convergence path in a loop-free manner is to encode it as a list
245 of adjacency segments. However, this may create a long SID list that
[major]
You write 'is to encode it'. What is the 'it'? I understand this is
suggesting Adj SIDs. I also believe that simply having a list of Adj SIDs is
not sufficient, but that an "ordered" list of Adj SIDs is needed.
#Ahmed: A pronoun usually refers to the nearest item in the sentence. The
nearest item in this sentence is "the expected post-convergence path".
245 of adjacency segments. However, this may create a long SID list that
246 some hardware may not be able to push. One of the challenges of TI-
[minor]
should we say push or program? push seems more sr-mpls dataplane specific, while
TI-LFA has applicability with SRv6 also
#Ahmed: Agreed. I changed "push" to "program".
248 adjacency segments and node segments. Each implementation will be
249 free to have its own SID list optimization algorithm. This document
250 details the basic concepts that could be used to build the SR backup
251 path as well as the associated dataplane procedures
possible rewrite:
"
Each implementation may independently develop its own algorithm for
optimizing the ordered SID list. This document provides an outline of the
fundamental concepts applicable to constructing the SR backup path, along
with the related dataplane procedures.
"
#Ahmed: Thanks. Replaced the original text with the suggested one
288 We define the main notations used in this document as the following.
290 We refer to "old" and "new" topologies as the LSDB state before and
291 after the considered failure.
[minor]
I would prefer not using the word 'we'. It is undefined who
that is. Is it the editor, authors, the WG, the internet community, etc.?
#Ahmed: I am open for suggestions for replacing "we".
286 3. Terminology
[minor]
Would section 3 be better located before section 2 for clarity?
#Ahmed: Almost all RFCs that have a "terminology" section put it after the
"Introduction". I would rather follow that convention to avoid pushback.
[major]
Later in the document there is usage of P(S,X) and Q(D,X) while
the terminology section only documents P(R,X). Maybe add some text
to clarify the intended use.
#Ahmed: the terminology section has "The Q-space Q(R,X) "
321 EP(P, Q) is an explicit SR-based path from a node P to a node Q.
[minor]
why not simply use 'SR path' instead of 'SR-based path'? does the
postfix '-based' add any representative value?
#Ahmed: Removed "-based"
335 An implementation is free to use any local optimization to provide
336 smaller SID lists by combining Node SIDs and Adjacency SIDs. In
[minor]
The intent seems to be to integrate adj SIDs and node SIDs into the SID lists.
Not sure that we are combining multiple SIDs into less SIDs:
"An implementation may employ any local optimization strategy to reduce
the size of SID lists by integrating Node SIDs and Adjacency SIDs into
the SID lists."
#Ahmed: The phrase "by integrating Node SIDs and Adjacency SIDs"
suggests an approach or paradigm for optimization algorithms. As
mentioned in the document, this is out of the scope of this document.
The current text is more general as it does not attempt to give hints
342 5. Intersecting P-Space and Q-Space with post-convergence paths
343
344 One of the challenges of defining an SR path following the expected
345 post-convergence path is to reduce the size of the segment list. In
[minor]
At the end of section 4 is written "These optimizations are out of scope of
this document," and then the first paragraph identifies that reducing the SID
lists is one of the challenges. For something that is out of scope of the
document, it is perceived as a rather important problem to address. If
truly out of scope of this document, then maybe state explicitly that section 5
is all informational
#Ahmed: The end of section 4 explicitly mentions that it "provides some
guidance" that uses P-space and Q-space. So it clearly does not mandate
the use of this guidance.
[minor]
in some places the term 'segment lists' is used, in others 'SID lists'. Could a
single terminology be used throughout the document?
#Ahmed: replaced "SID list" with "segment list"
[major]
In the Terminology section the P-space, extended P-space and the Q-space are
explained. Not sure why all this is explained again in more explicit steps. It
makes me wonder if section 5 can be reduced by reusing the Terminology in
section 3 and focusing upon those?
#Ahmed: The terminology section defines the P-space and Q-space. Section
5 explains how to determine the P-space and Q-space nodes that are also on the
post-convergence path. IMO any reduction to the steps in this section will
make it quite obscure.
356 We want to determine which nodes on the post-convergence path from
[minor]
who is 'we'?
#Ahmed: Suggestions for replacing "we" are most welcome.
358 regard to resource X (X can be a link or a set of links adjacent to
359 the PLR, or a neighbor node of the PLR).
[minor]
In the section 3 Terminology section of the document, resource X was defined, but
using a different definition: 'resource X (e.g. a link S-F, a node F, or a SRLG)'.
Which one is correct? Maybe reuse the Terminology definition for consistency
#Ahmed: I do not see any conflict between them. This section is just
providing an example of a resource X; it does not define it.
378 This can be found by intersecting the set of nodes belonging to the
379 post-convergence path from R to D, assuming the failure of X, with
380 Q(D, X).
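For illustration only (not taken from the draft), the intersection quoted above can be sketched in Python. The sketch assumes symmetric positive integer link metrics, a protected resource X that is a single link, and the terminology-section reading of the Q-space as the set of nodes that reach the destination without any shortest path (including equal-cost splits) crossing X. All function names and the graph representation are hypothetical, and the post-convergence path keeps only one of possibly several equal-cost paths.

```python
import heapq


def dijkstra(graph, source):
    """Shortest paths from source. graph maps node -> {neighbor: metric}."""
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for neighbor, metric in graph[node].items():
            candidate = d + metric
            if neighbor not in dist or candidate < dist[neighbor]:
                dist[neighbor] = candidate
                parent[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    return dist, parent


def without_link(graph, link):
    """Copy of the topology with the protected link X removed."""
    failed = set(link)
    return {n: {m: w for m, w in nbrs.items() if {n, m} != failed}
            for n, nbrs in graph.items()}


def q_space(graph, dest, link):
    """Nodes whose shortest paths to dest (ECMP included) never cross link."""
    dist, _ = dijkstra(graph, dest)  # valid because metrics are symmetric
    crosses = {}
    # Visit nodes by increasing distance so next hops are already classified.
    for node in sorted(dist, key=dist.get):
        crosses[node] = any(
            dist[nbr] + metric == dist[node]
            and ({node, nbr} == set(link) or crosses[nbr])
            for nbr, metric in graph[node].items() if nbr in dist
        )
    return {node for node, bad in crosses.items() if not bad}


def post_convergence_path(graph, source, dest, link):
    """One shortest path from source to dest once link has failed."""
    _, parent = dijkstra(without_link(graph, link), source)
    if dest not in parent:
        return []
    path, node = [], dest
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def q_nodes_on_path(graph, source, dest, link):
    """Nodes of the post-convergence path that also belong to Q(dest, link)."""
    q = q_space(graph, dest, link)
    return [n for n in post_convergence_path(graph, source, dest, link) if n in q]


# Hypothetical five-node topology with unit metrics; X is the link S-F.
example = {
    "S": {"F": 1, "A": 1},
    "F": {"S": 1, "D": 1},
    "A": {"S": 1, "B": 1},
    "B": {"A": 1, "D": 1},
    "D": {"F": 1, "B": 1},
}
print(q_nodes_on_path(example, "S", "D", ("S", "F")))
```

In this toy topology the first node of the returned list is the closest Q-space node to the source along the post-convergence path.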
[minor]
In terminology section 3 the Q(R, X) is described with 'R' used while
in this section 5.2 the term Q(D, X) has 'D' used.
Is this intentional? Why not add this in the Terminology
section also? Or make the Terminology section more opaque
to using any letter (e.g. 'R' or 'D') and describe the
intent of the Q(...) function?
#Ahmed: "X", "D", "R", ... are used the same way the letters "x", "y" and
"z" are used in algebra. I do not understand what is needed here.
397 protected resource X and, at the same time, is guaranteed to be loop-
398 free irrespective of the state of FIBs along the nodes belonging to
399 the explicit path. Thus, there is no need for any co-ordination or
[minor]
There is assumption here that only SR programs the FIB. There may be out
of Band FIB programming that does cause loops. Maybe frame the
claim better by expressing the assumption made to warrant loop-free paths.
#Ahmed: The beginning of the document explicitly mentioned IGP. So it is
clear that other forwarding states are outside the scope of this document.
460 6.2. FRR path using a PQ node
[minor]
Is there a reason that there are no considerations for an implementer
to select the PQ node closest to the S or closest to the D?
#Ahmed: The document clearly says that it is just "suggesting" methods.
Your suggestion is another implementation detail, which is out of scope
of the document.
499 interface for the packet, S-F. The failure of the primary outgoing
[minor]
what is the 'F' in the S-F?
#Ahmed: The text says "link S-F". Isn't it obvious that "F" is the far
end of that link?
512 We define hereafter the FRR behavior applied by S for any packet
513 received with an active adjacency segment S-F for which protection
514 was enabled. As protection has been enabled for the segment S-F and
515 signaled in the IGP (for instance using protocol extensions from
516 [RFC8667] and [RFC8665]), any SR policy using this segment knows that
517 it may be transiently rerouted out of S-F in case of S-F failure.
[minor]
A policy is a configuration. A policy does not 'know' anything. Can the
statement be made without anthropomorphism?
#Ahmed: I changed it to "a calculator of any policy that uses"
637 and Q-Space as well as the post-convergence path. An implementation
638 MUST only use Node-SIDs bound to the FlexAlgo and/or Adj-SIDs that
639 are unprotected to build the repair list.
[major]
This is written from an sr-mpls perspective. For SRv6 the Adj is bound to an
algorithm and this condition does not apply
#Ahmed: Modified to mention that, for SRv6, Adj-SIDs that are bound to
the FlexAlgo can be used
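For illustration only, one possible reading of the rule discussed in this comment can be sketched in Python. The exact wording of version 14 is not reproduced in this thread, so the SRv6 branch below is an assumption based on the reviewer's suggestion. The `Sid` record, its field names, and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sid:
    kind: str                # "node" or "adj"
    dataplane: str           # "mpls" or "srv6"
    algo: Optional[int]      # algorithm the SID is bound to, None if unbound
    protected: bool = False  # only meaningful for Adj-SIDs


def usable_in_repair_list(sid: Sid, flex_algo: int) -> bool:
    """Decide whether a SID may appear in a repair list for flex_algo."""
    if sid.kind == "node":
        # Node-SIDs must be bound to the FlexAlgo being protected.
        return sid.algo == flex_algo
    if sid.kind == "adj":
        # Assumption: an SRv6 Adj-SID inherits its algorithm from its locator,
        # so it is accepted when that algorithm matches, protected or not.
        if sid.dataplane == "srv6" and sid.algo == flex_algo:
            return True
        # Otherwise only unprotected Adj-SIDs are accepted.
        return not sid.protected
    return False
```

Under this reading, a protected SR-MPLS Adj-SID is rejected, while a protected SRv6 Adj-SID whose locator belongs to the same FlexAlgo is accepted.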
647 S --- R2 --- R3 --- R4 --- R5 --- D
648 \ | \ /
649 R7 -- R8
650 | |
651 R9 -- R10
653 Figure 2
655 In Figure 2, all the metrics are equal to 1 except
656 R2-R7,R7-R8,R8-R4,R7-R9 which have a metric of 1000. Considering R2
[minor]
The drawing here is in different style as figure 1 where - and * is used to
visualize the different link metrics. Maybe consistent drawing style should be
used in the document?
#Ahmed: I modified R2-R7,R7-R8,R8-R4,R7-R9 to become "*"
665 To avoid the possibility of this double FRR activation, an
666 implementation of TI-LFA MAY pick only non protected adjacency
667 segments when building the repair list. However, this is important
[minor]
While double failures may initially sound like an exotic event, they may be
more frequent than initially assumed when SRLGs are considered. In some operator
networks, multiple links use the same optical cables, and if one fiber gets cut, then
many links may be impacted, causing double failures. Maybe worth mentioning
that double failures are not as rare as one may believe.
#Ahmed: IMO trying to make claims about the frequency of
failures will result in too many objections and comments and is not
relevant to the scope of the document
676 11. Advantages of using the expected post-convergence path during FRR
[minor]
This section is a complex, detailed read and seems overly detailed.
Can the advantage description not be simplified? Is this detail necessary at
this place in the document? Alternatively, consider moving this section into
an appendix. Consider removing anthropomorphism in this section. TI-LFA has no
awareness, it may however be opaque to constraints (i.e. 'TI-LFA cannot be
aware of such path constraints and' )
#Ahmed: I moved this section to Appendix
783 12. Analysis based on real network topologies
[major]
consider placing this section into an appendix. The shared information
does not add additional considerations to the TI-LFA procedure description
#Ahmed: I moved this section to Appendix