Thanks Ahmed, Author team,

Thanks for the considerations and for addressing the DISCUSS and COMMENT items.
I reviewed the diff between v13 and v14 of the draft, and it corresponds with the
feedback and considerations provided.

I will clear my blocking DISCUSS on the document.

Be well,
G/

From: Ahmed Bashandy <abashandy.i...@gmail.com>
Sent: Wednesday, May 8, 2024 5:48 PM
To: Gunter van de Velde (Nokia) <gunter.van_de_ve...@nokia.com>; The IESG 
<i...@ietf.org>
Cc: draft-ietf-rtgwg-segment-routing-ti-...@ietf.org; rtgwg-cha...@ietf.org; 
rtgwg@ietf.org; stewart.bry...@gmail.com
Subject: Re: Gunter Van de Velde's Discuss on 
draft-ietf-rtgwg-segment-routing-ti-lfa-13: (with DISCUSS and COMMENT)





Thank you for the detailed review

I uploaded version 14 of the draft.

See #Ahmed for response to the comments



Ahmed


On 4/17/24 5:04 AM, Gunter Van de Velde via Datatracker wrote:

Gunter Van de Velde has entered the following ballot position for

draft-ietf-rtgwg-segment-routing-ti-lfa-13: Discuss



When responding, please keep the subject line intact and reply to all

email addresses included in the To and CC lines. (Feel free to cut this

introductory paragraph, however.)





Please refer to 
https://www.ietf.org/about/groups/iesg/statements/handling-ballot-positions/

for more information about how to handle DISCUSS and COMMENT positions.





The document, along with other ballot positions, can be found here:

https://datatracker.ietf.org/doc/draft-ietf-rtgwg-segment-routing-ti-lfa/







----------------------------------------------------------------------

DISCUSS:

----------------------------------------------------------------------



# Gunter Van de Velde, RTG AD, comments for 
draft-ietf-rtgwg-segment-routing-ti-lfa-13



Please find below two blocking DISCUSS points (easy to address), and a series of

non-blocking COMMENTs and some nits.



Many thanks for the RTGDIR reviews from Stewart Bryant,

Andy Smith and Ben Niven-Jenkins during the 7 years development

period of the TI-LFA specification. Also many thanks for the shepherd

write-up by Stewart Bryant to provide a brief overview of the

progress of the draft through the WG and the current state of art.



Thank you to the authors of this document. I really appreciate the

effort and believe it captures the TI-LFA normative procedures well.

Reviewing it with fresh eyes, I've made several comments that could

help further improve the quality. I hope these insights will be

valuable for the authors and the Working Group as you continue

to refine the document.



DISCUSS:

========



DISCUSS#1

In section '9. TI-LFA and SR algorithms' I found the text written from sr-mpls

perspective. SRv6 has different considerations.



637         and Q-Space as well as the post-convergence path.  An implementation

638         MUST only use Node-SIDs bound to the FlexAlgo and/or Adj-SIDs that

639         are unprotected to build the repair list.



The above seems written from an sr-mpls perspective. For SRv6 the Adj-SID is bound

to a Locator and consequently bound to an algorithm. As a result, the observed limitation

of sr-mpls does not really apply for SRv6. For SRv6 an implementation can

use protected Adj-SID in the repair path without breaking algorithm aware

topology requirements. Consider allowing protected SRv6 Adj-SIDs for TI-LFA.
#Ahmed: version 14 modified the last sentence to indicate that SRv6 adj-SIDs 
can be used


In addition, consider adding some text about Adj-SIDs and locators to

"section 8.2.  SRv6 dataplane considerations"; this could be beneficial.

With sr-mpls there is no correlation to the segment routing algorithm, however

when using SRv6 dataplane Adj-SID Locator is correlated to an algorithm.
#Ahmed: Section 8.2 refers to [RFC8754] and [RFC8986], which detail SRv6. IMO any 
additional text explaining the SRv6 dataplane would be redundant and may cause more 
confusion. At the same time, the reader is referred to documents that provide 
all details about SRv6.


DISCUSS#2

Sections 11 and 12 do not introduce any supplementary artifacts to the normative

procedures outlined for TI-LFA. The information within sections 11 and 12 is provided

in extensive detail. Should the Working Group (WG) prefer to maintain this

level of specificity, it is advisable to consider relocating the detailed

content to an appendix unless there is a strong reason to keep it in the main

body of the document.
#Ahmed: moved to Appendix A and B




----------------------------------------------------------------------

COMMENT:

----------------------------------------------------------------------



High level comments:

====================



* TI-LFA is based upon Segment Routing, however the document seems to have

mostly sr-mpls dataplane type language. The SRv6 dataplane is only mentioned

for the first time on line 493, almost half way through the document. Maybe consider

mentioning support for SRv6 dataplane earlier on.
#Ahmed: From the point of view of the scope of this document, there are only small 
differences between SR-MPLS and SRv6 (some of which you pointed out, thanks a 
lot). That is why neither of them was explicitly mentioned early on. At the same 
time, they were both mentioned in the same sentence. If I were to explicitly 
mention SRv6 early on, then I would have to do the same for SR-MPLS


* 6 people on front page. Did all authors edit text in the draft?

#Ahmed: All authors made significant contributions to this draft. It would not 
do them justice to drop any of them



* Operational impact may want to explicitly mention that there is no interop

complexity because TI-LFA is a node local operation

* The document makes use of the term 'we' and other anthropomorphism. Maybe

not the best approach in a formal document. Who is 'we'? editor, authors, WG,

IETF community, operators, etc? Policies have no awareness or emotions



Detailed review COMMENTS ([minor] and [major])

==============================================

(Line numbers are rendered using idnits rendering)



19         This document presents Topology Independent Loop-free Alternate Fast

20         Re-route (TI-LFA), aimed at providing protection of node and

21         adjacency segments within the Segment Routing (SR) framework.  This



[minor]

s/Re-route/Reroute/
#Ahmed: Fixed


[major]

The description provides insight that TI-LFA provides protection of node and adj

segments. It does not specify what 'protection' is all about or that

'protection' is constrained to single link|node failures. i.e. rfc5286 has

explicit text in the abstract about single failure applicability.



24         (DLFA).  It extends these concepts to provide guaranteed coverage in

25         any two connected networks using a link-state IGP.  A key aspect of
#Ahmed: The abstract is too short to provide more details. The specific 
protection description is provided in the paragraph starting with "For each 
Destination" in Page 5


[major]

in this sentence 'two connected networks' is referenced, while earlier in the

paragraph there is indication of 'protection of node and adjacency segments'.

How do two connected networks correlate with the segments?
#Ahmed: A 2-connected network is a network that does not become partitioned as 
a result of a single failure. The concepts of segment is detailed in the 
references. I am not really sure if I understand your concern
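
As an illustration of the definition above, the following is a minimal Python sketch that checks whether a topology stays connected after any single link failure. It is a simplification under stated assumptions: only link failures are considered (not node or SRLG failures), and the function names and sample topologies are hypothetical, not taken from the draft.

```python
from collections import defaultdict


def is_connected(nodes, links):
    """Return True if every node is reachable from an arbitrary start node."""
    nodes = set(nodes)
    if not nodes:
        return True
    adj = defaultdict(set)
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        n = stack.pop()
        for m in adj[n]:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return nodes <= seen


def survives_any_single_link_failure(nodes, links):
    """Assumption: 'single failure' is modelled as the loss of exactly one link."""
    links = list(links)
    if not is_connected(nodes, links):
        return False
    # Remove each link in turn and re-check connectivity.
    return all(
        is_connected(nodes, links[:i] + links[i + 1:])
        for i in range(len(links))
    )


# Hypothetical topologies: a four-node ring and a three-node chain.
ring_nodes = ["A", "B", "C", "D"]
ring_links = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
chain_nodes = ["A", "B", "C"]
chain_links = [("A", "B"), ("B", "C")]

print(survives_any_single_link_failure(ring_nodes, ring_links))
print(survives_any_single_link_failure(chain_nodes, chain_links))
```

The ring keeps every node reachable when any one link is removed, while the chain is partitioned by the loss of either link, so only the ring matches the definition in this simplified link-only model.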

25         any two connected networks using a link-state IGP.  A key aspect of

26         TI-LFA is the FRR path selection approach establishing protection

27         over the expected post-convergence paths from the point of local

28         repair, reducing the operational need to control the tie-breaks among

29         various FRR options.



[minor]

suggested rewrite to make the text better readable:

A principal attribute of TI-LFA is the FRR path selection methodology, which

establishes protection over the anticipated post-convergence paths from the

point of local repair. This approach diminishes the operational necessity

to manage the tie-breaks among various FRR alternatives.
#Ahmed: IMO the text is clear.


[minor]

why is the path selection better? can a hint be given why it is better

beyond a statement proclaiming it is better?
#Ahmed: Second paragraph in Appendix A (which used to be section 10 in version 
13 and moved to Appendix A  based on your advice) in version 14 of the draft 
explains why it is better


138        *  TI-LFA: Topology Independant LFA.



[minor]

s/Independant/Independent/
#Ahmed: Fixed


144        Segment Routing aims at supporting services with tight SLA guarantees

145        [RFC8402].  By relying on SR this document provides a local repair



[major]

The term SLA does not appear even once in RFC8402. How can the claim of

tight SLA be justified with RFC8402? Can a better pointer to the claim be

inserted?
#Ahmed: I removed the sentence


[minor]

s/Segment Routing/Segment Routing (SR)/



145        [RFC8402].  By relying on SR this document provides a local repair

146        mechanism for standard link-state IGP shortest path capable of

147        restoring end-to-end connectivity in the case of a sudden directly

148        connected failure of a network component.  Non-SR mechanisms for



[minor]

readability rewrite:

This document outlines a local repair mechanism that leverages Segment

Routing (SR) to restore end-to-end connectivity in the event of an

abrupt failure involving a directly connected network component.

This mechanism is designed for standard link-state Interior Gateway

Protocol (IGP) shortest path scenarios.
#Ahmed: thanks for the text suggestion. I replaced the original text with that 
suggestion


153        The term topology independent (TI) refers to the ability to provide a

154        loop free backup path irrespective of the topologies used in the

155        network.  This provides a major improvement compared to LFA [RFC5286]

156        and remote LFA [RFC7490] which cannot provide a complete protection

157        coverage in some topologies as described in [RFC6571].



[minor]

I think what is being said is:

The term topology independent (TI) describes the capability of

providing a loop-free backup path that is effective across all network

topologies. This represents a significant enhancement over Loop-Free

Alternate (LFA) [RFC5286] and Remote LFA as outlined in

[RFC7490], both of which do not offer comprehensive protection coverage

in certain topological configurations as detailed in [RFC6571]. TI-LFA

ensures the availability of a backup path if a post-convergence path

exists, regardless of the network topology.
#Ahmed: Thanks again for the text suggestion.  I replaced the original text 
with that suggestion


167        TI-LFA is a local operation applied by the PLR when it detects

168        failure of one of its local links.  As such, it does not affect:



[minor]

It would be welcome to explicitly spell out that TI-LFA is protection against

a single local link failure
#Ahmed: The paragraph starting with "For each destination" in Page 5 mentions 
that


[minor]

It was mentioned that TI-LFA provides protection against link and node failure.

In this section the abrupt fail of a link is mentioned to trigger FRR. How is

node-protection with TI-LFA achieved and the PLR triggered that neighboring

node is no more operational? It is elaborated upon later in this

section, but maybe a brief hint could be provided here too?
#Ahmed: As you mentioned, it is already provided. IMO (and probably the opinion 
of others) it will be redundant to re-provide description here.


167        TI-LFA is a local operation applied by the PLR when it detects

168        failure of one of its local links.  As such, it does not affect:



170        *  Micro-loops that appear - or do not appear – as part of the

171           distributed IGP convergence [RFC5715] on the paths to the

172           destination that do not pass thru TI-LFA paths:



174           -  As explained in [RFC5714], such micro-loops may result in the

175              traffic not reaching the PLR and therefore not following TI-LFA

176              paths.



178        *  Micro-loops that appear – or do not appear - when the failed link

179           is repaired.



[minor]

This does not parse very well. I tried reading this paragraph a few times

and believe what is mentioned could be rewritten as follows:



"TI-LFA operates locally at the Point of Local Repair (PLR) upon detecting

a failure in one of its direct links. Consequently, this local operation

does not influence:



* Micro-loops that may or may not form during the distributed Interior

Gateway Protocol (IGP) convergence as delineated in RFC 5715.



- These micro-loops occur on routes directed towards the destination that

do not traverse TI-LFA-configured paths. According to [RFC5714], the formation

of such micro-loops can prevent traffic from reaching the PLR, thereby

bypassing the TI-LFA paths established for rerouting.



* Micro-loops that may or may not develop when the previously failed link

is restored to functionality.
#Ahmed: thanks again for the text. I replaced existing text with the suggested 
one


This specification highlights that while TI-LFA effectively addresses specific

link failures, it does not extend its impact to managing micro-loops

associated with broader IGP convergence issues or subsequent link repairs."



181        TI-LFA paths are loop-free.  What’s more, they follow the post-

182        convergence paths, and, therefore, not subject to micro-loops due to

183        difference in the IGP convergence times of the nodes thru which they

184        pass.



[minor]

This is a rather informal writing style. What about the following:



TI-LFA paths are inherently loop-free and align with post-convergence routes.

Consequently, they are not susceptible to micro-loops that may arise due to

variations in the IGP convergence times across different nodes through

which these paths traverse. This ensures a stable and predictable routing

environment, minimizing disruptions typically associated with asynchronous

network behavior.
#Ahmed: thanks again for the text. I replaced existing text with the suggested 
one


186        TI-LFA paths are applied from the moment the PLR detects failure of a

187        local link and until IGP convergence at the PLR is completed.



[minor]

readability rewrite:

TI-LFA paths are activated from the instant the PLR detects a failure in a

local link and remain in effect until the Interior Gateway Protocol (IGP)

convergence at the PLR is fully achieved.
#Ahmed: thanks again for the text. I replaced existing text with the suggested 
one


190        micro-loops, especially if these paths have been computed using the

191        methods described in Section Section 6.2, Section 6.3, or Section 6.4

192        of the draft.  One of the possible ways to prevent such micro-loops



[minor]

Instead of simply referencing the sections 6.2, 6.3 and 6.4, maybe line up the

conditions in which this occurs combined with the section references. This could

be something in the style 'if the FRR path is not using a direct neighbor

then... etc etc etc'
#Ahmed: IMO this will be redundant text. The reference to the relevant sections 
avoids redundancy


206        For each destination in the network, TI-LFA pre-installs a backup



[minor]

what does destination exactly mean? is that a /32 or /128 node? or is it

router-ids? any other abstraction intended?
#Ahmed: Added the phrase "as specified by the IGP"


224        By using SR, TI-LFA does not require the establishment of TLDP

225        sessions (Targeted Label Distribution Protocol) with remote nodes in

226        order to take advantage of the applicability of remote LFAs (RLFA)

227        [RFC7490][RFC7916] or remote LFAs with directed forwarding

228        (DLFA)[RFC5714].  All the Segment Identifiers (SIDs) are available in

229        the link state database (LSDB) of the IGP.  As a result, preferring

230        LFAs over RLFAs or DLFAs, as well as minimizing the number of RLFA or

231        DLFA repair nodes is not required anymore.



[minor]

possible rewrite for readability and simplicity:



"

By utilizing Segment Routing (SR), TI-LFA eliminates the need to establish

Targeted Label Distribution Protocol (TLDP) sessions with remote nodes for

leveraging the benefits of Remote Loop-Free Alternates (RLFA) [RFC7490][RFC7916]

or Directed Loop-Free Alternates (DLFA) [RFC5714]. All the Segment Identifiers

(SIDs) required are present within the Link State Database (LSDB) of the

Interior Gateway Protocol (IGP). Consequently, there is no longer a necessity

to prefer LFAs over RLFAs or DLFAs, nor is there a need to minimize the number

of RLFA or DLFA repair nodes.
#Ahmed: Thanks for the text suggestion. I replaced the original text with the 
suggested one


"



233        By using SR, there is no need to create state in the network in order

234        to enforce an explicit FRR path.  This relieves the nodes themselves

235        from having to maintain extra state, and it relieves the operator

236        from having to deploy an extra protocol or extra protocol sessions

237        just to enhance the protection coverage.



[minor]

what about this blob of text:

"

Utilizing SR makes the requirement unnecessary to establish additional

state within the network for enforcing explicit Fast Reroute (FRR) paths.

This alleviation spares the nodes from maintaining supplementary state and

frees the operator from the necessity to implement additional protocols or

protocol sessions solely to augment protection coverage.
#Ahmed: Thanks for the text suggestion. I replaced the original text with the 
suggested one

"



239        Although not a Ti-LFA requirement or constraint, TI-LFA also brings



s/Ti-LFA/TI-LFA/
#Ahmed: Fixed


242        reduces the need of locally configured policies that drive the backup



[minor]

Unsure what is meant by 'drive' here. Would it be better to say

'describe the backup...'
#Ahmed: I used the word "influence"


243        path selection ([RFC7916]).  The easiest way to express the expected

244        post-convergence path in a loop-free manner is to encode it as a list

245        of adjacency segments.  However, this may create a long SID list that



[major]

You write 'is to encode it'. What is the 'it'? I understand this is

suggesting Adj SIDs. I also believe that simply having a list of Adj SIDs is

not sufficient, but that an "ordered" list of Adj SIDs is needed.
#Ahmed: A pronoun usually refers to the nearest item in the sentence. The nearest 
item in this sentence is "the expected post-convergence path".


245        of adjacency segments.  However, this may create a long SID list that

246        some hardware may not be able to push.  One of the challenges of TI-



[minor]

should we say push or program? push seems more sr-mpls dataplane specific, while

TI-LFA has applicability with SRv6 also
#Ahmed: Agreed. I changed "push" to "program".


248        adjacency segments and node segments.  Each implementation will be

249        free to have its own SID list optimization algorithm.  This document

250        details the basic concepts that could be used to build the SR backup

251        path as well as the associated dataplane procedures



possible rewrite:

"

Each implementation may independently develop its own algorithm for

optimizing the ordered SID list. This document provides an outline of the

fundamental concepts applicable to constructing the SR backup path, along

with the related dataplane procedures.

"
#Ahmed: Thanks. Replaced the original text with the suggested one


288        We define the main notations used in this document as the following.



290        We refer to "old" and "new" topologies as the LSDB state before and

291        after the considered failure.



[minor]

I would prefer not to use the word 'we'. It is undefined who

that is. Is it the editor, authors, the WG the internet community, etc...
#Ahmed: I am open for suggestions for replacing "we".


286     3.  Terminology



[minor]

Would section 3 be better located before section 2 for clarity?
#Ahmed: Almost all RFCs that have a "terminology" section put it after the 
"Introduction". I would rather follow that convention to avoid pushback


[major]

Later in the document there is usage of P(S,X) and Q(D,X) while

the terminology section only documents P(R,X). Maybe add some text

to clarify the intended use.
#Ahmed: the terminology section has "The Q-space Q(R,X) "


321        EP(P, Q) is an explicit SR-based path from a node P to a node Q.



[minor]

why not simply use 'SR path' instead of 'SR-based path'? does the

postfix '-based' add any representative value?
#Ahmed: Removed "-based"


335        An implementation is free to use any local optimization to provide

336        smaller SID lists by combining Node SIDs and Adjacency SIDs.  In



[minor]

The intent seems to be to integrate adj SIDs and node SIDs into the SID lists.

Not sure that we are combining multiple SIDs into less SIDs:

"An implementation may employ any local optimization strategy to reduce

the size of SID lists by integrating Node SIDs and Adjacency SIDs into

the SID lists."
#Ahmed: The phrase "by integrating Node SIDs and Adjacency SIDs"  suggests an 
approach or paradigm for optimization algorithms. As mentioned in the document, 
this is out of the scope of this document. The current text is more general as 
it does not attempt to give hints


342     5.  Intersecting P-Space and Q-Space with post-convergence paths

343

344        One of the challenges of defining an SR path following the expected

345        post-convergence path is to reduce the size of the segment list.  In



[minor]

At the end of section 4 it is written "These optimizations are out of scope of

this document," and then the first paragraph identifies that reducing the SID

lists is one of the challenges. For something that is out of scope of the

document, it is perceived as a rather important problem to address. If

truly out of scope of this document, then maybe state explicitly that section 5

is entirely informational
#Ahmed: The end of section 4 explicitly mentions that it "provides some 
guidance" that uses P-space and Q-space. So it clearly does not mandate the use 
of this guidance.


[minor]

in some places the term 'segment lists' is used, in others 'SID lists'. Could a

single terminology be used throughout the document?
#Ahmed: replaced "SID list" with "segment list"


[major]

In the Terminology section the P-space, extended P-space and the Q-space are

explained. Not sure why all this is explained again in more explicit steps. It

makes me wonder if section 5 can be reduced by reusing the Terminology in

section 3 and focusing upon those?
#Ahmed: The terminology section defines the P-space and Q-space. Section 5 
explains how to find the P-space and Q-space nodes that are also on the 
post-convergence path. IMO any reduction to the steps in this section will make it 
quite obscure.


356        We want to determine which nodes on the post-convergence path from



[minor]

who is 'we'?
#Ahmed: Suggestions for replacing "we" are most welcomed.


358        regard to resource X (X can be a link or a set of links adjacent to

359        the PLR, or a neighbor node of the PLR).



[minor]

In the section 3 Terminology section, resource X was defined, but

using a different definition: 'resource X (e.g. a link S-F, a node F, or a SRLG)'

Which one is correct? Maybe reuse the Terminology definition for consistency
#Ahmed: I do not see any conflict between them. This section is just providing 
an example of a resource X; it does not define it


378        This can be found by intersecting the set of nodes belonging to the

379        post-convergence path from R to D, assuming the failure of X, with

380        Q(D, X).



[minor]

In terminology section 3 the Q(R, X) is described with 'R' used while

in this section5.2 the term Q(D, X) has 'D' used.

Is this intentional? why not add this in Terminology

section also? or make the Terminology section more opaque

to using any letter (e.g. 'R' or 'D') and describe the

intent of the Q(...) function?
#Ahmed: "X", "D", "R", ... are used the same way the letters "x", "y" and "z" are 
used in algebra. I do not understand what is needed here.
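
To make the notation concrete, the following is a minimal Python sketch of the intersection quoted above: the nodes of the post-convergence path from R to D (assuming X has failed) that also belong to Q(D, X). It rests on several assumptions that are not taken from the draft: X is a single link, metrics are symmetric and strictly positive, Q(D, X) is taken to be the set of nodes whose shortest paths to D (including all equal-cost paths) avoid X, and the topology and function names are hypothetical.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances and one predecessor per node from source.

    graph is {node: {neighbor: metric}} and is assumed symmetric.
    """
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def without_link(graph, link):
    """Copy of graph with the (undirected) link removed."""
    failed = set(link)
    return {
        u: {v: w for v, w in nbrs.items() if {u, v} != failed}
        for u, nbrs in graph.items()
    }


def q_space(graph, dest, link):
    """Nodes whose shortest paths to dest, over all ECMP branches, avoid link."""
    dist, _ = dijkstra(graph, dest)  # symmetric metrics: distance from == to
    failed = set(link)
    tainted = set()
    # Visit nodes closest to dest first, so next hops are classified already.
    for u in sorted(dist, key=dist.get):
        for v, w in graph[u].items():
            if v in dist and dist[u] == w + dist[v]:  # v is a shortest-path next hop
                if {u, v} == failed or v in tainted:
                    tainted.add(u)
                    break
    return set(dist) - tainted


def post_convergence_path(graph, src, dest, link):
    """One shortest path from src to dest once link has been removed."""
    dist, prev = dijkstra(without_link(graph, link), src)
    if dest not in dist:
        return []
    path = [dest]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def q_nodes_on_path(graph, src, dest, link):
    """Post-convergence path nodes that are also in Q(dest, link), in path order."""
    q = q_space(graph, dest, link)
    return [n for n in post_convergence_path(graph, src, dest, link) if n in q]


# Hypothetical topology: S-F-D is the primary path, S-A-B-D the detour.
graph = {
    "S": {"F": 1, "A": 1},
    "F": {"S": 1, "D": 1},
    "A": {"S": 1, "B": 1},
    "B": {"A": 1, "D": 1},
    "D": {"F": 1, "B": 1},
}
print(q_nodes_on_path(graph, "S", "D", ("S", "F")))
```

In this sketch the letters play the role described in the reply: the same function is evaluated with whichever node is the destination of interest, here D, and whichever resource is protected, here the link S-F.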


397        protected resource X and, at the same time, is guaranteed to be loop-

398        free irrespective of the state of FIBs along the nodes belonging to

399        the explicit path.  Thus, there is no need for any co-ordination or



[minor]

There is assumption here that only SR programs the FIB. There may be out

of Band FIB programming that does cause loops. Maybe frame the

claim better by expressing the assumption made to warrant loop-free paths.
#Ahmed: The beginning of the document explicitly mentioned IGP. So it is clear 
that other forwarding states are outside the scope of this document.

460     6.2.  FRR path using a PQ node



[minor]

Is there a reason that there are no considerations for an implementer

to select the PQ node closest to the S or closest to the D?
#Ahmed: The document clearly says that it is just "suggesting" methods. Your 
suggestion is another implementation detail, which is out of scope of the 
document.






499        interface for the packet, S-F.  The failure of the primary outgoing



[minor]

what is the 'F' in the S-F?
#Ahmed: The text says "link S-F". Isn't it obvious that "F" is the far end of 
that link?


512        We define hereafter the FRR behavior applied by S for any packet

513        received with an active adjacency segment S-F for which protection

514        was enabled.  As protection has been enabled for the segment S-F and

515        signaled in the IGP (for instance using protocol extensions from

516        [RFC8667] and [RFC8665]), any SR policy using this segment knows that

517        it may be transiently rerouted out of S-F in case of S-F failure.



[minor]

A policy is a configuration. A policy does not 'know' anything. Can the

statement be made without anthropomorphism?
#Ahmed: I changed it to "a calculator of any policy that uses"


637        and Q-Space as well as the post-convergence path.  An implementation

638        MUST only use Node-SIDs bound to the FlexAlgo and/or Adj-SIDs that

639        are unprotected to build the repair list.



[major]

This is written from an sr-mpls perspective. For SRv6 the Adj is bound to an

algorithm and this condition does not apply
#Ahmed: Modified to mention that, for SRv6, adj-SIDs that are bound to the 
FlexAlgo can be used


647                S --- R2 --- R3 --- R4 --- R5 --- D

648                         \    |  \  /

649                            R7 -- R8

650                             |    |

651                            R9 -- R10



653                                       Figure 2



655        In Figure 2, all the metrics are equal to 1 except

656        R2-R7,R7-R8,R8-R4,R7-R9 which have a metric of 1000.  Considering R2



[minor]

The drawing here is in a different style from Figure 1, where - and * are used to

visualize the different link metrics. Maybe consistent drawing style should be

used in the document?
#Ahmed: I modified R2-R7,R7-R8,R8-R4,R7-R9 to become "*"


665        To avoid the possibility of this double FRR activation, an

666        implementation of TI-LFA MAY pick only non protected adjacency

667        segments when building the repair list.  However, this is important



[minor]

While double failures may initially sound like an exotic event, they may be

more frequent than initially assumed when SRLGs are considered. At some operators

multiple 'links' use the same optical cables and if one fiber gets cut, then

many links may be impacted, causing double failures. Maybe worth mentioning

that double failures are not as rare as one may believe.
#Ahmed: IMO, trying to make claims about the frequency of failures will 
result in too many objections and comments and is not relevant to the scope of 
the document


676     11.  Advantages of using the expected post-convergence path during FRR



[minor]

This section is a complex, detailed read and seems overly detailed.

Can the advantage description not be simplified? Is this detail necessary at

this place in the document? Alternatively, consider moving this section into

an appendix. Consider removing anthropomorphism in this section. TI-LFA has no

awareness, it may however be opaque to constraints (i.e. 'TI-LFA cannot be

aware of such path constraints and' )
#Ahmed: I moved this section to Appendix


783     12.  Analysis based on real network topologies



[major]

consider placing this section into an appendix. The shared information

does not add additional considerations to the TI-LFA procedure description
#Ahmed: I moved this section to Appendix



