Thanks Ahmed, author team,

Thanks for the considerations and for addressing the DISCUSS and COMMENT items. I reviewed the diff between v13 and v14 of the draft, and it corresponds with the feedback and considerations provided.
I will clear my blocking DISCUSS on the document.

Be well,
G/

From: Ahmed Bashandy <abashandy.i...@gmail.com>
Sent: Wednesday, May 8, 2024 5:48 PM
To: Gunter van de Velde (Nokia) <gunter.van_de_ve...@nokia.com>; The IESG <i...@ietf.org>
Cc: draft-ietf-rtgwg-segment-routing-ti-...@ietf.org; rtgwg-cha...@ietf.org; rtgwg@ietf.org; stewart.bry...@gmail.com
Subject: Re: Gunter Van de Velde's Discuss on draft-ietf-rtgwg-segment-routing-ti-lfa-13: (with DISCUSS and COMMENT)

Thank you for the detailed review. I uploaded version 14 of the draft. See #Ahmed for responses to the comments.

Ahmed

On 4/17/24 5:04 AM, Gunter Van de Velde via Datatracker wrote:

Gunter Van de Velde has entered the following ballot position for draft-ietf-rtgwg-segment-routing-ti-lfa-13: Discuss

When responding, please keep the subject line intact and reply to all email addresses included in the To and CC lines. (Feel free to cut this introductory paragraph, however.)

Please refer to https://www.ietf.org/about/groups/iesg/statements/handling-ballot-positions/ for more information about how to handle DISCUSS and COMMENT positions.

The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-rtgwg-segment-routing-ti-lfa/

----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

# Gunter Van de Velde, RTG AD, comments for draft-ietf-rtgwg-segment-routing-ti-lfa-13

Please find below two blocking DISCUSS points (easy to address), and a series of non-blocking COMMENTs and some nits.

Many thanks for the RTGDIR reviews from Stewart Bryant, Andy Smith and Ben Niven-Jenkins during the 7-year development period of the TI-LFA specification.
Also many thanks to Stewart Bryant for the shepherd write-up, which provides a brief overview of the progress of the draft through the WG and of the current state of the art.

Thank you to the authors of this document. I really appreciate the effort and believe it captures the TI-LFA normative procedures well. Reviewing it with fresh eyes, I have made several comments that could help further improve its quality. I hope these insights will be valuable for the authors and the Working Group as you continue to refine the document.

DISCUSS:
========

DISCUSS#1

In section '9. TI-LFA and SR algorithms' I found the text written from an SR-MPLS perspective. SRv6 has different considerations.

637   and Q-Space as well as the post-convergence path. An implementation
638   MUST only use Node-SIDs bound to the FlexAlgo and/or Adj-SIDs that
639   are unprotected to build the repair list.

The above seems written from an SR-MPLS perspective. For SRv6, the Adj-SID is bound to a Locator and consequently bound to an algorithm. As a result, the observed SR-MPLS limitation does not really apply to SRv6: an implementation can use a protected SRv6 Adj-SID in the repair path without breaking the algorithm-aware topology requirements. Consider allowing protected SRv6 Adj-SIDs for TI-LFA.

#Ahmed: Version 14 modified the last sentence to indicate that SRv6 Adj-SIDs can be used.

In addition, some text about Adj-SIDs and locators in "section 8.2. SRv6 dataplane considerations" could be beneficial. With SR-MPLS there is no correlation between an Adj-SID and the segment routing algorithm; however, with the SRv6 dataplane the Adj-SID's Locator is correlated to an algorithm.

#Ahmed: Section 8.2 refers to [RFC8754] and [RFC8986], which detail SRv6. IMO any additional text explaining the SRv6 dataplane will be redundant and may cause more confusion.
At the same time, the reader is referred to documents that provide all the details about SRv6.

DISCUSS#2

Sections 11 and 12 do not introduce any supplementary artifacts to the normative procedures outlined for TI-LFA. The information within sections 11 and 12 is provided in extensive detail. Should the Working Group (WG) prefer to maintain this level of specificity, it is advisable to consider relocating the detailed content to an appendix, unless there is a strong reason to keep it in the main body of the document.

#Ahmed: Moved to Appendix A and B.

----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

High level comments:
====================

* TI-LFA is based upon Segment Routing; however, the document mostly uses SR-MPLS dataplane-type language. The SRv6 dataplane is first mentioned only on line 493, almost halfway through the document. Consider mentioning support for the SRv6 dataplane earlier on.

#Ahmed: From the point of view of the scope of this document, there are only small differences between SR-MPLS and SRv6 (some of which you pointed out; thanks a lot). That is why neither of them was explicitly mentioned early on. At the same time, they were both mentioned in the same sentence. If I were to explicitly mention SRv6 early on, then I would have to do the same for SR-MPLS.

* 6 people on the front page. Did all authors edit text in the draft?

#Ahmed: All authors made significant contributions to this draft. It would not be doing justice to drop any of them.

* The operational impact text may want to explicitly mention that there is no interop complexity, because TI-LFA is a node-local operation.

* The document makes use of the term 'we' and other anthropomorphisms. Maybe not the best approach in a formal document. Who is 'we'? Editor, authors, WG, IETF community, operators, etc.?
Policies have no awareness or emotions.

Detailed review COMMENTS ([minor] and [major])
==============================================
(Line numbers are rendered using idnits rendering)

19   This document presents Topology Independent Loop-free Alternate Fast
20   Re-route (TI-LFA), aimed at providing protection of node and
21   adjacency segments within the Segment Routing (SR) framework. This

[minor] s/Re-route/Reroute/

#Ahmed: Fixed.

[major] The description indicates that TI-LFA provides protection of node and adjacency segments. It does not specify what 'protection' means, or that 'protection' is constrained to single link or node failures. For example, RFC 5286 has explicit text in its abstract about single-failure applicability.

24   (DLFA). It extends these concepts to provide guaranteed coverage in
25   any two connected networks using a link-state IGP. A key aspect of

#Ahmed: The abstract is too short to provide more details. The specific protection description is provided in the paragraph starting with "For each destination" on page 5.

[major] In this sentence 'two connected networks' is referenced, while earlier in the paragraph there is an indication of 'protection of node and adjacency segments'. How do two connected networks correlate with the segments?

#Ahmed: A 2-connected network is a network that does not become partitioned as a result of a single failure. The concept of segments is detailed in the references. I am not really sure I understand your concern.

25   any two connected networks using a link-state IGP. A key aspect of
26   TI-LFA is the FRR path selection approach establishing protection
27   over the expected post-convergence paths from the point of local
28   repair, reducing the operational need to control the tie-breaks among
29   various FRR options.
[minor] Suggested rewrite to make the text more readable:

A principal attribute of TI-LFA is the FRR path selection methodology, which establishes protection over the anticipated post-convergence paths from the point of local repair. This approach diminishes the operational necessity to manage the tie-breaks among various FRR alternatives.

#Ahmed: IMO the text is clear.

[minor] Why is the path selection better? Can a hint be given as to why it is better, beyond a statement proclaiming that it is?

#Ahmed: The second paragraph in Appendix A of version 14 of the draft (which used to be section 10 in version 13 and was moved to Appendix A based on your advice) explains why it is better.

138   * TI-LFA: Topology Independant LFA.

[minor] s/Independant/Independent/

#Ahmed: Fixed.

144   Segment Routing aims at supporting services with tight SLA guarantees
145   [RFC8402]. By relying on SR this document provides a local repair

[major] The term SLA does not appear even once in RFC8402. How can the claim of tight SLA guarantees be justified with RFC8402? Can a better pointer for the claim be inserted?

#Ahmed: I removed the sentence.

[minor] s/Segment Routing/Segment Routing (SR)/

145   [RFC8402]. By relying on SR this document provides a local repair
146   mechanism for standard link-state IGP shortest path capable of
147   restoring end-to-end connectivity in the case of a sudden directly
148   connected failure of a network component. Non-SR mechanisms for

[minor] Readability rewrite:

This document outlines a local repair mechanism that leverages Segment Routing (SR) to restore end-to-end connectivity in the event of an abrupt failure involving a directly connected network component. This mechanism is designed for standard link-state Interior Gateway Protocol (IGP) shortest path scenarios.

#Ahmed: Thanks for the text suggestion.
I replaced the original text with that suggestion.

153   The term topology independent (TI) refers to the ability to provide a
154   loop free backup path irrespective of the topologies used in the
155   network. This provides a major improvement compared to LFA [RFC5286]
156   and remote LFA [RFC7490] which cannot provide a complete protection
157   coverage in some topologies as described in [RFC6571].

[minor] I think what is being said here is:

The term topology independent (TI) describes the capability of providing a loop-free backup path that is effective across all network topologies. This represents a significant enhancement over Loop-Free Alternate (LFA) [RFC5286] and Remote LFA as outlined in [RFC7490], both of which do not offer comprehensive protection coverage in certain topological configurations as detailed in [RFC6571]. TI-LFA ensures the availability of a backup path if a post-convergence path exists, regardless of the network topology.

#Ahmed: Thanks again for the text suggestion. I replaced the original text with that suggestion.

167   TI-LFA is a local operation applied by the PLR when it detects
168   failure of one of its local links. As such, it does not affect:

[minor] It would be welcome to explicitly spell out that TI-LFA provides protection against a single local link failure.

#Ahmed: The paragraph starting with "For each destination" on page 5 mentions that.

[minor] It was mentioned that TI-LFA provides protection against link and node failures. In this section, the abrupt failure of a link is mentioned as the FRR trigger. How is node protection achieved with TI-LFA, and how does the PLR detect that a neighboring node is no longer operational? It is elaborated upon later in this section, but maybe a brief hint could be provided here too?

#Ahmed: As you mentioned, it is already provided. IMO (and probably in the opinion of others) it would be redundant to repeat the description here.

167   TI-LFA is a local operation applied by the PLR when it detects
168   failure of one of its local links. As such, it does not affect:
170      * Micro-loops that appear - or do not appear – as part of the
171        distributed IGP convergence [RFC5715] on the paths to the
172        destination that do not pass thru TI-LFA paths:
174        - As explained in [RFC5714], such micro-loops may result in the
175          traffic not reaching the PLR and therefore not following TI-LFA
176          paths.
178      * Micro-loops that appear – or do not appear - when the failed link
179        is repaired.

[minor] This does not read very well. I read this paragraph a few times and believe it could be rewritten as follows:

"TI-LFA operates locally at the Point of Local Repair (PLR) upon detecting a failure in one of its direct links. Consequently, this local operation does not influence:

* Micro-loops that may or may not form during the distributed Interior Gateway Protocol (IGP) convergence as delineated in RFC 5715.
  - These micro-loops occur on routes directed towards the destination that do not traverse TI-LFA-configured paths. According to [RFC5714], the formation of such micro-loops can prevent traffic from reaching the PLR, thereby bypassing the TI-LFA paths established for rerouting.
* Micro-loops that may or may not develop when the previously failed link is restored to functionality.

This specification highlights that while TI-LFA effectively addresses specific link failures, it does not extend its impact to managing micro-loops associated with broader IGP convergence issues or subsequent link repairs."

#Ahmed: Thanks again for the text. I replaced the existing text with the suggested one.

181   TI-LFA paths are loop-free. What’s more, they follow the post-
182   convergence paths, and, therefore, not subject to micro-loops due to
183   difference in the IGP convergence times of the nodes thru which they
184   pass.

[minor] This is a rather informal writing style. What about the following:

TI-LFA paths are inherently loop-free and align with post-convergence routes.
Consequently, they are not susceptible to micro-loops that may arise due to variations in the IGP convergence times across different nodes through which these paths traverse. This ensures a stable and predictable routing environment, minimizing disruptions typically associated with asynchronous network behavior.

#Ahmed: Thanks again for the text. I replaced the existing text with the suggested one.

186   TI-LFA paths are applied from the moment the PLR detects failure of a
187   local link and until IGP convergence at the PLR is completed.

[minor] Readability rewrite:

TI-LFA paths are activated from the instant the PLR detects a failure in a local link and remain in effect until the Interior Gateway Protocol (IGP) convergence at the PLR is fully achieved.

#Ahmed: Thanks again for the text. I replaced the existing text with the suggested one.

190   micro-loops, especially if these paths have been computed using the
191   methods described in Section Section 6.2, Section 6.3, or Section 6.4
192   of the draft. One of the possible ways to prevent such micro-loops

[minor] Instead of simply referencing sections 6.2, 6.3 and 6.4, maybe list the conditions in which this occurs, combined with the section references. This could be something in the style of 'if the FRR path is not using a direct neighbor then...', etc.

#Ahmed: IMO this would be redundant text. The reference to the relevant sections avoids redundancy.

206   For each destination in the network, TI-LFA pre-installs a backup

[minor] What does 'destination' mean exactly? Is that a /32 or /128 node address? Is it a router ID? Is any other abstraction intended?

#Ahmed: Added the phrase "as specified by the IGP".

224   By using SR, TI-LFA does not require the establishment of TLDP
225   sessions (Targeted Label Distribution Protocol) with remote nodes in
226   order to take advantage of the applicability of remote LFAs (RLFA)
227   [RFC7490][RFC7916] or remote LFAs with directed forwarding
228   (DLFA)[RFC5714]. All the Segment Identifiers (SIDs) are available in
229   the link state database (LSDB) of the IGP. As a result, preferring
230   LFAs over RLFAs or DLFAs, as well as minimizing the number of RLFA or
231   DLFA repair nodes is not required anymore.

[minor] Possible rewrite for readability and simplicity:

"By utilizing Segment Routing (SR), TI-LFA eliminates the need to establish Targeted Label Distribution Protocol (TLDP) sessions with remote nodes for leveraging the benefits of Remote Loop-Free Alternates (RLFA) [RFC7490][RFC7916] or Directed Loop-Free Alternates (DLFA) [RFC5714]. All the Segment Identifiers (SIDs) required are present within the Link State Database (LSDB) of the Interior Gateway Protocol (IGP). Consequently, there is no longer a necessity to prefer LFAs over RLFAs or DLFAs, nor is there a need to minimize the number of RLFA or DLFA repair nodes."

#Ahmed: Thanks for the text suggestion. I replaced the original text with the suggested one.

233   By using SR, there is no need to create state in the network in order
234   to enforce an explicit FRR path. This relieves the nodes themselves
235   from having to maintain extra state, and it relieves the operator
236   from having to deploy an extra protocol or extra protocol sessions
237   just to enhance the protection coverage.

[minor] What about this text:

"Utilizing SR makes the requirement unnecessary to establish additional state within the network for enforcing explicit Fast Reroute (FRR) paths. This alleviation spares the nodes from maintaining supplementary state and frees the operator from the necessity to implement additional protocols or protocol sessions solely to augment protection coverage."

#Ahmed: Thanks for the text suggestion. I replaced the original text with the suggested one.

239   Although not a Ti-LFA requirement or constraint, TI-LFA also brings

s/Ti-LFA/TI-LFA/

#Ahmed: Fixed.

242   reduces the need of locally configured policies that drive the backup

[minor] Unsure what is meant by 'drive' here. Would it be better to say 'describe the backup...'?

#Ahmed: I used the word "influence".

243   path selection ([RFC7916]). The easiest way to express the expected
244   post-convergence path in a loop-free manner is to encode it as a list
245   of adjacency segments. However, this may create a long SID list that

[major] You write 'is to encode it'. What is the 'it'? I understand this is suggesting Adj-SIDs. I also believe that simply having a list of Adj-SIDs is not sufficient; an "ordered" list of Adj-SIDs is needed.

#Ahmed: A pronoun usually refers to the nearest item in the sentence. The nearest item in this sentence is "the expected post-convergence path".

245   of adjacency segments. However, this may create a long SID list that
246   some hardware may not be able to push. One of the challenges of TI-

[minor] Should we say push or program? 'Push' seems more SR-MPLS dataplane specific, while TI-LFA is also applicable to SRv6.

#Ahmed: Agreed. I changed "push" to "program".

248   adjacency segments and node segments. Each implementation will be
249   free to have its own SID list optimization algorithm. This document
250   details the basic concepts that could be used to build the SR backup
251   path as well as the associated dataplane procedures

Possible rewrite:

"Each implementation may independently develop its own algorithm for optimizing the ordered SID list. This document provides an outline of the fundamental concepts applicable to constructing the SR backup path, along with the related dataplane procedures."

#Ahmed: Thanks. Replaced the original text with the suggested one.

288   We define the main notations used in this document as the following.
290   We refer to "old" and "new" topologies as the LSDB state before and
291   after the considered failure.

[minor] I would prefer not to use the word 'we'. It is undefined who that is: the editor, the authors, the WG, the Internet community, etc.?

#Ahmed: I am open to suggestions for replacing "we".

286   3. Terminology

[minor] Would section 3 be better located before section 2 for clarity?

#Ahmed: Almost all RFCs that have a "Terminology" section put it after the "Introduction". I would rather follow that convention to avoid pushback.

[major] Later in the document there is usage of P(S,X) and Q(D,X), while the terminology section only documents P(R,X). Maybe add some text to clarify the intended use.

#Ahmed: The terminology section has "The Q-space Q(R,X)".

321   EP(P, Q) is an explicit SR-based path from a node P to a node Q.

[minor] Why not simply use 'SR path' instead of 'SR-based path'? Does the suffix '-based' add any value?

#Ahmed: Removed "-based".

335   An implementation is free to use any local optimization to provide
336   smaller SID lists by combining Node SIDs and Adjacency SIDs. In

[minor] The intent seems to be to integrate Adj-SIDs and Node-SIDs into the SID lists; I am not sure that multiple SIDs are being combined into fewer SIDs. Suggested text:

"An implementation may employ any local optimization strategy to reduce the size of SID lists by integrating Node SIDs and Adjacency SIDs into the SID lists."

#Ahmed: The phrase "by integrating Node SIDs and Adjacency SIDs" suggests an approach or paradigm for optimization algorithms. As mentioned in the document, this is out of the scope of this document. The current text is more general, as it does not attempt to give hints.

342   5. Intersecting P-Space and Q-Space with post-convergence paths
343
344   One of the challenges of defining an SR path following the expected
345   post-convergence path is to reduce the size of the segment list. In

[minor] At the end of section 4 it is written "These optimizations are out of scope of this document," and then the first paragraph of section 5 identifies reducing the SID lists as one of the challenges. For something that is out of scope of the document, it is perceived as a rather important, tough problem to address. If it is truly out of scope of this document, then maybe explicitly state that section 5 is entirely informational.

#Ahmed: The end of section 4 explicitly mentions that it "provides some guidance" that uses P-space and Q-space. So it clearly does not mandate the use of this guidance.

[minor] In some places the term 'segment lists' is used, in others 'SID lists'. Could a single term be used throughout the document?

#Ahmed: Replaced "SID list" with "segment list".

[major] In the Terminology section, the P-space, extended P-space and Q-space are explained. Not sure why all this is explained again in more explicit steps. It makes me wonder whether section 5 can be reduced by reusing the terminology in section 3 and focusing upon those definitions.

#Ahmed: The terminology section defines the P-space and Q-space. Section 5 explains how to determine the P-space and Q-space nodes that are also on the post-convergence path. IMO any reduction of the steps in this section would make it quite obscure.

356   We want to determine which nodes on the post-convergence path from

[minor] Who is 'we'?

#Ahmed: Suggestions for replacing "we" are most welcome.

358   regard to resource X (X can be a link or a set of links adjacent to
359   the PLR, or a neighbor node of the PLR).

[minor] In section 3 (Terminology), resource X was defined using a different definition: 'resource X (e.g. a link S-F, a node F, or a SRLG)'. Which one is correct? Maybe reuse the Terminology definition for consistency.

#Ahmed: I do not see any conflict between them.
This section is just providing an example of a resource X; it does not define it.

378   This can be found by intersecting the set of nodes belonging to the
379   post-convergence path from R to D, assuming the failure of X, with
380   Q(D, X).

[minor] In terminology section 3, Q(R, X) is described using 'R', while in section 5.2 the term Q(D, X) uses 'D'. Is this intentional? Why not add this to the Terminology section as well? Or make the Terminology section agnostic to the letter used (e.g. 'R' or 'D') and describe the intent of the Q(...) function?

#Ahmed: "X", "D", "R", ... are used the same way the letters "x", "y" and "z" are used in algebra. I do not understand what is needed here.

397   protected resource X and, at the same time, is guaranteed to be loop-
398   free irrespective of the state of FIBs along the nodes belonging to
399   the explicit path. Thus, there is no need for any co-ordination or

[minor] There is an assumption here that only SR programs the FIB. There may be out-of-band FIB programming that does cause loops. Maybe frame the claim better by stating the assumption made to warrant loop-free paths.

#Ahmed: The beginning of the document explicitly mentions the IGP. So it is clear that other forwarding states are outside the scope of this document.

460   6.2. FRR path using a PQ node

[minor] Is there a reason that there are no considerations for an implementer on whether to select the PQ node closest to S or closest to D?

#Ahmed: The document clearly says that it is just "suggesting" methods. Your suggestion is another implementation detail, which is out of scope of the document.

499   interface for the packet, S-F. The failure of the primary outgoing

[minor] What is the 'F' in S-F?

#Ahmed: The text says "link S-F". Isn't it obvious that "F" is the far end of that link?

512   We define hereafter the FRR behavior applied by S for any packet
513   received with an active adjacency segment S-F for which protection
514   was enabled. As protection has been enabled for the segment S-F and
515   signaled in the IGP (for instance using protocol extensions from
516   [RFC8667] and [RFC8665]), any SR policy using this segment knows that
517   it may be transiently rerouted out of S-F in case of S-F failure.

[minor] A policy is a configuration. A policy does not 'know' anything. Can the statement be made without anthropomorphism?

#Ahmed: I changed it to "a calculator of any policy that uses".

637   and Q-Space as well as the post-convergence path. An implementation
638   MUST only use Node-SIDs bound to the FlexAlgo and/or Adj-SIDs that
639   are unprotected to build the repair list.

[major] This is written from an SR-MPLS perspective. For SRv6, the Adj-SID is bound to an algorithm, and this condition does not apply.

#Ahmed: Modified to mention that, for SRv6, Adj-SIDs that are bound to the FlexAlgo can also be used.

647   S --- R2 --- R3 --- R4 --- R5 --- D
648   \ | \ /
649   R7 -- R8
650   | |
651   R9 -- R10

653   Figure 2

655   In Figure 2, all the metrics are equal to 1 except
656   R2-R7,R7-R8,R8-R4,R7-R9 which have a metric of 1000. Considering R2

[minor] The drawing here uses a different style from Figure 1, where - and * are used to visualize the different link metrics. Maybe a consistent drawing style should be used throughout the document?

#Ahmed: I modified R2-R7, R7-R8, R8-R4, R7-R9 to become "*".

665   To avoid the possibility of this double FRR activation, an
666   implementation of TI-LFA MAY pick only non protected adjacency
667   segments when building the repair list. However, this is important

[minor] While double failures may initially sound like an exotic event, they may be more frequent than initially assumed when SRLGs are considered. In some operator networks, multiple links share the same optical cable, and if one fiber gets cut, many links may be impacted, causing double failures. Maybe worth mentioning that double failures are not as rare as one may believe.
#Ahmed: IMO, trying to make claims about the frequency of failures would result in too many objections and comments, and is not relevant to the scope of the document.

676   11. Advantages of using the expected post-convergence path during FRR

[minor] This section is a complex, detailed read and seems overly detailed. Could the description of the advantages be simplified? Is this level of detail necessary at this place in the document? Alternatively, consider moving this section into an appendix. Also consider removing anthropomorphism in this section: TI-LFA has no awareness, although it may be opaque to such constraints (e.g. 'TI-LFA cannot be aware of such path constraints and').

#Ahmed: I moved this section to an appendix.

783   12. Analysis based on real network topologies

[major] Consider placing this section into an appendix. The shared information does not add additional considerations to the TI-LFA procedure description.

#Ahmed: I moved this section to an appendix.
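The P-space/Q-space discussion above (Section 5 and Section 6.2, "FRR path using a PQ node") can be illustrated with a minimal sketch. It is not taken from the draft and rests on several assumptions: a hypothetical six-node ring with unit metrics, protection of a single link S-F only, symmetric metrics, and hypothetical helper names (`p_space`, `q_space`, `post_convergence_path`, `pq_nodes`). Membership uses strict inequalities, so a node reached over an equal-cost split through S-F is excluded, in the spirit of RFC 7490. Extended P-space, node and SRLG protection, and segment-list optimization are not modeled.

```python
import heapq

# Hypothetical topology for illustration only (not from the draft):
# a six-node ring S - F - D - C - B - A - S with every metric equal to 1.
LINKS = {("S", "F"): 1, ("F", "D"): 1, ("D", "C"): 1,
         ("C", "B"): 1, ("B", "A"): 1, ("A", "S"): 1}


def build_graph(links, removed=None):
    """Undirected adjacency map with symmetric metrics, optionally without one link."""
    graph = {}
    for (u, v), cost in links.items():
        if removed is not None and {u, v} == set(removed):
            continue
        graph.setdefault(u, {})[v] = cost
        graph.setdefault(v, {})[u] = cost
    return graph


def dijkstra(graph, src):
    """Return (distance map, one predecessor per node) from src."""
    dist, prev = {src: 0}, {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in graph[u].items():
            if d + cost < dist.get(v, float("inf")):
                dist[v], prev[v] = d + cost, u
                heapq.heappush(heap, (d + cost, v))
    return dist, prev


def p_space(links, s, f):
    """P(S, S-F): nodes that no pre-failure shortest path from S reaches via S-F."""
    g = build_graph(links)
    d_s, _ = dijkstra(g, s)
    d_f, _ = dijkstra(g, f)
    # Strict '>' excludes nodes reached over an equal-cost split through S-F.
    return {n for n in g if g[s][f] + d_f[n] > d_s[n]}


def q_space(links, d, s, f):
    """Q(D, S-F): nodes whose pre-failure shortest paths to D all avoid S-F."""
    g = build_graph(links)
    d_d, _ = dijkstra(g, d)  # symmetric metrics: dist(n, D) == dist(D, n)
    d_s, _ = dijkstra(g, s)
    d_f, _ = dijkstra(g, f)
    c = g[s][f]
    return {n for n in g
            if d_s[n] + c + d_f[d] > d_d[n]    # no shortest path n..S->F..D
            and d_f[n] + c + d_s[d] > d_d[n]}  # no shortest path n..F->S..D


def post_convergence_path(links, s, d, failed):
    """Shortest path from S to D once the failed link is removed."""
    _, prev = dijkstra(build_graph(links, removed=failed), s)
    path = [d]
    while path[-1] != s:
        path.append(prev[path[-1]])
    return path[::-1]


def pq_nodes(links, s, f, d):
    """Nodes of the post-convergence path that are in both P(S,X) and Q(D,X)."""
    path = post_convergence_path(links, s, d, (s, f))
    p, q = p_space(links, s, f), q_space(links, d, s, f)
    return [n for n in path if n in p and n in q]


if __name__ == "__main__":
    print(post_convergence_path(LINKS, "S", "D", ("S", "F")))  # ['S', 'A', 'B', 'C', 'D']
    print(pq_nodes(LINKS, "S", "F", "D"))                      # ['B']
```

On this ring, P(S, S-F) is {S, A, B} and Q(D, S-F) is {F, B, C, D}. C is excluded from P-space and A from Q-space because each has an equal-cost path through S-F. B is the only PQ node on the post-convergence path, which appears to correspond to the single-PQ-node case of Section 6.2.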
_______________________________________________ rtgwg mailing list -- rtgwg@ietf.org To unsubscribe send an email to rtgwg-le...@ietf.org