John Scudder has entered the following ballot position for draft-ietf-rtgwg-segment-routing-ti-lfa-13: Discuss
When responding, please keep the subject line intact and reply to all email addresses included in the To and CC lines. (Feel free to cut this introductory paragraph, however.)

Please refer to https://www.ietf.org/about/groups/iesg/statements/handling-ballot-positions/ for more information about how to handle DISCUSS and COMMENT positions.

The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-rtgwg-segment-routing-ti-lfa/

----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

# John Scudder, RTG AD, comments for draft-ietf-rtgwg-segment-routing-ti-lfa-13
CC @jgscudder

Thanks for this document. The technology is valuable, and the underlying techniques sound. Despite what you might guess from my abundance of comments, I like this document. I suspect it has suffered from having been edited repeatedly by the same set of experts; it becomes hard to see it as a new reader might. So, please accept my comments in the spirit of a new set of eyes looking over some long-established text.

Although most of my remarks are non-blocking COMMENTs, I am putting two in as DISCUSS points. Some of my other comments are of a similar nature, but I think they're less serious for the overall coherence of the document.

## DISCUSS

### Whole document, is post-convergence of the essence, or not?

The document seems to be arguing with itself about whether following the post-convergence path is, or is not, an essential/required feature.

In the Abstract:

    A key aspect of TI-LFA is the FRR path selection approach
    establishing protection over the expected post-convergence paths
    from the point of local repair

It's a *key* aspect! OK!
But then, in the Introduction:

    Although not a Ti-LFA requirement or constraint, TI-LFA also brings
    the benefit of the ability to provide a backup path that follows
    the expected post-convergence path

Wait, it's "key" but "not a requirement or constraint"?

Moving on to Section 6,

    The repair list encodes the explicit post-convergence path to the
    destination

So it "encodes the explicit post-convergence path". "Encodes", not "might encode" or "can encode". So the Abstract is right and Section 2 is wrong.

But wait, there's more! Later in Section 11,

    traffic can be steered by the PLR onto its expected
    post-convergence path during the FRR phase

So it "can"... which implies "doesn't have to be".

There's more, for example, all of Section 5 talks about post-convergence paths, and there are many more mentions in Section 6 too.

Given that Sections 5-8 seem to come closest to being the normative ones (though the document is sadly not very precise in this regard), I'm left with the impression that the Abstract is right ("key"), and the quoted passages of Sections 2 and 11 are wrong. In any case, I think this needs to be resolved in some way.

### Section 10, multiple unrelated failures

    Implementations of TI-LFA should deal with the occurence of
    multiple unrelated failures in accordance to the IP Fast Reroute
    Framework [RFC5714].

(Nit, you misspelled "occurrence".)

Can you explain what you mean by this sentence? I haven't reviewed it carefully, but RFC 5714 is a framework, and I don’t know what it means to be "in accordance to" it. The only relevant text I was able to find in it was RFC 5714 Section 5.2.6,

    However, it is important that the occurrence of a second failure
    while one failure is undergoing repair should not result in a
    level of service which is significantly worse than that which
    would have been achieved in the absence of any repair strategy.

Putting that together with the quote from your specification, I come up with an interpretation like "it's important to behave reasonably in the face of multiple failures, we aren't going to tell you how to do it, this other document we are citing isn't going to tell you how to do it either, it's just going to tell you that our specification was supposed to cover this, but we didn't."

----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

## COMMENTS

### General Gripe, Y U no XML?

It looks like you must have uploaded the TXT rendering instead of the XML source. Whenever it is you upload your next revision, please consider uploading the XML source instead. The set of renderings available when you upload text is inferior to the renderings available when you upload source (notably, the modern HTML rendering is not available).

If there's some reason uploading source is difficult or impossible for you, disregard this request of course. (The only reason I can think of for it to be hard is if you used a less-common tool to produce your draft, for example, nroff or the Microsoft Word template. If you use the more mainstream XML or MD workflows, it should be easy enough.)

### Abstract, for want of a hyphen the meaning was lost

    It extends these concepts to provide guaranteed coverage in any
    two connected networks using a link-state IGP.

The only way I can make sense of this is to add a hyphen:

    It extends these concepts to provide guaranteed coverage in any
    two-connected networks using a link-state IGP.

Is that what you meant? I would also suggest changing "using" to "that uses", as in,

    It extends these concepts to provide guaranteed coverage in any
    two-connected network that uses a link-state IGP.

I'm not sure the concept of two-connectedness is universally understood enough that it's suitable for use in an Abstract, so I'd support further editing to get rid of the graph theory term entirely, but at least this edit makes it correct.

### Section 1, orphan definitions

- RSPT is defined, but never used. Delete?
- SLA is defined, only used once (and in a comment below I suggest deleting that use!). Delete and just expand on first use, if the first-and-only use is kept at all?
- SPT is defined, but never used. Delete?
- SRGB is defined, only used once. Delete and just expand on first use?
- TLDP is defined, only used once. Delete and just expand on first use? (If you keep it, please correct the nit from Ben Niven-Jenkins’ RTGAREA review.)

### Section 2, "aims at"

    Segment Routing aims at supporting services with tight SLA
    guarantees [RFC8402].

This seems to be rewriting history. That is to say, sure SR “aims at supporting services with tight SLA guarantees”, but only in the sense that every general-purpose packet transport does; without specifics this isn’t very meaningful. In any case, the citation doesn’t supply evidence for the statement, indeed the string “SLA” or “service level” never occurs in RFC 8402.

Maybe just remove this sentence? It doesn’t appear essential in any case.

### Section 2, can't parse this sentence

    By relying on SR this document provides a local repair mechanism
    for standard link-state IGP shortest path capable of restoring
    end-to-end connectivity in the case of a sudden directly connected
    failure of a network component.

I can’t parse this sentence. I am *guessing* that what you mean is something like,

NEW:

    This document leverages Segment Routing (SR) to provide a local
    repair mechanism for a shortest path computed by a standard
    link-state IGP. The local repair is capable of restoring
    end-to-end connectivity in the case of a failure of a directly
    connected network component.

If this is what you mean, you’re welcome to use this text if you want. If it’s not what you mean, please explain?

Note I removed “sudden” in my NEW text because I’m guessing you don’t mean to exclude a gradual or a foreseeable failure. A failure is a failure, after all.

### Section 2, When the network reconverges

    When the network reconverges

I suggest,

NEW:

    When the network reconverges after a failure

See also the next comment.

### Section 2, microloops

Much of Section 2 is a discussion of microloops and exactly how TI-LFA relates to them. This doesn’t seem like introductory material, especially because the rest of the specification doesn't talk about microloops at all. I suggest moving that material to a new (sub)section for that purpose and just mentioning it in the Introduction, as in something like "microloops are not addressed by TI-LFA and can be a concern in some deployments. This is discussed in <xref>."

I don't insist that you make this change; the document is still usable without it. However, I think that the Introduction as it stands is not very useful *as an introduction* because of the abundance of non-introductory material. (Some of my subsequent comments fall under the same heading.)

### Section 2, “primary link”

You mention the “primary link” in a few places here (and nowhere else). What is the “primary link”? Please clarify or re-word. For example, maybe you mean “the link whose failure is detected”.

### Section 2, what value does comparing to older FRR techniques add to an intro?

    By using SR, TI-LFA does not require the establishment of TLDP
    sessions (Targeted Label Distribution Protocol) with remote nodes
    in order to take advantage of the applicability of remote LFAs
    (RLFA) [RFC7490][RFC7916] or remote LFAs with directed forwarding
    (DLFA)[RFC5714]. All the Segment Identifiers (SIDs) are available
    in the link state database (LSDB) of the IGP. As a result,
    preferring LFAs over RLFAs or DLFAs, as well as minimizing the
    number of RLFA or DLFA repair nodes is not required anymore.

I see why you wanted this paragraph during the process of developing this spec and persuading the WG of its value. I don’t see how it contributes any value to the final spec. Similarly,

    By using SR, there is no need to create state in the network in
    order to enforce an explicit FRR path. This relieves the nodes
    themselves from having to maintain extra state, and it relieves
    the operator from having to deploy an extra protocol or extra
    protocol sessions just to enhance the protection coverage.

This appears to just be a restatement of the value proposition of SR itself. I don’t see value in restating it in this document.

I think you could remove both these paragraphs without harm, and it would make the document a quicker and clearer read.

### Section 2, encoding challenges

    One of the challenges of TI-LFA is to encode the expected
    post-convergence path by combining adjacency segments and node
    segments.

Do you mean “compactly encode”, “efficiently encode”, or similar? Is encoding, per se, the challenge?

### Section 2, roadmap should be a roadmap

I agree with Ben Niven-Jenkins that the omission of Sections 8-10 in the overview list at the end of the Introduction is a bit jarring to the reader. (I would add Section 13, too.)

### Section 3, exclude from that set

I think this is wrong:

    Exclude from that set of neighbors that are reachable from R
    using X.

Did you mean,

NEW:

    Exclude from that set, the neighbors that are reachable from R
    using X.

### Section 3, defined but not used

    A symmetric network is a network such that the IGP metric of each
    link is the same in both directions of the link.

This definition is never used.

### Section 6, you can't guarantee that

    is guaranteed to be loop-free irrespective of the state of FIBs
    along the nodes belonging to the explicit path.

As written, there’s no way to guarantee that.
(Trivial proof, one possible state of a FIB is to point back to the preceding node along the path. That might not be an *expected* state, but it is *a* state.) I think this is just a case of overly casual writing, and you mean that the loop-free property will exist regardless of whether the nodes belonging to the explicit path have converged to recognize failure X or not. Consider rewriting along those lines?

### Section 6 and others

Please supply definitions for “P node” and “Q node”.

### Section 6, terminology is inconsistent and unclear

I can understand NodeSID(R1) well enough even though you haven't supplied a definition, it’s the node SID for router R1. But what is “Node_SID(P)”? P isn’t a router, it’s a set of routers, or a space. Please clarify, whether with a definition or otherwise. Probably your clarification will fix both this and the previous point.

While you’re at it, you might as well make your terminology consistent. In one of the node SID cases above you use an underscore, and in the other, you don’t. Also, your node SID notation is inconsistent with your adjacency SID notation, which looks like AdjSID_R1R2. So in one case you have an underscore and parentheses, in another case parentheses with no underscore, and in the final case an underscore with no parentheses. Pick one. (And then there's Section 12 with "node-SID"...)

### Sections 6.1, 6.2, 6.3, SHOULD NOT use SHOULD

What work are the SHOULDs doing here? Considering that in other parts of the document you leave the computation of the repair path up to the implementation, why are you mandating it here? And, if you’re mandating it, why not mandate it all the way with a MUST?

It seems to me that if you don’t want to mandate implementation, it would be sensible to take the RFC 2119 language out altogether. If you do want to mandate implementation, I don’t see why you wouldn’t make this a MUST. If you really do want it to be SHOULD, please explain your reasoning.

### Section 7, primary outgoing interface

The existence of a "primary outgoing interface" seems to imply the existence of a secondary outgoing interface, tertiary outgoing interface, etc. Please define primary outgoing interface, or if this isn't an important distinction, consider whether you can simplify to just say "outgoing interface".

### Section 7.1, first

    The active segment becomes the first segment of the repair list.

By “first” do you mean “first to be pushed, last to be processed”? If so, I suggest clarifying that in the text, because the plain English reading of "first" is the opposite.

### Section 7.2, "as stated"... where?

    As stated in Section 2, when SR policies are involved and a strict
    compliance of the policy is required, an end-to-end protection
    should be preferred over a local repair mechanism.

I don’t see this in section 2 (I searched for “end-to-end”, “prefer”, “policy” and “policies”). Can you help me understand what text in section 2 you’re talking about?

### Section 7.2.1, what's an Adj()?

Please define Adj(). Is this yet another terminology variation? (c.f. Section 6 comment on AdjSID_R1R2)

### Section 8.1, what's the tail end of a node segment?

    1. If the active segment is a node segment that has been signaled
       with penultimate hop popping and the repair list ends with an
       adjacency segment terminating on the tail-end of the active
       segment, then the active segment MUST be popped before pushing
       the repair list.

What is the tail end of a node segment? I can’t figure out what that means. I think I know what you’re trying to say, but please find a way to reword it that doesn’t end up making me try to parse the above with my "what did the authors actually *mean*?" glasses on.

### Section 8.2 and others, description of SRv6 behaviors

RFC 8754 describes forwarding behaviors using a kind of line-numbered pseudocode, and later documents that modify forwarding behaviors specify updates to the pseudocode.
(Examples: RFC 8986, draft-ietf-spring-srv6-srh-compression-15, draft-ietf-rtgwg-srv6-egress-protection-16, draft-ietf-spring-sr-redundancy-protection-03, draft-ietf-spring-srv6-path-segment-07) You don’t do this, you use a descriptive approach instead. I'm ok with this in isolation, but I’d like to know if you made an affirmative decision to diverge from the usual SRv6 way of doing things, and if so, why, and if the SPRING working group specifically considered this and is OK with it.

### Section 8.2, shorter than what?

    In such case, there is no need for a preceding Prefix SID and the
    resulting repair list is likely shorter.

Shorter than what? This is the first place in this document the string “Prefix SID” occurs, so I’m confused.

### Section 11, limit the implementation of local FRR policies

    Based on this assumption, in order to facilitate the operation of
    FRR, and limit the implementation of local FRR policies

Do you mean "limit the need for implementation of local FRR policies"? (And you could drop “implementation of” for that matter.)

### Section 11, TI-LFA and SR policies don't mix?

The last paragraph, regarding the use of SR policies, and also Section 9, leave me wondering whether a simpler statement would be that TI-LFA is inappropriate for use in a network that makes use of SR policies. Is this a fair characterization?

### Section 12, can't this be an appendix?

Shouldn’t this be an Appendix? In general, that seems like a common (and good!) practice for inessential information like this, especially when it has a potentially limited shelf-life.

Also, although I appreciate that you provided some rudimentary parameterization of the topologies in Table 1, I think it would be helpful to at least say what time period the topologies reflect. draft-francois-rtgwg-segment-routing-ti-lfa-00 dates to summer of 2015; are we talking about the topologies that were in vogue in 2015? Those of 2023? Etc.

### Section 12, granularity wut

    We do not cover the case for 2 SIDs (Section 6.3) separately
    because there was no granularity in the result.

I don’t understand what this means. Can you rephrase it? Generally, I find the words “granular”, “granularity”, and related have almost zero descriptive power. :-(

### Section 12, "2 or more" or "2", "3", and no more?

In your description of the table, you say,

    The convention that we use is as follows

    *  0 SIDs: the calculated repair path starts with a directly ...

    *  1 SIDs: the repair node is a PQ node, in which case only 1 SID
       is ...

    *  2 or more SIDs: The repair path consists of 2 or more SIDs as
       described in Section 6.3 and Section 6.4. We do not cover the
       case for 2 SIDs (Section 6.3) separately ...

But the table headers show:

    +-------------+------------+------------+------------+------------+
    | Network     | 0 SIDs     | 1 SID      | 2 SIDs     | 3 SIDs     |
    +-------------+------------+------------+------------+------------+

I.e., the table headers don’t show “2 or more”; they show 2 and 3, broken out distinctly, and no "or more" case. Seems like these need to be reconciled.

### Section 13, guaranteed upper bound

    The techniques described in this document are internal
    functionalities to a router that result in the ability to
    guarantee an upper bound on the time taken to restore traffic flow
    upon the failure of a directly connected link or node.

This is the only place in the document where you talk about a guaranteed upper bound. This is a fairly strong promise to make; I think you shouldn't be mentioning it unless you provide some kind of support for how the guarantee is provided. Note, I don't question that TI-LFA can be part of the machinery providing such a guarantee, but without showing your work I don't think you can make this claim.

## Notes

This review is in the ["IETF Comments" Markdown format][ICMF]. You can use the [`ietf-comments` tool][ICT] to automatically convert this review into individual GitHub issues.

[ICMF]: https://github.com/mnot/ietf-comments/blob/main/format.md
[ICT]: https://github.com/mnot/ietf-comments