Thanks, Ketan, for pointing out the text that clarifies that the computed TI-LFA repair path may differ from the actual post-convergence path.

As for the editorial suggestions, I agree with all of them. I have incorporated these changes and attached the diffs from the latest version (version -18) to this email.


Ahmed




On 11/15/24 2:44 AM, Ketan Talaulikar wrote:
Hi Yingzhen/John/Ahmed/All,

I have some text suggestions for your consideration.


a) Abstract

v17
A key aspect of TI-LFA is the FRR path selection approach establishing protection over the expected post-convergence paths from the point of local repair, reducing the operational need to control the tie-breaks among various FRR options.

v18
Although not a TI-LFA requirement or constraint, TI-LFA also brings the benefit of the ability to provide a backup path that follows the expected post-convergence path, reducing the operational need to control the tie-breaks among various FRR options.

NEW
An *_important_* aspect of TI-LFA is the FRR path selection approach establishing protection over the expected post-convergence paths from the point of local repair, reducing the operational need to control the tie-breaks among various FRR options.


b) sec 6.1

v18
When a direct neighbor is in P(S,X) and Q(D,x) and the link to that direct neighbor is on the post-convergence path, the outgoing interface is set to that neighbor and the repair segment list SHOULD be empty.

NEW
When a direct neighbor is in P(S,X) and Q(D,x) and the link to that direct neighbor is on the post-convergence path, the outgoing interface is set to that neighbor and the repair segment list *_is_* empty.


c) sec 6.2

v17
When a remote node R is in P(S,X) and Q(D,x) and on the post-convergence path, the repair list SHOULD be made of a single node segment to R and the outgoing interface SHOULD be set to the outgoing interface used to reach R.

v18
When a remote node R is in P(S,X) and Q(D,x) and on the post-convergence path, the repair list can be made of a single node segment to R and the outgoing interface set to the outgoing interface used to reach R, thereby minimizing the size of the repair-list while keeping the repair path on the post-convergence path.

NEW
When a remote node R is in P(S,X) and Q(D,x) and on the post-convergence path, the repair list is made of a single node segment to R and the outgoing interface *_is_* set to the outgoing interface used to reach R.


d) sec 6.3

v17
When a node P is in P(S,X) and a node Q is in Q(D,x) and both are on the post-convergence path and both are adjacent to each other, the repair list SHOULD be made of two segments: A node segment to P (to be processed first), followed by an adjacency segment from P to Q.

v18
When a node P is in P(S,X) and a node Q is in Q(D,x) and both are on the post-convergence path and both are adjacent to each other, the repair list size can be minimized while keeping the repair path on the post-convergence path by constructing it from two segments: A node segment to P (to be processed first), followed by an adjacency segment from P to Q.

NEW
When a node P is in P(S,X) and a node Q is in Q(D,x) and both are on the post-convergence path and both are adjacent to each other, the repair list *_is_* made of two segments: A node segment to P (to be processed first), followed by an adjacency segment from P to Q.
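
As a rough illustration of cases b) to d), the selection logic could be sketched as below. This is only a sketch under stated assumptions: the function name `pick_repair_list`, the tuple encoding of segments, and the idea that the P-space, the Q-space and the expected post-convergence path are already computed and passed in as plain Python collections are all hypothetical and not taken from the draft. It covers only the three quoted cases and returns None otherwise. Choosing the outgoing interface is left out.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical encoding: ("node", R) or ("adj", P, Q).
Segment = Tuple[str, ...]


def pick_repair_list(
    pc_path: List[str], p_space: Set[str], q_space: Set[str]
) -> Optional[Tuple[str, List[Segment]]]:
    """Return (section, repair segment list) for the three quoted cases.

    pc_path is the expected post-convergence path, starting at the point
    of local repair S and ending at the destination D. Consecutive nodes
    on it are assumed to be adjacent.
    """
    if len(pc_path) < 2:
        return None

    # Sec 6.1: the direct neighbor is in both P(S,X) and Q(D,X),
    # so the repair segment list is empty.
    neighbor = pc_path[1]
    if neighbor in p_space and neighbor in q_space:
        return ("6.1", [])

    # Sec 6.2: a remote node R on the path is in both spaces,
    # so a single node segment to R is enough.
    for node in pc_path[2:]:
        if node in p_space and node in q_space:
            return ("6.2", [("node", node)])

    # Sec 6.3: adjacent P and Q nodes on the path give a node segment
    # to P (processed first) followed by an adjacency segment P -> Q.
    for p, q in zip(pc_path[1:], pc_path[2:]):
        if p in p_space and q in q_space:
            return ("6.3", [("node", p), ("adj", p, q)])

    return None
```

For example, with a path S, A, B, D where only A is in the P-space and B and D are in the Q-space, the sketch returns the two-segment list of case d).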


e) sec 9

v17
An implementation MAY support TI-LFA to protect Node-SIDs associated to a FlexAlgo. In such a case, rather than computing the expected post-convergence path based on the regular SPF, an implementation SHOULD use the constrained SPF algorithm bound to the FlexAlgo (using the Flex Algo Definition) instead of the regular Dijkstra in all the SPF/rSPF computations that are occurring during the TI-LFA computation. This includes the computation of the P-Space and Q-Space as well as the post-convergence path. An implementation MUST only use Node-SIDs bound to the FlexAlgo and/or Adj-SIDs that are unprotected or, in case of SRv6, adj-SIDs that are bound to the FlexAlgo to build the repair list.

v18
An implementation MAY support TI-LFA to protect Node-SIDs associated to a FlexAlgo. In such a case, rather than computing the expected post-convergence path based on the regular SPF, an implementation MAY use the constrained SPF algorithm bound to the FlexAlgo (using the Flex Algo Definition) instead of the regular Dijkstra in all the SPF/rSPF computations that are occurring during the TI-LFA computation. This includes the computation of the P-Space and Q-Space as well as the post-convergence path. If an implementation uses the constrained SPF algorithm bound to the FlexAlgo, then the implementation MUST only use Node-SIDs bound to the FlexAlgo and/or Adj-SIDs that are unprotected or, in case of SRv6, adj-SIDs that are bound to the FlexAlgo to build the repair list.

NEW
An implementation MAY support TI-LFA to protect Node-SIDs associated *_with_* a Flex Algo. In such a case, rather than computing the expected post-convergence path based on the regular SPF, an implementation *_SHOULD_* use the constrained SPF algorithm bound to the Flex Algo (using the Flex Algo Definition) instead of the regular Dijkstra in all the SPF/rSPF computations that are occurring during the TI-LFA computation. This includes the computation of the P-Space and Q-Space as well as the post-convergence path. *_Furthermore, the implementation SHOULD only use Node-SIDs/Adj-SIDs bound to the Flex Algo and/or unprotected Adj-SIDs of the regular SPF to build the repair list. The use of regular Dijkstra for the TI-LFA computation or building of the repair path using SIDs other than those recommended does not ensure that the traffic going over TI-LFA repair path during the fast-reroute period is honoring the Flex Algo constraints._*


In addition to the above text change suggestions, I would remind everyone that strictly following the post-convergence path is not guaranteed by TI-LFA, as it depends on the protection scheme selected. The following text in Appendix A explains this scenario.

Readers should be aware that FRR protection is pre-computing a backup path to protect against a particular type of failure (link, node, SRLG). When using the post-convergence path as FRR backup path, the computed post-convergence path is the one considering the failure we are protecting against. This means that FRR is using an expected post-convergence path, and this expected post-convergence path may be actually different from the post-convergence path used if the failure that happened is different from the failure FRR was protecting against. As an example, if the operator has implemented a protection against a node failure, the expected post-convergence path used during FRR will be the one considering that the node has failed. However, even if a single link is failing or a set of links is failing (instead of the full node), the node-protecting post-convergence path will be used. The consequence is that the path used during FRR is not optimal with respect to the failure that has actually occurred.
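
The effect described in that Appendix A text can be seen on a toy topology. The sketch below is purely illustrative and rests on assumptions: the five-node graph, its metrics, and the helper `shortest_path` are made up for this example, and a plain Dijkstra stands in for whatever SPF an implementation actually runs. It compares the expected post-convergence path computed for node protection of N with the path the network would converge to if only the link S-N failed.

```python
import heapq
from typing import Dict, FrozenSet, List, Optional

Graph = Dict[str, Dict[str, int]]

# Hypothetical topology with symmetric metrics.
GRAPH: Graph = {
    "S": {"N": 1, "A": 1},
    "N": {"S": 1, "A": 1, "D": 1},
    "A": {"S": 1, "N": 1, "B": 5},
    "B": {"A": 5, "D": 1},
    "D": {"N": 1, "B": 1},
}


def shortest_path(
    graph: Graph,
    src: str,
    dst: str,
    failed_nodes: FrozenSet[str] = frozenset(),
    failed_links: FrozenSet[FrozenSet[str]] = frozenset(),
) -> Optional[List[str]]:
    """Dijkstra that ignores the given failed nodes and links."""
    dist = {src: 0}
    prev: Dict[str, str] = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in graph[u].items():
            if v in failed_nodes or frozenset((u, v)) in failed_links:
                continue
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


# Expected post-convergence path when protecting against failure of node N.
node_protecting = shortest_path(GRAPH, "S", "D", failed_nodes=frozenset({"N"}))
# Actual post-convergence path if only the link S-N fails.
link_failure = shortest_path(
    GRAPH, "S", "D", failed_links=frozenset({frozenset(("S", "N"))})
)
print(node_protecting)  # ['S', 'A', 'B', 'D']
print(link_failure)     # ['S', 'A', 'N', 'D']
```

In this made-up topology the node-protecting repair path costs 7 while the path after a plain S-N link failure costs 3, which is the kind of non-optimality the quoted text describes.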


I hope this helps get us closer to the resolution of the open issues with this document.

Thanks,
Ketan


On Fri, Nov 15, 2024 at 8:09 AM Yingzhen Qu <yingzhen.i...@gmail.com> wrote:

    Speaking as WG member, I agree with John's comments and with what
    Stewart and Sasha said at the mic: the removal of the requirement
    to follow the post-convergence path is a big change. If it's not
    mandatory anymore, we need to document in which situations the
    post-convergence path is recommended and why, and in which
    situations it's not necessary to follow the post-convergence path.

    As WG co-chair, this change should be clearly communicated with
    the WG. We need to poll the WG for consensus. If it helps, we can
    have an interim meeting to discuss and review the document.

    Thanks,
    Yingzhen

    On Thu, Nov 14, 2024 at 2:16 PM John Scudder <j...@juniper.net> wrote:

        Hi Ahmed,

        Thanks for the update. I read the diff, and I listened to the
        recording of your rtgwg presentation.

        I've written a long message. For convenience, the bottom line
        (TL;DR as it were) is that I think the conversation that was
        started with Stewart and Sasha at the mic line at IETF-121
        needs to be worked through. Once the RTGWG chairs and AD are
        satisfied, I'll abide by that.

        Now the long version:

        On Nov 13, 2024, at 3:01 PM, Ahmed Bashandy
        <abashandy.i...@gmail.com> wrote:

        I uploaded version 18 of the ti-lfa draft to address the two
        DISCUSS items in
        https://datatracker.ietf.org/doc/draft-ietf-rtgwg-segment-routing-ti-lfa/ballot/

        - To address John Scudder's Discuss, I made the modifications
        to remove the word "key" from the abstract as suggested by
        Sasha at
        https://mailarchive.ietf.org/arch/msg/rtgwg/nWR4uYaT3T30XRiyRdAoIqO22AM/
        and Pierre at
        https://mailarchive.ietf.org/arch/msg/rtgwg/zHP2qvP2Ew1oWl5G7Gq8niu8vy8/

        - To address Murray's Discuss (as well as comments from
        others) I removed the word "SHOULD" from sections 6.2, 6.3,
        and 9 as I suggested during my presentation during the rtgwg
        meeting last Tuesday Nov/5/24.
        The entire recording of the RTGWG meeting can be found in
        https://meetecho-player.ietf.org/playout/?session=IETF121-RTGWG-20241105-0930

        The slides that I presented in PDF format can be found in
        https://datatracker.ietf.org/meeting/121/materials/slides-121-rtgwg-02-tilfa-bgppic-00.pdf

        Please take a look and see if the modifications are good to
        address the two DISCUSS Items

        In your update you've gotten rid of "key". That's fine as far
        as it goes, and I agree it resolves the inconsistency between
        the abstract and body. But that was just an editorial issue,
        the canary in the coal mine as it were, that illuminated the
        more general point. Perhaps I expressed myself poorly in the
        DISCUSS and that's what led us down this rabbit hole of
        focusing on the word "key". I apologize for that. My larger
        concern was expressed very ably by Stewart in the Q&A of your
        presentation. Rather than try to paraphrase him, I've taken
        the liberty of starting with the transcript [1] and cleaning
        it up, appended below. Stewart nails it. (I kept most of it as
        close to verbatim as I could but did remove a little bit of
        procedural "keep it quick" stuff from the chair. This is of
        course not an official transcript anyway.)

        To elaborate a bit, though: as far as I can tell, the
        contribution (and it is a big contribution!) of the spec is to
        show how to use post-convergence paths for restoration. If you
        remove that (which I can because it's optional), it seems as
        though there is nothing left that wasn't already specified
        before (for example in RFC 7490, and others).

        You mentioned in your comments at the meeting that
        post-convergence was made optional "because some platforms
        cannot do it". Normally, when we have a platform that can't do
        a specification, that's fine, the platform simply wouldn't
        claim conformance to that specification. If you have, say, a
        platform that can only forward based on the IPv4 or IPv6
        header but not on the MPLS header, you don't change the MPLS
        specification to say forwarding on the MPLS header is not
        mandatory. You just don't claim conformance with MPLS. (I
        chose an extreme case, of course, in hopes of clearly
        illustrating the point.)

        If I were confident that the WG consensus is yes, absolutely
        the WG wants to publish this document in its current
        "post-convergence is explicitly optional" state, I would move
        from DISCUSS to ABSTAIN. I would choose ABSTAIN rather than
        NOOBJ because of the observation above, that as far as I can
        tell once you remove post-convergence there's nothing left
        that hasn't been done before. (Note that ABSTAIN is a
        non-blocking, though also non-supporting, ballot position.)

        However, it is not clear to me that this is, indeed, a solid
        WG consensus. In addition to Stewart and Sasha's comments, you
        also mentioned that you've gotten private emails raising the
        same concern. Calling consensus for RTGWG isn't my job, I
        would defer to the chairs and AD (Jim) on that point, but it
        sounded to me from the RTGWG meeting like this was the next
        action.

        One last point, right at the end of the discussion of the
        draft you say, "I avoid shoulds because of the pushback that I
        get. But in my opinion it should be a should. [...] Either you
        guys want me to put it back as a mandatory or say why it's not
        mandatory. I have a reason why it is not mandatory and I just
        mentioned it and I can put that."

        Interestingly, this coincides closely with Murray's DISCUSS
        ballot, about SHOULD. I get it that you have different views
        on the use of SHOULD, but per my reading of RFC 2119 the case
        under discussion here is exactly the kind of situation where
        it becomes useful. To remind us of what 2119 says:

        ```
        3. SHOULD   This word, or the adjective "RECOMMENDED", mean that there
           may exist valid reasons in particular circumstances to ignore a
           particular item, but the full implications must be understood and
           carefully weighed before choosing a different course.
        ```

        As far as I can tell, that is what you are saying: an
        implementation SHOULD use the post-convergence path unless
        (conditions you will name, e.g., "length of the SID stack is
        long enough, hardware cannot support it"), in which case that
        implementation MUST fall back to (whatever the right fallback
        posture is, RFC 7490 perhaps).

        I don't insist you use that language or even that approach,
        nor am I sure it would satisfy the WG -- I just offer it as a
        point to consider.

        Thanks,

        --John

        My edited transcript:

        Stewart (17:12)

        So, Ahmed when this piece of work started, many of us have
        tracked this piece of work since the first day it was
        presented at the IETF. When it was presented, the word "key"
        was important because it was a fundamental concept of the
        design that the repair path had to follow the post-convergence
        path and the document kind of has that sort of subtly written
        in, in various places, except in the places where it doesn't.

        So I think what is... what the authors need to do is to be
        quite clear to the working group if it is no longer key, if it
        is no longer a mandat- a requirement to follow the
        post-convergence path, then there needs to be an explanation
        as to why this position has changed and then the text body
        needs to reflect the consensus position of the working group
        on whether it is important that it follows the
        "post-convergence path" or it's not important or there are
        times when it is and times when it is not, and in which case
        those circumstances should be documented in the text.

        Ahmed (18:21)

        So the document really says that it is not mandatory and it is
        important and it explains why it is important like I can read
        part of the document and I'll point them out, actually, I'll
        reply to your email, but the point here is that we don't
        really try to put justifications because then I will go into
        the details of the implementation. I just put the spec there
        and say, you know what? It is important, but it's not
        mandatory. You don't have to follow it. Your implementation
        doesn't have to follow it. If you want to follow it, I have
        paragraphs that says how you follow it in certain scenarios
        like that is...

        (cross-talk)

        Stewart (18:54)

        I think you're skipping the important point. The original
        thesis was that this was a required congruence. That has been
        dropped, the least you need to do is to explain to the working
        group why the requirement for congruence has been changed. And
        then we need to decide what text needs to go in the document
        to reflect that change of positions. But absolutely, this was
        a fundamental of the original design and it seems to have been
        quietly and subtly changed without explanation.

        Ahmed (19:28)

        Okay so I thought it's uh yeah I can add a statement that's
        why it is I thought it's obvious basically because some
        platforms cannot do it. It's as simple as that. I'll put the
        sentence if this is why it has been dropped. This was
        basically a feedback that we got I can try and dig the emails
        it has been a long while that some hardware simply cannot
        support it or some software cannot support it if the number if
        the length of the SID stack is long enough, hardware cannot
        support it so we can still do topology independent which means
        you can still get your backup up but it will not be over the
        post-convergence path. That is the only reason really.

        Stewart (20:08)

        I think this probably needs a longer conversation than we can
        have in this working group and I think uh John I mean Jim
        probably needs to convene a group of experts.

        Ahmed (20:19)

        [elided]

        Sasha (20:34)

        I just wanted to second... to say exactly what Stewart has
        said. I have nothing to add. [Garbled] ... something is called
        the key aspect of a feature and then called non-mandatory is
        not... creates a confusion to put it mildly. This has to be
        resolved one way or another with explanations because there is
        a lot of history behind this change of requirements. I
        actually... I second what Stewart has said.

        Ahmed (21:13)

        Okay sure, okay, I think I got the point. So I'm open to
        discussions I have no problem really. Okay sure.

        JeffT (21:23)

        Ahmed, do you feel we need another discussion on this? Is it
        clear what working group is expecting from you in terms of
        changes and clarifications?

        Ahmed (21:31)

        Yeah, my understanding, and again, I'm talking about Stewart
        and Sasha's comments that the original draft was... I'll have
        to dig it out to be honest, it's been a long while... that to
        be TI-LFA the repair path has to be post-convergence. This has
        been dropped from must-have to important, and I avoid shoulds
        because of the pushback that I get. But in my opinion it
        should be a should. But it seems like Sasha and Stewart want it
        back. And not only Sasha and Stewart, there are other but also
        other [garbled] exchange email privately, but because it's
        private I'm not going to divulge their names that also think
        that it should be put back to mandatory and I'm open to either
        way. Either you guys want me to put it back as a mandatory or
        say why it's not mandatory. I have a reason why it is not
        mandatory and I just mentioned it and I can put that. I'll
        discuss it with the co-authors and see what they want, but I
        understand Stewart and Sasha's comments.

        [1] https://meetecho-player.ietf.org/playout/?session=IETF121-RTGWG-20241105-0930

    _______________________________________________
    rtgwg mailing list -- rtgwg@ietf.org
    To unsubscribe send an email to rtgwg-le...@ietf.org


