[rtgwg] Re: New Version Notification for draft-ietf-rtgwg-segment-routing-ti-lfa-18.txt

Ketan Talaulikar Fri, 15 Nov 2024 02:45:20 -0800

Hi Yingzhen/John/Ahmed/All,

I have some text suggestions for your consideration.

a) Abstract

v17
A key aspect of TI-LFA is the FRR path selection approach
establishing protection over the expected post-convergence paths from the
point of local repair, reducing the operational need to control the
tie-breaks among various FRR options.

v18
Although not a TI-LFA requirement or constraint, TI-LFA also brings
the benefit of the ability to provide a backup path that follows the
expected post-convergence path, reducing the operational need to control
the tie-breaks among various FRR options.

NEW
An *important* aspect of TI-LFA is the FRR path selection approach
establishing protection over the expected post-convergence paths from the
point of local repair, reducing the operational need to control the
tie-breaks among various FRR options.

b) sec 6.1

v18
When a direct neighbor is in P(S,X) and Q(D,x) and the link to that
direct neighbor is on the post-convergence path, the outgoing interface is
set to that neighbor and the repair segment list SHOULD be empty.

NEW
When a direct neighbor is in P(S,X) and Q(D,x) and the link to that
direct neighbor is on the post-convergence path, the outgoing interface is
set to that neighbor and the repair segment list *is* empty.

c) sec 6.2

v17
When a remote node R is in P(S,X) and Q(D,x) and on the post-convergence
path, the repair list SHOULD be made of a single node segment to R and the
outgoing interface SHOULD be set to the outgoing interface used to reach R.

v18
When a remote node R is in P(S,X) and Q(D,x) and on the post-convergence
path, the repair list can be made of a single node segment to R and the
outgoing interface set to the outgoing interface used to reach R, thereby
minimizing the size of the repair-list while keeping the repair path on
the post-convergence path.

NEW
When a remote node R is in P(S,X) and Q(D,x) and on the post-convergence
path, the repair list is made of a single node segment to R and the
outgoing interface *is* set to the outgoing interface used to reach R.

d) sec 6.3

v17
When a node P is in P(S,X) and a node Q is in Q(D,x) and both are on
the post-convergence path and both are adjacent to each other, the repair
list SHOULD be made of two segments: A node segment to P (to be processed
first), followed by an adjacency segment from P to Q.

v18
When a node P is in P(S,X) and a node Q is in Q(D,x) and both are on
the post-convergence path and both are adjacent to each other, the repair
list size can be minimized while keeping the repair path on the
post-convergence path by constructing it from two segments: A node segment
to P (to be processed first), followed by an adjacency segment from P to Q.

NEW
When a node P is in P(S,X) and a node Q is in Q(D,x) and both are on
the post-convergence path and both are adjacent to each other, the repair
list *is* made of two segments: A node segment to P (to be processed
first), followed by an adjacency segment from P to Q.
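
As an illustration of the three cases in sec 6.1 to 6.3, here is a minimal
Python sketch of how a PLR could choose the repair list. It assumes the
P-space, the Q-space and the post-convergence path have already been
computed. The function name and the tuple encoding of segments are
hypothetical and do not come from the draft.

```python
from typing import List, Optional, Sequence, Set, Tuple

# Hypothetical encoding: ("node", R) or ("adj", P, Q).
Segment = Tuple[str, ...]


def repair_list(
    post_conv_path: Sequence[str],  # nodes from the PLR S to D, S first
    p_space: Set[str],              # P(S,X)
    q_space: Set[str],              # Q(D,X)
) -> Optional[List[Segment]]:
    """Return the repair segment list for the cases of sec 6.1 to 6.3,
    or None when none of those cases applies."""
    hops = list(post_conv_path[1:])  # skip the PLR itself

    # Sec 6.1 and 6.2: a node on the path that is in both P and Q.
    for index, node in enumerate(hops):
        if node in p_space and node in q_space:
            if index == 0:
                return []            # direct neighbor: empty repair list
            return [("node", node)]  # remote node: single node segment

    # Sec 6.3: P and Q are consecutive hops on the path, hence adjacent.
    for p, q in zip(hops, hops[1:]):
        if p in p_space and q in q_space:
            return [("node", p), ("adj", p, q)]

    return None
```

For example, with the path S, N1, R, D, a P-space of {N1} and a Q-space of
{R, D}, the sketch returns a node segment to N1 followed by an adjacency
segment from N1 to R.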

e) sec 9

v17
An implementation MAY support TI-LFA to protect Node-SIDs associated to
a FlexAlgo. In such a case, rather than computing the expected
post-convergence path based on the regular SPF, an implementation SHOULD
use the constrained SPF algorithm bound to the FlexAlgo (using the Flex
Algo Definition) instead of the regular Dijkstra in all the SPF/rSPF
computations that are occurring during the TI-LFA computation. This
includes the computation of the P-Space and Q-Space as well as the
post-convergence path. An implementation MUST only use Node-SIDs bound to
the FlexAlgo and/or Adj-SIDs that are unprotected or, in case of SRv6,
adj-SIDs that are bound to the FlexAlgo to build the repair list.

v18
An implementation MAY support TI-LFA to protect Node-SIDs associated to
a FlexAlgo. In such a case, rather than computing the expected
post-convergence path based on the regular SPF, an implementation MAY use
the constrained SPF algorithm bound to the FlexAlgo (using the Flex Algo
Definition) instead of the regular Dijkstra in all the SPF/rSPF
computations that are occurring during the TI-LFA computation. This
includes the computation of the P-Space and Q-Space as well as the
post-convergence path. If an implementation uses the constrained SPF
algorithm bound to the FlexAlgo, then the implementation MUST only use
Node-SIDs bound to the FlexAlgo and/or Adj-SIDs that are unprotected or, in
case of SRv6, adj-SIDs that are bound to the FlexAlgo to build the repair
list.

NEW
An implementation MAY support TI-LFA to protect Node-SIDs associated *with*
a Flex Algo. In such a case, rather than computing the expected
post-convergence path based on the regular SPF, an implementation *SHOULD* use
the constrained SPF algorithm bound to the Flex Algo (using the Flex Algo
Definition) instead of the regular Dijkstra in all the SPF/rSPF
computations that are occurring during the TI-LFA computation. This
includes the computation of the P-Space and Q-Space as well as the
post-convergence path. *Furthermore, the implementation SHOULD only use
Node-SIDs/Adj-SIDs bound to the Flex Algo and/or unprotected Adj-SIDs of
the regular SPF to build the repair list. The use of regular Dijkstra for
the TI-LFA computation or building of the repair path using SIDs other than
those recommended does not ensure that the traffic going over TI-LFA repair
path during the fast-reroute period is honoring the Flex Algo constraints.*
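
The SID restriction in the NEW text above can be read as a simple filter.
The sketch below is one possible reading, not text from the draft: the `Sid`
record and its fields are hypothetical, and it assumes that algorithm 0
denotes the regular SPF.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Sid:
    kind: str        # "node" or "adj"
    algo: int        # assumption: 0 means the regular SPF
    protected: bool  # only meaningful for adjacency SIDs
    label: int


def usable_for_repair(sid: Sid, flex_algo: int) -> bool:
    """True if the SID may appear in a repair list protecting flex_algo."""
    if sid.algo == flex_algo:
        return True  # Node-SID or Adj-SID bound to the Flex Algo
    # Otherwise only unprotected Adj-SIDs of the regular SPF qualify.
    return sid.kind == "adj" and sid.algo == 0 and not sid.protected


def filter_sids(sids: Iterable[Sid], flex_algo: int) -> List[Sid]:
    return [sid for sid in sids if usable_for_repair(sid, flex_algo)]
```

With this reading, a Node-SID of the regular SPF or a protected Adj-SID of
the regular SPF is never used in a repair list for a Flex Algo.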

In addition to the text suggestions above, I would like to remind everyone
that TI-LFA does not guarantee strict adherence to the post-convergence
path, since that depends on the protection scheme selected. The following
text in Appendix A explains this scenario.

Readers should be aware that FRR protection is pre-computing a backup path
to protect against a particular type of failure (link, node, SRLG). When
using the post-convergence path as FRR backup path, the computed
post-convergence path is the one considering the failure we are protecting
against. This means that FRR is using an expected post-convergence path,
and this expected post-convergence path may be actually different from the
post-convergence path used if the failure that happened is different from
the failure FRR was protecting against. As an example, if the operator has
implemented a protection against a node failure, the expected
post-convergence path used during FRR will be the one considering that the
node has failed. However, even if a single link is failing or a set of
links is failing (instead of the full node), the node-protecting
post-convergence path will be used. The consequence is that the path used
during FRR is not optimal with respect to the failure that has actually
occurred.
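
The effect described in this Appendix A text can be reproduced on a toy
topology. The Python sketch below is only an illustration: the topology,
the link costs and the helper name are made up, and a plain Dijkstra stands
in for the SPF. It compares the node-protecting expected post-convergence
path with the path the network converges to when only the link S-N fails.

```python
import heapq
from typing import Iterable, List, Optional, Tuple

# Hypothetical topology: (node, node, cost), links are bidirectional.
LINKS = [("S", "N", 1), ("N", "D", 1), ("S", "A", 1), ("A", "N", 1), ("A", "D", 5)]


def shortest_path(
    links: Iterable[Tuple[str, str, int]],
    src: str,
    dst: str,
    failed_nodes: Iterable[str] = (),
    failed_links: Iterable[Tuple[str, str]] = (),
) -> Optional[List[str]]:
    """Dijkstra on the topology with the given nodes and links removed."""
    down_nodes = set(failed_nodes)
    down_links = {frozenset(link) for link in failed_links}
    adjacency = {}
    for a, b, cost in links:
        if a in down_nodes or b in down_nodes or frozenset((a, b)) in down_links:
            continue
        adjacency.setdefault(a, []).append((b, cost))
        adjacency.setdefault(b, []).append((a, cost))

    heap = [(0, src, [src])]
    visited = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in adjacency.get(node, []):
            if neighbor not in visited:
                heapq.heappush(heap, (cost + link_cost, neighbor, path + [neighbor]))
    return None


# Path pre-computed by S when protecting against the failure of node N.
expected = shortest_path(LINKS, "S", "D", failed_nodes=["N"])
# Path the network converges to if only the link S-N actually fails.
actual = shortest_path(LINKS, "S", "D", failed_links=[("S", "N")])
```

In this topology `expected` is S, A, D (cost 6) while `actual` is
S, A, N, D (cost 3), so the node-protecting repair path is longer than the
real post-convergence path when only the link has failed.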

I hope this helps get us closer to the resolution of the open issues with
this document.

Thanks,
Ketan

On Fri, Nov 15, 2024 at 8:09 AM Yingzhen Qu <yingzhen.i...@gmail.com> wrote:

> Speaking as WG member, I agree with John's comments and with what Stewart
> and Sasha said at the mic: the removal of the requirement to follow the
> post-convergence path is a big change. If it's not mandatory anymore, we
> need to document in which situations the post-convergence path is
> recommended and why, and in which situations it's not necessary to follow
> the post-convergence path.
>
> As WG co-chair, this change should be clearly communicated with the WG. We
> need to poll the WG for consensus. If it helps, we can have an interim
> meeting to discuss and review the document.
>
> Thanks,
> Yingzhen
>
> On Thu, Nov 14, 2024 at 2:16 PM John Scudder <j...@juniper.net> wrote:
>
>> Hi Ahmed,
>>
>> Thanks for the update. I read the diff, and I listened to the recording
>> of your rtgwg presentation.
>>
>> I've written a long message. For convenience, the bottom line (TL;DR as
>> it were) is that I think the conversation that was started with Stewart and
>> Sasha at the mic line at IETF-121 needs to be worked through. Once the
>> RTGWG chairs and AD are satisfied, I'll abide by that.
>>
>> Now the long version:
>>
>> On Nov 13, 2024, at 3:01 PM, Ahmed Bashandy <abashandy.i...@gmail.com>
>> wrote:
>>
>> I uploaded version 18 of the ti-lfa draft to address the two DISCUSS
>> items in
>>
>> https://datatracker.ietf.org/doc/draft-ietf-rtgwg-segment-routing-ti-lfa/ballot/
>>
>> - To address John Scudder's Discuss, I made the modifications to remove
>> the word "key" from the abstract as suggested by Sasha at
>>
>> https://mailarchive.ietf.org/arch/msg/rtgwg/nWR4uYaT3T30XRiyRdAoIqO22AM/
>> and Pierre at
>>
>> https://mailarchive.ietf.org/arch/msg/rtgwg/zHP2qvP2Ew1oWl5G7Gq8niu8vy8/
>>
>> - To address Murray's Discuss (as well as comments from others), I
>> removed the word "SHOULD" from sections 6.2, 6.3, and 9 as I suggested
>> during my presentation during the rtgwg meeting last Tuesday Nov/5/24.
>> The entire recording of the RTGWG meeting can be found in
>>
>> https://meetecho-player.ietf.org/playout/?session=IETF121-RTGWG-20241105-0930
>>
>> The slides that I presented in PDF format can be found in
>>
>> https://datatracker.ietf.org/meeting/121/materials/slides-121-rtgwg-02-tilfa-bgppic-00.pdf
>>
>>
>> Please take a look and see if the modifications are good to address the
>> two DISCUSS Items
>>
>>
>> In your update you've gotten rid of "key". That's fine as far as it goes,
>> and I agree it resolves the inconsistency between the abstract and body.
>> But that was just an editorial issue, the canary in the coal mine as it
>> were, that illuminated the more general point. Perhaps I expressed myself
>> poorly in the DISCUSS and that's what led us down this rabbit hole of
>> focusing on the word "key". I apologize for that. My larger concern was
>> expressed very ably by Stewart in the Q&A of your presentation. Rather than
>> try to paraphrase him, I've taken the liberty of starting with the
>> transcript [1] and cleaning it up, appended below. Stewart nails it. (I
>> kept most of it as close to verbatim as I could but did remove a little bit
>> of procedural "keep it quick" stuff from the chair. This is of course not
>> an official transcript anyway.)
>>
>> To elaborate a bit, though: as far as I can tell, the contribution (and
>> it is a big contribution!) of the spec is to show how to use
>> post-convergence paths for restoration. If you remove that (which I can
>> because it's optional), it seems as though there is nothing left that
>> wasn't already specified before (for example in RFC 7490, and others).
>>
>> You mentioned in your comments at the meeting that post-convergence was
>> made optional "because some platforms cannot do it". Normally, when we have
>> a platform that can't do a specification, that's fine, the platform simply
>> wouldn't claim conformance to that specification. If you have, say, a
>> platform that can only forward based on the IPv4 or IPv6 header but not on
>> the MPLS header, you don't change the MPLS specification to say forwarding
>> on the MPLS header is not mandatory. You just don't claim conformance with
>> MPLS. (I chose an extreme case, of course, in hopes of clearly illustrating
>> the point.)
>>
>> If I were confident that the WG consensus is yes, absolutely the WG wants
>> to publish this document in its current "post-convergence is explicitly
>> optional" state, I would move from DISCUSS to ABSTAIN. I would choose
>> ABSTAIN rather than NOOBJ because of the observation above, that as far as
>> I can tell once you remove post-convergence there's nothing left that
>> hasn't been done before. (Note that ABSTAIN is a non-blocking, though also
>> non-supporting, ballot position.)
>>
>> However, it is not clear to me that this is, indeed, a solid WG
>> consensus. In addition to Stewart and Sasha's comments, you also mentioned
>> that you've gotten private emails raising the same concern. Calling
>> consensus for RTGWG isn't my job, I would defer to the chairs and AD (Jim)
>> on that point, but it sounded to me from the RTGWG meeting like this was
>> the next action.
>>
>> One last point, right at the end of the discussion of the draft you say,
>> "I avoid shoulds because of the pushback that I get. But in my opinion it
>> should be a should. [...] Either you guys want me to put it back as a
>> mandatory or say why it's not mandatory. I have a reason why it is not
>> mandatory and I just mentioned it and I can put that."
>>
>> Interestingly, this coincides closely with Murray's DISCUSS ballot, about
>> SHOULD. I get it that you have different views on the use of SHOULD, but
>> per my reading of RFC 2119 the case under discussion here is exactly the
>> kind of situation where it becomes useful. To remind us of what 2119 says:
>>
>> ```
>> 3. SHOULD   This word, or the adjective "RECOMMENDED", mean that there
>>    may exist valid reasons in particular circumstances to ignore a
>>    particular item, but the full implications must be understood and
>>    carefully weighed before choosing a different course.
>> ```
>>
>> As far as I can tell, that is what you are saying: an implementation
>> SHOULD use the post-convergence path unless (conditions you will name,
>> e.g., "length of the SID stack is long enough, hardware cannot support
>> it"), in which case that implementation MUST fall back to (whatever the
>> right fallback posture is, RFC 7490 perhaps).
>>
>> I don't insist you use that language or even that approach, nor am I sure
>> it would satisfy the WG -- I just offer it as a point to consider.
>>
>> Thanks,
>>
>> --John
>>
>> My edited transcript:
>>
>> Stewart (17:12)
>>
>> So, Ahmed when this piece of work started, many of us have tracked this
>> piece of work since the first day it was presented at the IETF. When it was
>> presented, the word "key" was important because it was a fundamental
>> concept of the design that the repair path had to follow the
>> post-convergence path and the document kind of has that sort of subtly
>> written in, in various places, except in the places where it doesn't.
>>
>> So I think what is... what the authors need to do is to be quite clear to
>> the working group if it is no longer key, if it is no longer a mandat- a
>> requirement to follow the post-convergence path, then there needs to be an
>> explanation as to why this position has changed and then the text body
>> needs to reflect the consensus position of the working group on whether it
>> is important that it follows the "post-convergence path" or it's not
>> important or there are times when it is and times when it is not, and in
>> which case those circumstances should be documented in the text.
>>
>> Ahmed (18:21)
>>
>> So the document really says that it is not mandatory and it is important
>> and it explains why it is important like I can read part of the document
>> and I'll point them out, actually, I'll reply to your email, but the point
>> here is that we don't really try to put justifications because then I will
>> go into the details of the implementation. I just put the spec there and
>> say, you know what? It is important, but it's not mandatory. You don't have
>> to follow it. Your implementation doesn't have to follow it. If you want to
>> follow it, I have paragraphs that says how you follow it in certain
>> scenarios like that is...
>>
>> (cross-talk)
>>
>> Stewart (18:54)
>>
>> I think you're skipping the important point. The original thesis was that
>> this was a required congruence. That has been dropped, the least you need
>> to do is to explain to the working group why the requirement for congruence
>> has been changed. And then we need to decide what text needs to go in the
>> document to reflect that change of positions. But absolutely, this was a
>> fundamental of the original design and it seems to have been quietly and
>> subtly changed without explanation.
>>
>> Ahmed (19:28)
>>
>> Okay so I thought it's uh yeah I can add a statement that's why it is I
>> thought it's obvious basically because some platforms cannot do it. It's as
>> simple as that. I'll put the sentence if this is why it has been dropped.
>> This was basically a feedback that we got I can try and dig the emails it
>> has been a long while that some hardware simply cannot support it or some
>> software cannot support it if the number if the length of the SID stack is
>> long enough, hardware cannot support it so we can still do topology
>> independent which means you can still get your backup up but it will not be
>> over the post-convergence path. That is the only reason really.
>>
>> Stewart (20:08)
>>
>> I think this probably needs a longer conversation than we can have in
>> this working group and I think uh John I mean Jim probably needs to convene
>> a group of experts.
>>
>> Ahmed (20:19)
>>
>> [elided]
>>
>> Sasha (20:34)
>>
>> I just wanted to second... to say exactly what Stewart has said. I have
>> nothing to add. [Garbled] ... something is called the key aspect of a
>> feature and then called non-mandatory is not... creates a confusion to put
>> it mildly. This has to be resolved one way or another with explanations
>> because there is a loss of history behind this change of requirements. I
>> actually... I second what Stewart has said.
>>
>> Ahmed (21:13)
>>
>> Okay sure, okay, I think I got the point. So I'm open to discussions I
>> have no problem really. Okay sure.
>>
>> JeffT (21:23)
>>
>> Ahmed, do you feel we need another discussion on this? Is it clear what
>> working group is expecting from you in terms of changes and clarifications?
>>
>> Ahmed (21:31)
>>
>> Yeah, my understanding, and again, I'm talking about Stewart and Sasha's
>> comments that the original draft was... I'll have to dig it out to be
>> honest, it's been a long while... that to be TI-LFA the repair path has to
>> be post-convergence. This has been dropped from must-have to important, and
>> I avoid shoulds because of the pushback that I get. But in my opinion it
>> should be a should. But it seems like Sasha and Stewart want it back. And
>> not only Sasha and Stewart, there are other but also other [garbled]
>> exchange email privately, but because it's private I'm not going to divulge
>> their names that also think that it should be put back to mandatory and I'm
>> open to either way. Either you guys want me to put it back as a mandatory
>> or say why it's not mandatory. I have a reason why it is not mandatory and
>> I just mentioned it and I can put that. I'll discuss it with the co-authors
>> and see what they want, but I understand Stewart and Sasha's comments.
>>
>> [1]
>> https://meetecho-player.ietf.org/playout/?session=IETF121-RTGWG-20241105-0930
>>
> _______________________________________________
> rtgwg mailing list -- rtgwg@ietf.org
> To unsubscribe send an email to rtgwg-le...@ietf.org
>

