Re: Request for RTGWG Working Group adoption for draft-bashandy-rtgwg-segment-routing-ti-lfa

Stewart Bryant Tue, 10 Jul 2018 05:41:16 -0700


On 09/07/2018 20:53, Ahmed Bashandy wrote:

Thanks for the comments
See the reply inline at #Ahmed

Ahmed

On 5/29/18 3:35 AM, Alexander Vainshtein wrote:
Robert, Chris and all,
I agree with Robert that it is up to the authors of an individual submission what they consider in or out of scope of the draft.
However, I agree with Chris that the authors of an individual draft asking for its adoption by a specific WG should do their best to address the comments they have received from the WG members.
#Ahmed: Thanks a lot


SB> I am not sure what that reply means.

From my POV this did not happen in the case of the draft in question – for the following reasons:
1. In his early RTG-DIR review <https://datatracker.ietf.org/doc/review-bashandy-rtgwg-segment-routing-ti-lfa-00-rtgdir-early-bryant-2017-05-31/> of the draft Stewart has pointed to the following issues with the -00 version of the draft (needless to say, I defer to Stewart regarding resolution of these issues):
a. *No IPR disclosures for draft-bashandy* in spite of 3 IPR disclosures for its predecessor draft-francois. I have not seen any attempts to address this issue – at least, search of IPR disclosures for draft-bashandy did not yield any results today
#Ahmed: Since draft-bashandy inherits draft-francois, then the IPR of the latter applies to the former. But if there is a spec that requires re-attaching the IPR of an inherited draft to the inheriting draft, it would be great to point it out or point out some other draft where that occurred so that I can follow the exact procedure.

SB> There is (at least now) an IPR disclosure. Hopefully this covers all of the applicable IPR. We should move on.

b. *Selecting the post-convergence path* (inheritance from draft-francois) does not provide for any benefits for traffic that will not pass via the PLR */after convergence/*.
i. The authors claim to have addressed this issue by stating that “Protection applies to traffic which traverses the Point of Local Repair (PLR). Traffic which does NOT traverse the PLR remains unaffected.”

SB> It is not as simple as that, and I think that the draft needs to provide greater clarity.


I think there will be better examples, but consider

              12
      +--------------+
      |              |
A-----B-----C---//---D----E
        10  |        |
            F--------G

Traffic injected at C will initially go C-D-E at cost 2, will be repaired C-F-G-D-E at cost 4 and will remain on that path post convergence. This congruence of path is what TI-LFA claims.

However, a long-standing concern about traffic starting further back in the network needs to be more clearly addressed in the draft to clearly demonstrate the scope of applicability.


For traffic starting at A, before failure the path is A-B-C-D-E cost 13

TI-LFA will repair to make the path A-B-C-F-G-D-E cost 15 because TI-LFA optimises based on local repairs computed at C.


After convergence the path will be A-B-D-E cost 14.

So the draft needs to make it clear to the reader that TI-LFA only provides benefit to traffic which traverses the PLR before and after failure. Traffic which does not pass through the PLR after the failure will need to be traffic engineered separately from traffic that passes through the PLR in both cases.
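
The three path costs quoted above can be checked with a short shortest-path computation. This is only a sketch: the link costs are the ones implied by the fragment (B-C 10, B-D 12, every other link 1, all assumed symmetric), and the helper names are illustrative, not taken from the draft.

```python
import heapq

# Link costs implied by the fragment above (assumed symmetric).
LINKS = {
    ("A", "B"): 1, ("B", "C"): 10, ("C", "D"): 1, ("D", "E"): 1,
    ("B", "D"): 12, ("C", "F"): 1, ("F", "G"): 1, ("G", "D"): 1,
}

def build_graph(links, failed=None):
    """Return an adjacency map, leaving out the failed link if one is given."""
    graph = {}
    for (u, v), cost in links.items():
        if failed is not None and {u, v} == set(failed):
            continue
        graph.setdefault(u, {})[v] = cost
        graph.setdefault(v, {})[u] = cost
    return graph

def shortest_path(graph, src, dst):
    """Plain Dijkstra; returns (cost, path) or None if dst is unreachable."""
    queue = [(0, src, [src])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nbr, weight in graph[node].items():
            if nbr not in seen:
                heapq.heappush(queue, (cost + weight, nbr, path + [nbr]))
    return None

before = build_graph(LINKS)
after = build_graph(LINKS, failed=("C", "D"))

# Before the failure: shortest path from A to E.
pre_cost, pre_path = shortest_path(before, "A", "E")

# During repair: routers upstream of C have not converged and still deliver
# to C, which then applies its own post-convergence path to E.
to_plr_cost, to_plr_path = shortest_path(before, "A", "C")
repair_cost, repair_path = shortest_path(after, "C", "E")
frr_cost = to_plr_cost + repair_cost
frr_path = to_plr_path + repair_path[1:]

# After convergence: shortest path from A to E without link C-D.
post_cost, post_path = shortest_path(after, "A", "E")

print(pre_cost, "-".join(pre_path))
print(frr_cost, "-".join(frr_path))
print(post_cost, "-".join(post_path))
```

Under these assumed costs, the repaired path for traffic entering at A differs from its post-convergence path, while for traffic injected at C the two coincide.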

ii. From my POV this is at best a misleading statement because it does not really address Stewart’s comment which was about traffic that */traversed the PLR before convergence/* but */would not traverse it after convergence/*.
iii. This is not a fine distinction: actually it indicates that selecting post-convergence path for repair is more or less useless (unless the traffic originates at the PLR).
#Ahmed: Thanks for pointing out this *additional benefit* of providing a post-convergence backup path. If a flow starts to use the PLR after a failure, then the presence of a post-convergence backup path on the PLR extends the benefits of using the post-convergence path to flows that did not use the PLR prior to the topology change. I will modify the statement in the introduction to indicate that :)

SB> Sorry, I don't understand that point, please could you provide a network fragment that illustrates the advantage?

c. *Selecting the post-convergence path is detrimental to scalability of the solution*. Please note that in RFC 7490 <https://tools.ietf.org/html/rfc7490> “the Q-space of E with respect to link S-E is used as a proxy for the Q-space of each destination” in order to provide a scalable solution – but this clearly is not the case of draft-bashandy if post-convergence paths are used. To the best of my understanding, the authors did not, so far, do anything to address this comment.
#Ahmed: I fail to see why there is a scalability problem. draft-bashandy-rtgwg-segment-routing-ti-lfa just prefers particular node(s) in the Q space if that node(s) is (are) along the post convergence path. But if I am missing something, it would be great to point out exactly what aspect of scalability you are concerned about so that we can address it

SB> The point is that traffic is constrained to a subset of Q-space on the false assumption that it needs to traverse that part of Q-space post convergence. The network fragment above clearly illustrates the point.


<snip>

        > The work has four basic components: the concept of resolving the
        > problem of P and Q being non-adjacent, the use of SR to solve the
        > non-adjacency, the use of the post convergence path following failure
        > and the applicability of these techniques to an SR network. The first
        > and second points seem of utility in non-SR networks, and so I am
        > surprised that they are not called out as such, in the first case
        > perhaps with consideration to strategically placed RSVP tunnels, or
        > binding segments.

        The draft already mentions that the work builds on top of existing FRR
        work. For example, the second statement of the abstract already says

           builds on proven IP-FRR concepts being
           LFAs, remote LFAs (RLFA), and remote LFAs with directed forwarding
           (DLFA).

        The statement about the possibility of using RSVP is clearly outside
        the scope of the document, as mentioned in the first paragraph of the
        introduction.

SB> In the introduction you set out the context of the work so I don't think that RSVP should be dismissed so lightly since what you are doing is pretty much what an RSVP tunnel repair would do, including taking into account the traffic volumes. Setting out why the complexity of this repair technique outweighs the complexity of RSVP is part of the deployment consideration process, and you ought to provide a balanced view so that the reader can evaluate the issues for themselves.

        > The issue of mapping the repair path to the post convergence path is
        > something that has always concerned me in this concept. It is true
        > that traffic that always passes through the PLR will experience the
        > properties the authors describe, but not all traffic will pass through
        > the PLR post convergence. The post failure path will be topology
        > dependent, and may take a different path from the point of ingress.

        #Ahmed
        The fourth paragraph in the introduction clearly mentions that we are
        protecting the traffic passing through the PLR.

SB> See above. It only applies to a subset of that traffic.

        > I am also concerned that the authors do not discuss the need for loop
        > free convergence, since although traffic going through the repair path
        > will be loop-free, traffic arriving at the PLR might not be. Consider
        > for example a topology fragment that looks like a clock with a router
        > at each minute. Traffic enters at 9 o'clock, leaves at 3 o'clock and
        > goes via 12 o'clock and 12 o'clock fails. The routers 9..12 will
        > re-converge at different times and this may give rise to the
        > micro-looping of traffic trying to get to the PLR. A summary of the
        > problem and a pointer to the companion draft may be sufficient.
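
The clock example can be made concrete with a toy forwarding model. This is a sketch under stated assumptions, not a model of any real IGP: routers are numbered 0..59 by minute, 12 o'clock is router 0, the PLR is taken to be router 59, and the routers between 9 and 12 o'clock (45..59) are assumed to forward clockwise before the failure and counter-clockwise once they have converged. The function names are hypothetical.

```python
RING = 60                  # one router per minute mark
SRC, DST = 45, 15          # traffic enters at 9 o'clock, leaves at 3 o'clock
PLR = 59                   # router adjacent to the failed 12 o'clock router (assumed PLR)

def next_hop(router, converged):
    """Next hop towards DST for a packet that entered at SRC.

    Assumption: routers 45..59 used the clockwise path via 12 o'clock before
    the failure and switch to the counter-clockwise path via 6 o'clock once
    converged. Routers 16..44 always forward counter-clockwise.
    """
    stale = SRC <= router < RING and router not in converged
    return (router + 1) % RING if stale else (router - 1) % RING

def forward(converged):
    """Follow one packet from SRC with the given set of converged routers."""
    node, visited = SRC, set()
    while True:
        if node == DST:
            return "delivered"
        if node == PLR and PLR not in converged:
            return "reached PLR"      # fast reroute would take over here
        if node in visited:
            return "micro-loop"       # same router seen twice with static FIBs
        visited.add(node)
        node = next_hop(node, converged)

print(forward(set()))                 # nobody converged yet
print(forward({50}))                  # router 50 converged before router 49
print(forward({59}))                  # PLR converged before its neighbour
print(forward(set(range(45, 60))))    # everybody converged
```

In this model a packet loops whenever a router nearer the failure has converged while its upstream neighbour still holds the old next hop, which is the situation the companion uloop draft is meant to address.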

        #Ahmed
        The last statement in the first paragraph in the introduction refers
        the reader to the uloop avoidance draft which handles non-local failures

SB> The question I have had for a long time is whether traffic patterns are such that deployment of this makes any sense without co-deployment of the uloop draft. The implication has always been that it does, but that puts the onus on the operator to validate that position against all first and probably all second failures for each evolution of their topology.

        > There is no discussion of multiple failures, nor as far as I can see
        > of failures that are worse than anticipated. This is an important
        > point that needs to be established early. Some methods (MRT)
        > intrinsically address multiple failures, others (NV) intrinsically
        > exclude them. Simple LFA needs a supervisor to quickly abandon all
        > hope when they occur.

        #Ahmed
        As specified in the 3rd paragraph of the introduction the scope of the
        document is limited to single link, single node, and single local SRLG
        failure.

SB> Given that there *will* be multiple failures from time to time, the text needs to provide advice on the matter.

        > In an SR network the paths used are not the shortest paths, they are a
        > collection of shortest paths, so there needs to be some discussion on
        > the interaction between the SR paths and repair paths to consider
        > whether it is unconditionally safe against forwarding loops. It would
        > presumably be so if the authors borrowed the concept of repair
        > addresses rather than normal forwarding addresses from not-via, but I
        > don't think they have done this.

        #Ahmed
        Again the second statement of the 1st paragraph of the Introduction says

           By relying on segment routing this document provides
           a local repair mechanism for standard IGP shortest path

        So the scope of the document is quite clear

SB> Reflecting on this, I cannot think of a case where you would generate a forwarding loop, provided you ensure that no path between SR nodes crosses the failure, including ECMP paths.

SB> Please can you confirm that ECMP through the failure is prohibited by the algorithm?
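
One way to picture the condition being asked about is sketched below. It is not the draft's algorithm. It reuses the network fragment from earlier in this message, with the same assumed symmetric link costs, and tests whether any shortest path between two SR nodes, equal-cost alternatives included, can use the failed link. A link (u, v) lies on some shortest path from x to y exactly when d(x,u) + w(u,v) + d(v,y) equals d(x,y), so the segment is safe only if that sum is strictly larger in both directions. The function names are illustrative.

```python
import heapq

# Network fragment from earlier in this message; costs assumed symmetric.
LINKS = {
    ("A", "B"): 1, ("B", "C"): 10, ("C", "D"): 1, ("D", "E"): 1,
    ("B", "D"): 12, ("C", "F"): 1, ("F", "G"): 1, ("G", "D"): 1,
}

def distances(links, src):
    """Dijkstra distances from src over the pre-failure topology."""
    graph = {}
    for (u, v), weight in links.items():
        graph.setdefault(u, {})[v] = weight
        graph.setdefault(v, {})[u] = weight
    dist = {src: 0}
    queue = [(0, src)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue
        for nbr, weight in graph[node].items():
            if d + weight < dist.get(nbr, float("inf")):
                dist[nbr] = d + weight
                heapq.heappush(queue, (d + weight, nbr))
    return dist

def segment_avoids_link(links, start, end, failed):
    """True if no shortest path start->end (ECMP included) uses the failed link."""
    u, v = failed
    weight = links.get((u, v), links.get((v, u)))
    d_start = distances(links, start)
    d_end = distances(links, end)     # symmetric costs: d(x, end) == d(end, x)
    best = d_start[end]
    via_uv = d_start[u] + weight + d_end[v]
    via_vu = d_start[v] + weight + d_end[u]
    return via_uv > best and via_vu > best

FAILED = ("C", "D")
for start, end in [("C", "F"), ("C", "G"), ("F", "E"), ("F", "G"), ("G", "E")]:
    print(start, end, segment_avoids_link(LINKS, start, end, FAILED))
```

With these costs, a node segment from C to G or from F to E has an equal-cost path through C-D and would fail the check, whereas the hop-by-hop list C to F, F to G, G to E passes it.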

        > There should also be some discussion on the original path constraints
        > that are applicable to the repair. Presumably the ingress node
        > constrained the traffic to go through failed node F for a reason. If
        > the repair is unconstrained that reason could be violated, but this is
        > not discussed in the text.

        #Ahmed
        Same response as the response to the previous comment. The scope is
        standard IGP shortest paths

SB> So this can never be used to repair an SR path?

I think you either need to state that as a constraint, or specify how you fulfil the requirements of the node creating the original SR path, or you need to state that when TI-LFA is being used to repair an SR path the path constraints are not in general respected.

        > In the Security section you say:
        >
        >     The behavior described in this document is internal functionality
        >     to a router that result in the ability to guarantee an upper bound
        >     on the time taken to restore traffic flow upon the failure of a
        >     directly connected link or node. As such no additional security
        >     risk is introduced by using the mechanisms proposed in this
        >     document.
        >
        > SB> I am not sure that the above is correct. There may be a security
        > SB> reason why a packet was steered along a path which breaks when you
        > SB> use this technique.

        #Ahmed
        The security consideration section has been modified to indicate that
        the traffic is being steered over the post convergence path and hence
        there is no security risk because this is the path that the operator
        intended to use after the failure through the metrics configured on
        the links.

SB> I think we have a counter example.

        In fact by expediting rerouting the traffic over the intended post
        convergence path without waiting for IGP reconvergence, we have
        introduced a minor security enhancement by reducing misforwarding
        and/or traffic drop

        > In the conclusion you say:
        >
        >     The
        >     mechanism is able to calculate the backup path irrespective of the
        >     topology as long as the topology is sufficiently redundant.
        >
        > SB> That is certainly true in classic. I am not sure this is
        > SB> universally true under SR which includes the use of non-shortest
        > SB> path and binding segments.

        #Ahmed
        Again the document is restricted to IGP shortest path as mentioned in
        the introduction

SB> The security section needs to state that if this is used to repair an SR path additional security considerations apply.

It sounds like it needs to state that in the event of multiple failures the result is undetermined.

        > Minor issues:
        >
        >     For each destination in the network, TI-LFA prepares a data-plane
        >     switch-over to be activated upon detection of the failure of a
        >     link used to reach the destination.
        >
        > SB> To make the scaling clearer to the reader, I think you need
        > SB> to make it clear that for each protected link, you determine
        > SB> the repair needed to reach every destination reachable over that
        > SB> link. You sort of say that, but it's a bit hidden.

        #Ahmed
        I do not understand the difference between the text in the draft and the
        text that you are proposing. We think that our text is quite clear

        >     We provide the TI-LFA approach that achieves guaranteed coverage
        >     against link, node, and local SRLG failure, in any IGP network,
        >     relying on the flexibility of SR.
        >
        > SB> Should that be any SINGLE link.... failure?

        #Ahmed
        As mentioned a few times above, the introduction clearly mentions
        *single*

SB> Yes, but the point is so important that you cannot over emphasise it.

        > In the text (and the text that follows)
        >
        >     To do so, S applies a "NEXT" operation on Adj(S-F) and then two
        >     consecutive "PUSH" operations: first it pushes a node segment for F,
        >     and then it pushes a protection list allowing to reach F while
        >     bypassing S-F.
        >
        > You need to reference the SR operations.

        #Ahmed
        This paragraph is in Section 5.2.1. The latest version refers to the SR
        draft


SB> I think the point remains.

        > Also you are considering Adj segments, and presumably they were there
        > for a reason, but you do not discuss that.

        #Ahmed
        Section 5.2 discusses protecting adjacency segments

        > In 5.3.1 and 5.3.2 you have a list of conditions, but do not make it
        > clear whether any or all must be true.

        #Ahmed
        The intention is for all of the conditions to be true. I will make it
        clear in the next version

SB> Thank you.

        > Nits
        >
        > 1. Introduction
        >
        >     Segment Routing aims at supporting services with tight SLA
        >     guarantees [1]. This document provides a local repair mechanism
        >     relying on SR-capable of restoring end-to-end connectivity in the
        >     case of a sudden failure of a network component.
        >
        > SB> Grammar needs a little work in the last sentence.

        #Ahmed

        Addressed in the latest version of the document

SB> Thank you.

        > In Fig 1, I assume that the blobs are network fragments.
        >
        > In the conclusion you say:
        >
        >     This document proposes a mechanism that is able to pre-calculate a
        >     backup path for every primary path so as to be able to protect
        >     against the failure of a directly connected link or node.
        >
        > SB> you need to add SRLG

        #Ahmed

        Addressed in the latest version of the draft

SB> Thank you.

- Stewart
