Hi Peter,

> On Jul 16, 2024, at 3:59 AM, Peter Psenak <[email protected]> wrote:
>
> Hi Acee,
>
> On 15/07/2024 21:08, Acee Lindem wrote:
>> Hi Peter,
>>
>>> On Jul 15, 2024, at 13:04, Peter Psenak <[email protected]> wrote:
>>>
>>> Hi Acee,
>>>
>>> On 12/07/2024 22:00, Acee Lindem wrote:
>>>> Les -
>>>>
>>>> The SA bit solution is no more “complete” than the database exchange
>>>> solution. Let’s talk specific scenarios rather than FUD.
>>>>
>>>> So we have an LSA originated by the restarting router at time T0 and one
>>>> originated by its neighbor at time T1 where T1 is after T0. Although they
>>>> take the same flooding path, the one originated at T1 arrives and is
>>>> processed ahead of the one originated at T0, resulting in traffic loss.
>>>>
>>>> I’m not arguing that this hypothetical situation isn’t possible with
>>>> packet loss. However, other than the added overhead and inefficiency of
>>>> the SA bit signaling resulting in some small delay, how does the SA bit
>>>> solution solve this? How does the restarting router know when its updated
>>>> LSAs have successfully been installed on all the routers in the area? It
>>>> certainly doesn’t know any better than its neighbor.
>>>>
>>>> Thanks,
>>>> Acee
>>>> P.S. One could add a small delay to the database exchange solution once
>>>> the last stale LSA is updated or purged but I don’t believe this is
>>>> necessary.
>>> I believe adding some text about delaying the origination of the Router LSA
>>> on the restarting router, while letting the forwarding plane be updated on
>>> it by running the SPF and using its local adjacencies during that SPF,
>>> would be a good addition to the draft.
>>> As others alluded to, the problem of traffic hitting the restarting router
>>> before it had a chance to fully converge its forwarding plane has been
>>> observed in the field numerous times.
>>
>> The data plane update delay is a local problem whereas the usage of stale
>> LSAs due to OSPF converging before these LSAs are updated is an OSPF protocol
>> problem (but only a transient problem). If a platform has a data plane
>> update delay, it can locally delay advertisement for all routing protocols
>> (not just OSPF). However, this seems to be a platform issue rather than a
>> protocol issue which needs to be specified in an IETF document. However, I
>> don’t think that the local data plane delay should require collaboration
>> with other routers when it can be handled locally.
>
> it can be addressed locally. All I'm asking for is to put some text about the
> issue and possible solution.
I can add this since the issue has the same symptom as the stale LSA usage.

Thanks,
Acee

> thanks,
> Peter
>
>>
>> The delay I was saying I didn’t think should be added above is the one for
>> stale LSA usage due to some LS updates being lost and later ones being
>> flooded successfully. I feel this is the only way stale LSAs wouldn’t be
>> updated or purged by the time the router-LSA with the adjacency to the
>> restarting router is received (if the procedures in the draft are followed).
>>
>> Thanks,
>> Acee
>>
>>> thanks,
>>> Peter
>>>
>>>>
>>>>> On Jul 12, 2024, at 14:48, Les Ginsberg (ginsberg) <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Tony –
>>>>> What is important to me here is a common understanding and providing a
>>>>> complete solution.
>>>>> Hopefully, you are at least understanding that the point I am making is
>>>>> valid i.e., traffic loss can occur even with better-idbx in place.
>>>>> I would also argue that you are underestimating the effect of scale.
>>>>> As to your argument below, it could also be used to argue against
>>>>> doing anything – after all we know that current OSPF does converge in a
>>>>> modest amount of time.
>>>>> Since you have decided to make things better (which I support) I do not
>>>>> see why we should not define a complete solution.
>>>>> If you, as a vendor, choose not to implement SA because you consider the
>>>>> cost/benefit ratio unappealing – that is your choice. So long as you and
>>>>> your customers are satisfied …
>>>>> But our mission here is to define a solution – and I am simply arguing
>>>>> for a more complete solution.
>>>>> Les
>>>>> From: Tony Przygienda <[email protected]>
>>>>> Sent: Friday, July 12, 2024 11:23 AM
>>>>> To: Acee Lindem <[email protected]>
>>>>> Cc: Les Ginsberg (ginsberg) <[email protected]>; Liyan
>>>>> Gong <[email protected]>; Aijun Wang <[email protected]>;
>>>>> Peter Psenak (ppsenak) <[email protected]>; Yingzhen Qu
>>>>> <[email protected]>; lsr <[email protected]>; lsr-chairs
>>>>> <[email protected]>; shraddha <[email protected]>
>>>>> Subject: Re: [Lsr] About Premature aging of LSA and Purge LSA
>>>>> Les, whatever you try to suggest here, you slide into direction of
>>>>> trying to guarantee common knowledge closure (that's the technical term
>>>>> for what you try) and based on distributed systems theory you end up
>>>>> ultimately with virtual clock synchronization of the network in some form
>>>>> if you _really_ want to solve the problem rather than "hey, my stuff may
>>>>> work 2 hops away rather than 1 hop so it's much better and let's not talk
>>>>> about 3 hops" (look up Lamport's clock vectors/matrices for proper
>>>>> theoretical underpinnings of such undertakings if you'd like to take this
>>>>> discussion further) and this will slow things to a crawl. Worse, you will
>>>>> discover pretty soon that going down this path you will have to learn
>>>>> consistent cuts and basically transaction scheduling most likely ;-)
>>>>> IGPs are just IGPs, i.e. they do guarantee "eventual consistency" (in
>>>>> proper technical terms epsilon consistency) and that makes them fast and
>>>>> reacting fast to failures and that's the base of their success. This also
>>>>> means you have transients and this here is just one, relatively simple
>>>>> fix of a local transient and that's about the best you can do to preserve
>>>>> the desirable properties (i.e. fastest possible eventual consistency with
>>>>> maximum resiliency [that's the CAP paradigm part which is another way to
>>>>> see IGPs as _AP type of solution]).
>>>>> Without this kind of underlying
>>>>> understanding/language we are talking about "me likes my stuff with me
>>>>> bells and whistles better than ye' thing" and it's going in circles
>>>>> AFAIS. So I'm with Acee here in short (and I left the fact out that as
>>>>> I say, flavor of this stuff is deployed since long time and works fine at
>>>>> any scale in our experience and it's damn' simple to implement
>>>>> comparatively speaking and doesn't need any big rollouts on the network
>>>>> compared to all the signalling machinery suggested)
>>>>> -- tony
>>>>> On Fri, Jul 12, 2024 at 6:44 PM Acee Lindem <[email protected]> wrote:
>>>>> So, I don’t think the case you are suggesting is plausible. Let’s say you
>>>>> have a hypothetical router somewhere in the same area that has the
>>>>> restarting router’s stale LSAs.
>>>>> 1. The restarting router’s neighbors will only advertise an
>>>>> adjacency once the stale LSAs have been updated or purged from their
>>>>> local databases.
>>>>> 2. Only then will the adjacency be advertised - so the update or
>>>>> purge precedes the adjacency advertisement.
>>>>> 3. How is the neighbor router’s LSA going to pass the restarting
>>>>> router’s LSA update or purge? It will take the same or possibly even
>>>>> better flooding path. Will it be flooded at warp speed?
>>>>> 4. Are you suggesting that the restarting router’s LSAs are dropped
>>>>> but the neighbor’s advertisement is not? If so, how would the restarting
>>>>> router know this and delay removing the adjacency suppression? Are you
>>>>> relying on the inherent inefficiencies and convergence delays with LLS
>>>>> signaling handshake between the two routers?
>>>>> In any case, trying to prevent transient problems due to selective loss
>>>>> of updates is an exercise in futility.
>>>>> Thanks,
>>>>> Acee
>>>>>
>>>>> On Jul 12, 2024, at 12:13, Les Ginsberg (ginsberg) <[email protected]>
>>>>> wrote:
>>>>> Acee –
>>>>> When the restarting router goes down, the state of the LSDB in the
>>>>> network becomes:
>>>>> Restarting Router LSA: All neighbors advertised
>>>>> Neighbor Routers: Neighbor to Restarting Router is removed
>>>>> When the restarting router comes back up, two changes will occur:
>>>>> 1) Restarting Router updates its LSAs
>>>>> 2) Neighbors update their LSAs to indicate they once again have a neighbor
>>>>> to the restarting router
>>>>> You cannot guarantee the flooding order network-wide.
>>>>> Because the stale LSAs from the Restarting Router are present in all
>>>>> nodes, as soon as a neighbor readvertises the adjacency to the restarting
>>>>> router, it is now possible that on some nodes in the network you will
>>>>> temporarily have an LSDB which has:
>>>>> Stale LSA from restarting router + Updated LSA from neighbor
>>>>> Whether the restarting router sends an updated LSA with neighbors or
>>>>> without neighbors (as you suggest) you cannot prevent the above transient
>>>>> condition from occurring because doing so requires guaranteeing that the
>>>>> update to the Neighbor LSA and the update to the restarting router LSA
>>>>> are done atomically network-wide.
>>>>> That is why the restarting router cannot do this without help from the
>>>>> neighbors.
>>>>> Hope this is clear.
>>>>> Les
>>>>> From: Acee Lindem <[email protected]>
>>>>> Sent: Friday, July 12, 2024 7:55 AM
>>>>> To: Les Ginsberg (ginsberg) <[email protected]>
>>>>> Cc: Liyan Gong <[email protected]>; Aijun Wang
>>>>> <[email protected]>; Peter Psenak (ppsenak) <[email protected]>;
>>>>> Yingzhen Qu <[email protected]>; lsr <[email protected]>; lsr-chairs
>>>>> <[email protected]>; tony Przygienda <[email protected]>; shraddha
>>>>> <[email protected]>
>>>>> Subject: Re: [Lsr] About Premature aging of LSA and Purge LSA
>>>>> On Jul 12, 2024, at 10:49, Les Ginsberg (ginsberg)
>>>>> <[email protected]> wrote:
>>>>> Acee –
>>>>> The neighbors do not control when the flooding of the purge/update
>>>>> reaches all routers in the network.
>>>>> The neighbors have direct control of the exchange between themselves and
>>>>> their immediate neighbors – nothing else.
>>>>> The restarting router has no better idea. If you’re suggesting
>>>>> suppressing advertising adjacencies until all neighbors of the restarting
>>>>> router are adjacent (which is a bad idea), the restarting router can do
>>>>> this as well by suppressing its link advertisements. There is NOTHING
>>>>> additional that can be accomplished by adding LLS signaling.
>>>>> Acee
>>>>> Les
>>>>> From: Acee Lindem <[email protected]>
>>>>> Sent: Friday, July 12, 2024 7:44 AM
>>>>> To: Les Ginsberg (ginsberg) <[email protected]>
>>>>> Cc: Liyan Gong <[email protected]>; Aijun Wang
>>>>> <[email protected]>; Peter Psenak (ppsenak) <[email protected]>;
>>>>> Yingzhen Qu <[email protected]>; lsr <[email protected]>; lsr-chairs
>>>>> <[email protected]>; tony Przygienda <[email protected]>; shraddha
>>>>> <[email protected]>
>>>>> Subject: Re: [Lsr] About Premature aging of LSA and Purge LSA
>>>>>
>>>>> On Jul 12, 2024, at 10:40, Les Ginsberg (ginsberg) <[email protected]>
>>>>> wrote:
>>>>> Acee –
>>>>> Having the restarting router suppress advertisement of its adjacencies
>>>>> does not address the transient state where routers in the network have
>>>>> received the updated LSA from the neighbor with the reestablished
>>>>> adjacency to the restarting router but still have the stale LSA from the
>>>>> restarting router that has the pre-restart adjacency advertisements.
>>>>> (point #1 I made below).
>>>>> The neighbors of the restarting router will not advertise the adjacency
>>>>> until the stale LSAs are purged or updated - this is the whole point of
>>>>> https://datatracker.ietf.org/doc/draft-hegde-lsr-ospf-better-idbx/
>>>>> Thanks,
>>>>> Acee
>>>>>
>>>>> So this is not a robust solution.
>>>>> Les
>>>>> From: Acee Lindem <[email protected]>
>>>>> Sent: Friday, July 12, 2024 7:21 AM
>>>>> To: Les Ginsberg (ginsberg) <[email protected]>
>>>>> Cc: Liyan Gong <[email protected]>; Aijun Wang
>>>>> <[email protected]>; Peter Psenak (ppsenak) <[email protected]>;
>>>>> Yingzhen Qu <[email protected]>; lsr <[email protected]>; lsr-chairs
>>>>> <[email protected]>; tony Przygienda <[email protected]>; shraddha
>>>>> <[email protected]>
>>>>> Subject: Re: [Lsr] About Premature aging of LSA and Purge LSA
>>>>> Hi Les,
>>>>>
>>>>> On Jul 12, 2024, at 02:57, Les Ginsberg (ginsberg) <[email protected]>
>>>>> wrote:
>>>>> I am happy that work on this problem has begun.
>>>>> I believe the most robust way forward is to implement the mechanisms
>>>>> defined in BOTH drafts.
>>>>> I think the mechanism defined in draft-hegde-lsr-ospf-better-idbx is
>>>>> sound and not overly complex (sorry Liyan 😊) and should be done.
>>>>> But it does not solve all aspects of the problem.
>>>>> It does make LSDB synchronization more robust – which addresses the
>>>>> control plane aspects of the problem.
>>>>> It also has the advantage that it does not require any support on the
>>>>> neighboring routers – and so the benefits can be realized simply by
>>>>> upgrading one router at a time.
>>>>> However, draft-hegde-lsr-ospf-better-idbx does not address forwarding
>>>>> plane aspects of the problem – which become more significant at scale.
>>>>> There are two aspects of this problem:
>>>>> 1) You do not have control over the order in which the updated LSAs are
>>>>> flooded to the rest of the network – so it is still possible for
>>>>> transient forwarding issues to occur multiple hops away from the
>>>>> restarting router.
>>>>> 2) The restarting router requires additional time – after full LSDB sync –
>>>>> to program the forwarding plane. It is well known that update of the
>>>>> forwarding plane takes much longer than protocol SPF calculation.
>>>>> If only a few hundred routes are supported, this may not be of
>>>>> significant concern, but if thousands of routes are supported the time it
>>>>> takes to program the forwarding plane becomes a significant contributor.
>>>>> I fail to see how suppressing neighbor adjacency advertisement solves
>>>>> any additional problems that are not solved by avoiding usage of the
>>>>> restarting router’s stale LSAs.
>>>>> Note that the OSPF SPF has a check for bi-directional connectivity,
>>>>> excerpted from section 16.1 of RFC2328:
>>>>>     (b) Otherwise, W is a transit vertex (router or transit
>>>>>         network). Look up the vertex W's LSA (router-LSA or
>>>>>         network-LSA) in Area A's link state database. If the
>>>>>         LSA does not exist, or its LS age is equal to MaxAge, or
>>>>>         it does not have a link back to vertex V, examine the
>>>>>         next link in V's LSA.[23]
>>>>> Consequently, the restarting router can simply suppress its own
>>>>> link advertisement until such time that is required to solve the above
>>>>> problems. You should be familiar with this quote:
>>>>> “If you want a thing done well, do it yourself.”
>>>>> ― Napoleon Bonaparte
>>>>> Thanks,
>>>>> Acee
>>>>>
>>>>> draft-cheng-lsr-ospf-adjacency-suppress provides a way to address the
>>>>> above two aspects by providing a means for the neighbors of the
>>>>> restarting router to delay advertisement of the restored adjacency to the
>>>>> restarting router. (SA signaling)
>>>>> It could be argued that using SA signaling eliminates the need to do
>>>>> anything else – but given that this mechanism depends upon support by all
>>>>> the neighbors of the restarting router I believe there is still good
>>>>> reason to implement both mechanisms.
>>>>> NOTE: I would prefer that the two drafts be combined into a single
>>>>> draft – but that is optional and up to the authors. But from the WG
>>>>> perspective I would like to see both solutions progress.
>>>>> Les
>>>>> From: Liyan Gong <[email protected]>
>>>>> Sent: Thursday, July 11, 2024 8:22 PM
>>>>> To: Acee Lindem <[email protected]>; Aijun Wang
>>>>> <[email protected]>
>>>>> Cc: Peter Psenak (ppsenak) <[email protected]>; Yingzhen Qu
>>>>> <[email protected]>; lsr <[email protected]>; lsr-chairs
>>>>> <[email protected]>; tony Przygienda <[email protected]>; shraddha
>>>>> <[email protected]>
>>>>> Subject: [Lsr] Re: About Premature aging of LSA and Purge LSA
>>>>> Hi Acee and Aijun,
>>>>> Thank you very much for your discussion. I would like to share my
>>>>> thoughts on the proposed solutions.
>>>>> In my view, draft-hegde-lsr-ospf-better-idbx may not be as
>>>>> straightforward as it initially appears.
>>>>> Despite its local applicability, it entails a complex neighbor
>>>>> establishment process, which is fundamental to the OSPF protocol and
>>>>> typically not altered lightly by those familiar with its workings.
>>>>> On the other hand, draft-cheng-lsr-ospf-adjacency-suppress presents a
>>>>> more focused approach tailored to address the specific issue without
>>>>> unintended consequences.
>>>>> I still believe the key factor in evaluating any approach is whether it
>>>>> impacts the current systems negatively.
>>>>> Regarding our extensive discussions on these drafts, please refer to
>>>>> our previous records for more details.
>>>>> https://mailarchive.ietf.org/arch/search/?q=%22draft-cheng-lsr-ospf-adjacency-suppress%22
>>>>> Thank you for your attention to this matter.
>>>>> Best Regards,
>>>>> Liyan
>>>>> ----Original Message----
>>>>> From: Acee Lindem <[email protected]>
>>>>> To: Aijun Wang <[email protected]>
>>>>> Cc: Peter Psenak <[email protected]>, Yingzhen Qu
>>>>> <[email protected]>, lsr <[email protected]>, lsr-chairs
>>>>> <[email protected]>, tony Przygienda <[email protected]>, shraddha
>>>>> <[email protected]>
>>>>> Sent: 2024-07-11 23:26:57
>>>>> Subject: [Lsr] Re: About Premature aging of LSA and Purge LSA
>>>>>
>>>>> As WG member:
>>>>> On Jul 11, 2024, at 05:29, Aijun Wang <[email protected]> wrote:
>>>>> And there is also another draft that aims to solve a similar problem,
>>>>> https://datatracker.ietf.org/doc/html/draft-cheng-lsr-ospf-adjacency-suppress-02,
>>>>> which it declares to be similar to the solution in IS-IS. Why not take
>>>>> this approach?
>>>>> Because this one doesn’t require any signaling and can be accomplished via
>>>>> local behavior without requiring support from any other OSPF router.
>>>>> Additionally, it is simpler. Well, at least for someone who has a deep
>>>>> understanding of the protocol.
>>>>> Thanks,
>>>>> Acee
>>>>>
>>>>> Best Regards
>>>>> Aijun Wang
>>>>> China Telecom
>>>>> From: [email protected] [mailto:[email protected]]
>>>>> On Behalf Of Aijun Wang
>>>>> Sent: July 11, 2024 17:20
>>>>> To: 'Acee Lindem' <[email protected]>
>>>>> Cc: 'Peter Psenak' <[email protected]>; 'Yingzhen Qu'
>>>>> <[email protected]>; 'lsr' <[email protected]>; 'lsr-chairs'
>>>>> <[email protected]>; 'tony Przygienda' <[email protected]>;
>>>>> 'shraddha' <[email protected]>
>>>>> Subject: [Lsr] Reply: Re: About Premature aging of LSA and Purge LSA
>>>>> For the neighbors of the restarting router, why can’t they directly
>>>>> delete the LSAs originated by the restarting router instead of
>>>>> putting them into one “Stale DB Exchange list” when they detect their
>>>>> neighbor is down?
>>>>> From: [email protected] [mailto:[email protected]]
>>>>> On Behalf Of Acee Lindem
>>>>> Sent: July 10, 2024 22:14
>>>>> To: Aijun Wang <[email protected]>
>>>>> Cc: Peter Psenak <[email protected]>; Yingzhen Qu
>>>>> <[email protected]>; lsr <[email protected]>; lsr-chairs
>>>>> <[email protected]>; tony Przygienda <[email protected]>; shraddha
>>>>> <[email protected]>
>>>>> Subject: [Lsr] Re: About Premature aging of LSA and Purge LSA
>>>>> Yes - but the whole discussion of adjacency suppression and database
>>>>> synchronization is based on preventing TEMPORARY usage of stale LSAs
>>>>> leading to false bidirectional adjacencies during unplanned restart. RFC
>>>>> 2328 OSPF will converge without any modifications - there can just be
>>>>> transient traffic drops and/or loops.
>>>>> Thanks,
>>>>> Acee
>>>>> On Jul 9, 2024, at 20:42, Aijun Wang <[email protected]> wrote:
>>>>> For the unplanned restart, shouldn’t it be the responsibility of the
>>>>> directly connected neighbors to send out such LSAs to purge the
>>>>> obsolete LSAs?
>>>>> Best Regards
>>>>> Aijun Wang
>>>>> China Telecom
>>>>> From: [email protected] [mailto:[email protected]]
>>>>> On Behalf Of Acee Lindem
>>>>> Sent: July 9, 2024 20:14
>>>>> To: Peter Psenak <[email protected]>
>>>>> Cc: Aijun Wang <[email protected]>; Yingzhen Qu
>>>>> <[email protected]>; lsr <[email protected]>; lsr-chairs
>>>>> <[email protected]>; tony Przygienda <[email protected]>; shraddha
>>>>> <[email protected]>
>>>>> Subject: [Lsr] Re: About Premature aging of LSA and Purge LSA
>>>>> Additionally, you certainly don’t need a standards track solution to
>>>>> this problem. An implementation could honor MinLSInterval by simply
>>>>> locally keeping its own list of self-originated MaxAge LSAs and delaying
>>>>> reorigination.
>>>>> Thanks,
>>>>> Acee
>>>>> On Jul 9, 2024, at 04:13, Peter Psenak <[email protected]> wrote:
>>>>> Aijun,
>>>>> On 09/07/2024 09:46, Aijun Wang wrote:
>>>>> Hi, Acee:
>>>>> Can the proposal in
>>>>> https://datatracker.ietf.org/doc/html/draft-dong-ospf-purge-lsa-00,
>>>>> together with
>>>>> https://datatracker.ietf.org/doc/html/rfc2328#section-14.1 (Premature
>>>>> aging of LSAs) solve your mentioned problem?
>>>>> If so, is it simpler than your proposal?
>>>>> That is, before the router restarts, it need only send out the Purge
>>>>> LSA (when the LSA sequence number is not about to wrap) or prematurely
>>>>> age its LSA (when the sequence number is about to wrap).
>>>>> does not work for unplanned restart.
>>>>> thanks,
>>>>> Peter
>>>>> Best Regards
>>>>> Aijun Wang
>>>>> China Telecom
>>>>> From: [email protected] [mailto:[email protected]]
>>>>> On Behalf Of Acee Lindem
>>>>> Sent: July 9, 2024 3:58
>>>>> To: Yingzhen Qu <[email protected]>
>>>>> Cc: lsr <[email protected]>; lsr-chairs <[email protected]>; tony Przygienda
>>>>> <[email protected]>; shraddha <[email protected]>
>>>>> Subject: [Lsr] Re: IETF 120 LSR Slot Requests
>>>>> Speaking as WG member:
>>>>> I would like a 10 minute slot to present an update to
>>>>> https://datatracker.ietf.org/doc/draft-hegde-lsr-ospf-better-idbx/
>>>>> Thanks,
>>>>> Acee
>>>>>
>>>>> On Jun 25, 2024, at 14:19, Yingzhen Qu <[email protected]> wrote:
>>>>> Hi,
>>>>>
>>>>> The draft agenda for IETF 120 has been posted:
>>>>> IETF 120 Meeting Agenda
>>>>> The LSR session is scheduled on Friday Session I1 9:30 - 11:30, July
>>>>> 26, 2024.
>>>>> Please send slot requests to [email protected] before the end of the
>>>>> day Wednesday July 10th. Please include draft name and link, presenter,
>>>>> desired slot length including Q&A.
>>>>> Please note that having a discussion on the LSR mailing list is a
>>>>> prerequisite for a draft presentation in the WG session. If you need any
>>>>> help please reach out to the chairs.
>>>>> Thanks,
>>>>> Yingzhen
>>>>>
_______________________________________________
Lsr mailing list -- [email protected]
To unsubscribe send an email to [email protected]
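
The bidirectional check quoted in the thread from section 16.1 of RFC 2328, and the "stale LSA from restarting router + updated LSA from neighbor" transient that Les and Acee debate, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the RouterLsa class, the usable_link and remote_view helpers, and the router IDs "R" (restarting router) and "N" (its neighbor) do not come from either draft, and a real router-LSA carries far more than a set of neighbor IDs.

```python
from dataclasses import dataclass, field

MAX_AGE = 3600  # MaxAge is one hour, in seconds (RFC 2328, Appendix B)


@dataclass
class RouterLsa:
    """Hypothetical, heavily simplified router-LSA."""
    router_id: str
    age: int = 0                              # LS age in seconds
    links: set = field(default_factory=set)   # router IDs of advertised neighbors


def usable_link(lsdb, v, w):
    """Bidirectional check for the link V -> W, after section 16.1 step (b)."""
    v_lsa = lsdb.get(v)
    if v_lsa is None or w not in v_lsa.links:
        return False          # V does not advertise a link to W at all
    w_lsa = lsdb.get(w)
    if w_lsa is None or w_lsa.age >= MAX_AGE:
        return False          # W's LSA is missing or has reached MaxAge
    return v in w_lsa.links   # W must advertise a link back to V


def remote_view(n_links, r_links, r_age=0):
    """LSDB as seen by some router several hops away from N and R."""
    return {
        "N": RouterLsa("N", links=set(n_links)),
        "R": RouterLsa("R", age=r_age, links=set(r_links)),
    }


# R is down: N has withdrawn the adjacency, R's stale LSA still lists N.
r_down = remote_view(n_links=[], r_links=["N"], r_age=900)
# N re-advertises the adjacency while R's stale LSA is still installed.
stale_window = remote_view(n_links=["R"], r_links=["N"], r_age=900)
# R's stale LSA was purged (MaxAge) before N's update arrived.
purged_first = remote_view(n_links=["R"], r_links=["N"], r_age=MAX_AGE)
# R re-originated its LSA without links before N's update arrived.
updated_first = remote_view(n_links=["R"], r_links=[])

for name, lsdb in [("r_down", r_down), ("stale_window", stale_window),
                   ("purged_first", purged_first), ("updated_first", updated_first)]:
    print(name, usable_link(lsdb, "N", "R"))
```

Only the stale_window view passes the check, which is the false bidirectional adjacency under discussion. The sketch says nothing about how likely that ordering is in a real flooding topology; it only shows why the ordering of the two updates matters to the SPF.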
