Re: [Lsr] 【Responses for Comments on PUAM Draft】RE: IETF 112 LSR Meeting Minutes

Aijun Wang Thu, 18 Nov 2021 15:22:34 -0800

Hi, Tony:

Aijun Wang
China Telecom


> On Nov 19, 2021, at 00:46, Tony Przygienda <[email protected]> wrote:
> 
> 
> Agreeing with T. Li here (i.e. BFD next-hops) and let me add that AFAIS the 
> confusion here is that the presence of a /32 route is used as SSAP liveness 
> AFAIS and that's simply not what IGPs are here for if you consider their main 
> job to be fastest possible convergence in network _reachability_ only and not 
> signalling of service failures.

[WAJ] The problem arises from the IGP's summarization action, so why should 
other protocols be left to solve it? 
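To make the disagreement concrete, here is a minimal sketch of why a summary hides a member failure. It assumes hypothetical PE loopbacks inside 10.0.0.0/24 and a simplified resolution rule (a BGP next-hop counts as usable if any route covers it); the function names are illustrative only.

```python
import ipaddress

# Hypothetical example: an ABR has summarized all PE loopbacks in
# 10.0.0.0/24 into a single prefix, so remote PEs see only the summary.
ROUTES = [ipaddress.ip_network("10.0.0.0/24")]


def lookup(dst, routes):
    """Return the longest prefix in `routes` that covers `dst`, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in routes if addr in net]
    return max(matches, key=lambda net: net.prefixlen, default=None)


def next_hop_usable(nh, routes, unreachable=frozenset()):
    """Simplified BGP next-hop check: usable if some route covers it and it
    is not explicitly marked unreachable.

    `unreachable` is a hypothetical stand-in for a PUAM-style
    unreachable-prefix advertisement; it does not model the draft's encoding.
    """
    if ipaddress.ip_network(f"{nh}/32") in unreachable:
        return False
    return lookup(nh, routes) is not None
```

Here `next_hop_usable("10.0.0.7", ROUTES)` is True whether or not the PE at 10.0.0.7 is alive, because the /24 summary still covers it; that is the blackholing case. Passing `{ipaddress.ip_network("10.0.0.7/32")}` as `unreachable` makes it False.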

> BGP is the overlay synchronizing SSAPs & scales marvelously @ that. Having 
> BGP next-hop (which is basically equivalent to all services provided behind 
> it) liveness indicating the health of services behind it is the scalable 
> solution IMO,

[WAJ] We have discussed the BFD solutions over several rounds on the list. Just 
as a reminder: BFD carries configuration overhead, and if PEs peer via the RR, 
BFD for BGP can’t solve the problem.
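A back-of-the-envelope sketch of the configuration overhead, assuming multi-hop BFD is run as a full mesh between PEs, compared with BFD bound to the BGP sessions when PEs peer only with route reflectors. The function names and numbers are illustrative assumptions, not figures from the thread.

```python
def full_mesh_bfd_sessions(num_pes: int) -> int:
    """Multi-hop BFD sessions if every PE monitors every other PE directly."""
    return num_pes * (num_pes - 1) // 2


def bfd_sessions_per_pe(num_pes: int) -> int:
    """Sessions each PE must configure in that full mesh."""
    return num_pes - 1


def bfd_on_rr_sessions(num_pes: int, num_rrs: int) -> int:
    """BFD tied to BGP sessions when every PE peers only with the RRs.

    These sessions watch the PE-to-RR path, not the remote PE that a
    BGP next-hop points at.
    """
    return num_pes * num_rrs
```

For 1,000 PEs, `full_mesh_bfd_sessions(1000)` is 499,500 sessions (999 per PE), while two RRs need only `bfd_on_rr_sessions(1000, 2)`, i.e. 2,000, but none of those sessions directly monitors a remote PE.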

> and not starting to try to teach IGP fragile signalling or PUAM (which BTW 
> AFAIS will neither scale nor work on generic graphs due to lack of any 
> consistent algebra I could detect in the draft and it is definitely nothing 
> "like rift" as the preso seems to claim again) which will easily affect its 
> main job.
[WAJ] The PUAM solution works on any graph, unlike RIFT, which is suitable only 
for certain planned topologies. If you don’t agree, please give an example 
rather than making unconvincing assertions.
I mentioned RIFT only because you are familiar with it; I just wanted to help 
you understand PUAM. 
In RIFT, won’t the aggregating router advertise detailed reachability 
information to its leaves when another aggregating router at the same site 
becomes unreachable?
Despite these similarities, we will try to avoid mentioning RIFT from now on, 
because not all readers are familiar with it. 

> For signalling I see how putting it into a service instance is a somewhat 
> palatable design choice and it's kind of like inventing "passive BFD" over 
> flooding in my eyes ;-)
[WAJ] Using a service instance has already been rejected. I think you noticed 
that Acee also mentioned this.

> And BTW, in topologically sorted graphs (CLOS being the ones of interest 
> these days) with strict positive/negative disaggregation algebra with minimal 
> blast radius on failures we can scale to (at least) 0.5M prefixes 
> implementation wise IME and that should allow us really, really big IP 
> fabrics with leaves holding nothing but defaults under normal conditions but 
> it's still not a good idea to abuse that for SSAP synchronization AFAIS (and 
> observe that to scale RIFT does NOT notify leaves of their vice-versa 
> reachability, it simply prevents blackholing on aggregates and will produce 
> an ICMP unreachable if there are no routes left to destination, if you run 
> BFD on top of that as Tony suggests, this will of course give you the desired 
> effect, for RR you'll run into the TCP session problem again but maybe you 
> can BFD the RR session and then propagate that as Robert seems to suggest, 
> the third-party next-hop raises its head again ;-). 
[WAJ] We are discussing a general solution, not one that is specific to some 
limited topologies.

> 
> Alternately resolving BGP over BGP as Robert suggests (if I read that 
> correctly) and use RR to scale out the SSAP nhop availability is possible I 
> think architecturally without garbage-canning IGPs as "network-wide fast 
> broadcast mechanism" ... I doubt it will do "couple millisecs" convergence 
> ;-) but can be simpler hardware wise than trying to scale up BFD to large 
> number of very fast sessions. 

[WAJ] Operators also don’t want their networks filled with various odd designs 
or solutions.

> 
> -- tony 
> 
> 
> 
>> On Thu, Nov 18, 2021 at 5:06 PM Tony Li <[email protected]> wrote:
>> 
>> Les,
>> 
>>> Why would we then punch holes in the summary for member routers?  Just 
>>> because we can?
>>> [LES:] No. We are doing it to improve convergence AND retain scalability.
>> 
>> 
>> You are not improving convergence. You are propagating liveness.  The fact 
>> that this relates to convergence in the overlay is irrelevant to the IGP.
>> 
>> You are not retaining scalability. You are damaging it. You are proposing 
>> flooding a prefix per router that fails. If there is a mass failure, that 
>> would result in flooding a large number of prefixes. The last thing you want 
>> when there is a mass failure is additional load, exacerbating the situation.
>> 
>> 
>>>  Should we corrupt the architecture just because we can?  There are other, 
>>> architecturally appropriate solutions available.  How about we just use 
>>> them?
>>>  
>>> [LES:] What are you proposing?
>> 
>> 
>> You are signaling the (lack of) liveness of a remote node. I propose that we 
>> instead use existing signaling mechanisms to do this. Multi-hop BFD seems 
>> like an obvious choice.
>> 
>> If you greatly dislike that for some reason, I would suggest that we create 
>> a proxy liveness service, advertised by the ABR. This would allow 
>> correspondents to register for notifications. The ABR could signal these 
>> unicast when it determines that the specific targets are unreachable.
>> 
>> Tony
>> 
>> 
> _______________________________________________
> Lsr mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/lsr
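
A minimal sketch of the proxy liveness service Tony Li describes in the quoted message above, assuming a hypothetical ABR-side registry: correspondents register for specific targets, and the ABR unicasts notifications only to them instead of flooding a prefix to every router. No such service is specified; `ProxyLivenessService`, `register`, `unregister` and `target_unreachable` are invented names.

```python
from collections import defaultdict


class ProxyLivenessService:
    """Hypothetical ABR-side registry for a proxy liveness service.

    Correspondents register interest in specific targets; when the ABR
    determines a target is unreachable, it notifies only those
    correspondents by unicast. No such protocol is specified; all names
    here are illustrative.
    """

    def __init__(self):
        # target address -> correspondents registered for that target
        self._watchers = defaultdict(set)

    def register(self, correspondent, target):
        self._watchers[target].add(correspondent)

    def unregister(self, correspondent, target):
        watchers = self._watchers.get(target)
        if watchers is None:
            return
        watchers.discard(correspondent)
        if not watchers:
            del self._watchers[target]

    def target_unreachable(self, target):
        """Return the (correspondent, target) unicast notifications to send,
        in sorted order so the output is deterministic."""
        return [(c, target) for c in sorted(self._watchers.get(target, ()))]
```

In this sketch, notification cost scales with the number of correspondents interested in each failed target, rather than with every router in the area processing one flooded prefix per failed router.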
