On Mon, Dec 9, 2024 at 12:26 PM Henk Smit <[email protected]> wrote:
> A few points regarding flooding reduction algorithms.
>
>
> Leaderless.
> ===
> I prefer a distributed flooding algorithm over a centralized flooding
> algorithm.
> But I do not prefer the leaderless way. It will be hard already to make
> 2 different flooding algorithms in the same network cooperate (legacy
> and a new one).
> Having multiple algorithms, and not knowing what the other routers do,
> will be even harder.
>
> Having a leader is very much like having routers advertise the FAD in
> flex-algo.
> (This analogy comes from Peter).
> It ensures everyone runs the same FA algorithm. And if a router doesn't
> understand the FAD, it falls back to default behaviour (and just
> doesn't participate in this flex-algo).
> Having something similar for the flooding algorithm too will make
> things simpler.
> And thus more robust.
>

Those are basically beliefs, and the fact that FAD was solved that way
does not mean it's better in any way.

First, synchronizing a distributed computation where e'one must run the
same algorithm to populate the same RIB is by nature scoped to the whole
domain, and hence e'one has to agree on the algorithm used. (Agreeing in
distributed fashion would lead either to the same solution as suggested
here, possibly chopping an area into "components", or otherwise to the
Paxos problem, so centralized-leader is in a sense skirting the problem
by paying with blast radius. Blast radius may be simpler to understand
and specify until you get caught by it; enough interesting incidents
recently, although not necessarily the IGP-related ones, made it into
the press.)

And again, distributed computation over the whole domain, where e'one
must agree on the same algorithm, is a different problem from reduction,
where each component can run a different algorithm as long as it's
properly signalled and stitched by full flooding. In fact, the analogy
would be multiple "computation domains" where each "domain" agrees on
the algorithm; seamless MPLS, e.g., has been doing that since about
forever, stitching on the edges. And no'one here AFAIS guns for this as
something that is operationally desirable, except in the case of
algorithm upgrade/migration, where this capability guarantees a minimum
blast radius/risk operational procedure.

>
>
> Flooding topology repair.
> ===
> The hard part of optimized flooding is not coming up with different
> algorithms to prune links.
> The hard part is to repair the flooding topology when it breaks.
> If I understand correctly, the only real mechanism we have for that is
> sending CSNPs on p2p links?
> CSNPs are not reliable. So once you use them, you need to keep sending
> periodic CSNPs forever.
> Imho when you have to do that at scale, that solution is worse than the
> original problem we were trying to solve.
>

Please (re-)read the disttopo draft.

>
>
> Flooding reduction when an IS-IS process (re)starts.
> ===
> Imho, the biggest scalability issue with flooding is when an IS-IS
> process, in a large network, (re)starts.
> That is when potentially all LSPs are sent over all adjacencies.
> No other event will cause this much (local) flooding.
> But unfortunately, when IS-IS starts, it has an empty LSPDB. And thus
> it has no idea about the topology yet.
> So it can not really participate in any flooding reduction algorithm
> (yet).
> Only when it has received all or most LSPs, it can participate in any
> flooding reduction algorithm.
> But by then, the need for it has already diminished.
>

During restart, the problem is incast and not flood reduction. For
incast, please look at RIFT section 6.3.5 (RAIN); this is independent of
any flood reduction. Otherwise, other nodes already have the computation
for flood reduction cached in place (modulo the originator's upcoming
links; one possibility here is to tighten the disttopo clause that says
"do not flood back to the source").

--- tony

>
>
> So in any proposed flooding reduction algorithm, I would like to see
> these two points discussed in detail.
> 1) How to repair a broken flooding topology ? (Preferably without
> periodic CSNPs). And
> 2) How beneficial is the algorithm during process (re)start?
>
>
> henk
>
> _______________________________________________
> Lsr mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
>
