On Fri, Aug 24, 2018 at 1:36 PM <[email protected]> wrote:

> > Tony,
> >
> > as to miscabling: yepp, the protocol either has to prevent adjacencies
> > from coming up, or one has to deal with generic topologies. If you don't
> > want to design miscabling prevention/topology ZTP into the protocol
> > (like ROLL or RIFT did), you have to deal with a generic graph, as you
> > say. I think that if you have a predictable graph 99.99% of the time,
> > it's easier to restrict wrong cabling than to deal with an arbitrary
> > topology when tackling this problem, but that's my read.
>
> Well, you can take that approach, but you are at risk of ignoring the one
> link that you needed to make the entire topology work better. ;-)
yepp, no free lunch ... IME it's even more subtle. The more regular your
topology, the more one can summarize & contain the blast radius; the less
regular, the more topology information needs to be flooded to avoid
blackholes and bow-ties, to the point where a generic topology (assuming
prefix mobility) on the fabric leaves one with host routes & scale
limitations ...

> And if you don't like the extra link problem, there's also the missing
> link: what do you do when you have a leaf-spine topology, but there's one
> link missing? It's not like you can ignore the missing link and flood on
> it anyway. ;-)

that works just fine on -02 RIFT ...

> > Another observation though would be that if you have a single mesh,
> > then centralized controller delay on failure possibly becomes your
> > bound on how long flooding is disrupted (unless your single covering
> > graph has enough redundancy to deal with a single link failure, but
> > then you're really having two, as I suggest ;-). That could get ugly,
> > since you'll need make-before-break if installing a new mesh from the
> > controller, methinks, with a round-trip from possibly a lot of nodes ...
>
> Perhaps you didn't understand the draft in detail.
>
> Even the loss of the area leader does not disrupt flooding.
>
> The flooding topology is in effect until a new area leader is elected and
> a new topology is distributed. Yes, there is a hole in the flooding
> topology and you're no longer bi-connected, but as long as it was still a
> single failure, you should still have a functioning flooding topology.
> And because of that, it's reasonable to assume that the area members can
> elect a new leader and switch to the new topology in an orderly fashion.
>
> It's very true that there is a period where things are not bi-connected
> and a second failure will cause a flooding problem. That period should be
> on the order of one failure detection, one flooding propagation, and one
> SPF computation.
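[Editor's note: the bi-connectivity claim above (a single failure still leaves a functioning flooding topology) can be checked mechanically: a flooding subgraph survives any single link failure exactly when it is 2-edge-connected. A minimal sketch, assuming an undirected graph given as node and edge sets; the function names are mine, not from any draft:]

```python
def connected_without(nodes, edges, dead_edge=None):
    """BFS connectivity check with one edge (optionally) removed."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        if (u, v) == dead_edge:
            continue
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for nbr in adj[stack.pop()]:
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return len(seen) == len(nodes)

def survives_any_single_link_failure(nodes, edges):
    """True iff the flooding subgraph is 2-edge-connected."""
    return (connected_without(nodes, edges)
            and all(connected_without(nodes, edges, dead_edge=e)
                    for e in edges))
```

[A ring over the fabric passes this test; a spanning tree fails it, which is exactly the exposure window described above between a failure and the distribution of a recomputed flooding topology.]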
> If we expect failures to happen more frequently than that, then we need
> to call that out in our requirements. That type of scenario is perhaps
> reasonable under a MANET environment, but does not match any of my
> experience with typical data centers.

Yes, I read the early versions, maybe missed some newer stuff; will read
the newest and ponder more ... Though I think we're in sync ...

> > > > iii) change in one of the vertex lifts
> > >
> > > Sorry, I don't understand point iii).
> >
> > A mildly stuffed (or mathematically concise ;-) way to say that if you
> > have one or two covering graphs (and vertex lift is the more precise
> > word here, since a "covering graph" can also be an edge lift, which is
> > irrelevant here) and one of those subgraphs gets recomputed &
> > distributed (due to failures, changes in some metrics, _whatever_),
> > then this should not lead to disruption. Basically make-before-break
> > as one possible design point; harder to achieve, of course, in
> > distributed fashion ...
>
> I think it would help the discussion if we phrased it less concisely. :-)

You know Aesop's fable on the father, the son & the donkey? '-) I am
normally accused of excessive verbiage ;-) so that's somewhat of a
compliment ;-) But I will take more care before introducing terms in the
future.

> > moreover, I observe that IME ISIS is much more robust under such
> > optimizations, since the CSNPs catch (@ a somewhat ugly delay cost) any
> > corner cases, whereas OSPF after IDBE will happily stay out of sync
> > forever if flooding skips something (that may actually become a reason
> > to introduce periodic stuff on OSPF to do a CSNP equivalent, albeit it
> > won't be trivial in a backwards-compatible way on my first thought; I
> > was thinking a bit about a cut/snapshot consistency check [in practical
> > terms, OSPF hellos could carry a checksum on the DB headers], but we
> > never have that luxury on a link-state routing protocol [i.e. we always
> > operate under unbounded epsilon consistency ;-) and in the case of BGP
> > stable oscillations, BTW, not even that ;^} ]).
> >
> > Emacs
>
> And that I cannot parse. Emacs? You want LISP code? But then, Dino may
> get offended ;-)
>
> Your comment seemed like just a poke in the perennial IS-IS vs. OSPF
> debate, which seems about as constructive as the Vi vs. Emacs debate.

ah, got you. Was not my intention; I dislike and like both protocols
equally, each has its warts [though if you look @ RIFT, it's a shameless
ISIS flooding rip-off ;-), that part IMO is somewhat better in ISIS] ;-)
But I thought I'd bring this angle into the thread, since direct experience
in dealing with the problem of flood reduction taught me that the ISIS
belt-and-suspenders flooding can really save the bacon in such
algorithms/improvements. Unless we assume the algorithm will be perfect and
never lose anything, like never ever ... ;-)

a good weekend, e'one ...

--- tony
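[Editor's note: a hypothetical sketch of the "OSPF hellos could carry a checksum on the DB headers" idea mentioned above — a digest computed over canonically sorted LSA-header tuples, so two neighbors holding the same database get the same value regardless of the order LSAs arrived in. The header fields used here are illustrative, not from any spec:]

```python
import hashlib

def lsdb_digest(lsa_headers):
    """Digest over (ls_type, ls_id, adv_router, seq_no) header tuples.

    Sorting gives a canonical order, so the result depends only on the
    set of headers present, not on the order LSAs were received in.
    """
    h = hashlib.sha256()
    for hdr in sorted(set(lsa_headers)):
        h.update(repr(hdr).encode())
    return h.hexdigest()
```

[Neighbors exchanging such digests would learn, after some settle time, that flooding skipped something and a resync is needed — the CSNP-like safety net being argued for.]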
_______________________________________________
Lsr mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/lsr
