On Dec 4, 2024, at 8:36 AM, Les Ginsberg (ginsberg) - ginsberg at
cisco.com <[email protected]> wrote:
Tony –
Upgrades are orthogonal to my comments.
I am speaking about the need to deploy multiple flooding algorithms
in a network (one of which may be “static”).
We have never considered that in scope before – and there are obvious
challenges to doing so – not least of which is the ability to test.
I think when you say “upgrade” you are talking about needing to
migrate from algorithm X to algorithm Y – or from Algo X-V1 to Algo
X-V2 where V2 has some fix that isn’t fully interoperable with V1.
We already have a way of handling this case:
Revert to base flooding everywhere – do the upgrade – and then enable
the upgraded algo.
Conceptually, this is consistent with how we have deployed major
infra upgrades (e.g., narrow to wide metrics).
This is far safer than trying to deal with co-existence – not least
because once you allow co-existence you have to allow that a customer
might use this as a permanent state – not just an upgrade state.
Given the challenges we already face with interoperability even when
all routers are trying to “do the same thing” (and I am not limiting
this comment to just flooding) the idea that we should now embrace
a persistent state where routers are intentionally doing inconsistent
things seems at best naïve.
Imagine that you and I are called to root cause problems in a
customer network.
Your implementation supports algorithm X and doesn’t understand
algorithm Y.
My implementation supports algorithm Y and doesn’t understand
algorithm X.
Flooding issues are notoriously difficult to diagnose – even when all
nodes are supposed to be doing the same thing.
All the while our mutual customer is (rightfully) pressuring us to get
this fixed ASAP.
We might well ask “how did we get into this mess”.
Les
*From:*Tony Li <[email protected]> *On Behalf Of *Tony Li
*Sent:* Wednesday, December 4, 2024 7:54 AM
*To:* Les Ginsberg (ginsberg) <[email protected]>
*Cc:* Tony Przygienda <[email protected]>; Peter Psenak (ppsenak)
<[email protected]>; Shraddha Hegde <[email protected]>; Robert
Raszuk <[email protected]>; lsr <[email protected]>
*Subject:* Re: [Lsr] Another counter-example
Les,
The step that you’re missing is that upgrades are inevitable and thus
an operational necessity.
We are very, very, very unlikely to get things right on the first go.
Therefore, we will need to fix our bugs. How do you deploy that bug
fix? Add to the mix that we’re not willing to do a flag day cutover
to the fix.
A better way of thinking of mesh groups is that they are the ‘static
routes’ of legacy flooding. They are installed by network operators
and are presumed to be perfect. No signaling necessary.
Tony
On Dec 4, 2024, at 7:28 AM, Les Ginsberg (ginsberg) - ginsberg at
cisco.com <[email protected]> wrote:
I am very much in agreement with Peter – though I think his
commentary is “too kind”. 😊
The issue with mesh groups is that they are opaque to other nodes
i.e., you may come up with a way of signaling that a node has
configured mesh groups (which BTW the distoptflood draft does NOT
currently have – and I hope it never does…) but unless you are
going to also propose that a node signal what links are/are not
being used for flooding the best you can do from the POV of other
nodes is treat the node as if it is running a flooding algorithm
which is totally opaque – and which is also “brittle” i.e., it
doesn’t do well in the event of topology changes.
To Tony P – one of the things that disturbs me about the way this
discussion is taking place is how we seem to have “skipped steps”.
The interest in optimized flooding dates back decades.
Early attempts include:
https://datatracker.ietf.org/doc/rfc2973/ (Mesh Groups) (circa 2000)
https://datatracker.ietf.org/doc/html/draft-ietf-ospf-isis-flood-opt-01
(circa 2001)
MANET work (circa 2014)
All of these attempts were very conservative in nature. The
notion of deploying multiple solutions simultaneously and
thinking about how they might “interoperate” was quite
deliberately not looked at. The general view has been “be very
very careful when you mess with flooding”.
Suddenly, we now seem to have “leaped off the cliff” and are talking
about deploying multiple algorithms and trying to get them to
“interoperate”.
At what point did the WG conclude that this is a real requirement
and that it actually can be deployed safely?
If people want to discuss this – the WG is a fine place to do it.
But I would appreciate discussion that does not skip over the
very real concerns that have kept us from even considering this
for the last three decades.
Les
*From:*Tony Przygienda <[email protected]>
*Sent:* Wednesday, December 4, 2024 12:35 AM
*To:* Peter Psenak (ppsenak) <[email protected]>
*Cc:* Shraddha Hegde <[email protected]>; Robert Raszuk
<[email protected]>; Tony Li <[email protected]>; lsr <[email protected]>
*Subject:* [Lsr] Re: Another counter-example
Valid point of view, but other solutions to the whole thing are
possible as well that don't require lifting up (upgrading) the
mesh-group nodes as a precondition. If consensus passes and we
start to work on the details of the necessary leaderless signalling,
then my take would be that this belongs in some framework that's
part of operational considerations ...
thanks
-- tony
On Wed, Dec 4, 2024 at 9:25 AM Peter Psenak <[email protected]>
wrote:
Hi Shraddha,
so you define mesh-groups to be a separate flooding algorithm
itself, requiring all routers using them to be upgraded. By
the time you do that, you can also replace mesh-groups with
the distop on all routers and be done with it, instead of
trying to solve the coexistence of the two.
thanks,
Peter
On 04/12/2024 07:48, Shraddha Hegde wrote:
Hi Robert,
With dist-opt flood reduction running in leaderless mode,
it is possible for the operator to run mesh-groups in
some part of the network and introduce distopt flooding
in other parts where needed. The nodes configured with
mesh-groups have to be upgraded to advertise that they
are running a different flood reduction algorithm. The
distopt algorithm will then ensure that the neighbors of
the nodes running mesh-groups always become reflooders,
and hence correct flooding behaviour is ensured in the
CDS where distopt runs.
Some networks have mesh-groups deployed in a well-defined
part of the topology, where the configured mesh-groups
reduce back-flooding by 50%. They have been deployed for
many years and are serving well. If an operator wants to
keep that config and introduce distopt in other parts of
the topology (during migration or otherwise), it’s a very
valid use case and can be supported with the distopt
algorithm.
Rgds
Shraddha
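As a rough illustration of the rule described in the message above (neighbors of nodes running mesh-groups always become reflooders), the following Python sketch computes which distopt nodes would be forced into the reflooder role. The function name `forced_reflooders`, the adjacency-map representation, and the node names are assumptions made for this example only; they are not taken from the distopt draft, which defines the actual behaviour.

```python
from typing import Dict, Set


def forced_reflooders(
    adjacency: Dict[str, Set[str]],
    mesh_group_nodes: Set[str],
) -> Set[str]:
    """Return the non-mesh-group nodes adjacent to any mesh-group node.

    Assumption for this sketch: every node not in mesh_group_nodes runs
    the distributed optimized flooding algorithm, and the rule is simply
    "if you neighbor a mesh-group node, you must reflood".
    """
    forced: Set[str] = set()
    for node in mesh_group_nodes:
        for neighbor in adjacency.get(node, set()):
            if neighbor not in mesh_group_nodes:
                forced.add(neighbor)
    return forced


# Hypothetical topology: M1 and M2 run mesh-groups, D1..D3 run distopt.
example_adjacency = {
    "M1": {"M2", "D1"},
    "M2": {"M1", "D2"},
    "D1": {"M1", "D3"},
    "D2": {"M2", "D3"},
    "D3": {"D1", "D2"},
}
example_forced = forced_reflooders(example_adjacency, {"M1", "M2"})
```

In this hypothetical topology only D1 and D2 border the mesh-group region, so they are the nodes forced to reflood, while D3 remains free to be pruned or selected by the optimized algorithm.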
*From:*Robert Raszuk <[email protected]>
*Sent:* 27 November 2024 15:58
*To:* Peter Psenak <[email protected]>
*Cc:* Tony Li <[email protected]>; Tony Przygienda
<[email protected]>; lsr <[email protected]>
*Subject:* [Lsr] Re: Another counter-example
> you are talking about mixing the manual mesh group with
> optimized flooding.
I am talking about an accidental mix (legacy
configuration at some nodes) not a planned one.
And you either auto detect it and disable the ability to
optimally flood or you push full responsibility to the
operator.
Thx,
R.
On Wed, Nov 27, 2024 at 11:16 AM Peter Psenak
<[email protected]> wrote:
Robert,
On 27/11/2024 10:32, Robert Raszuk wrote:
Peter,
My point was that this should be at least
mentioned in the operational considerations section
if dynamic flooding is expected to work in mixed
networks where some nodes support the new algorithm
and some do not (your "regular flooding case").
you are talking about mixing the manual mesh group
with optimized flooding. I don't think we want to go
down that path.
thanks,
Peter
On Wed, Nov 27, 2024 at 10:28 AM Peter Psenak
<[email protected]> wrote:
Robert,
On 27/11/2024 10:22, Robert Raszuk wrote:
Peter,
I am not sure if what Tony said is a
requirement or an observation.
> Note that combining routers that run the elected
> optimized algorithm with routers that do run the
> regular flooding is not a problem.
Note that static mesh groups can be
present today too and you can't assume
that it is either an optimized algorithm
or full flooding.
please do not compare apples with oranges.
Static mesh groups are manually configured
and if not done correctly can result in
broken flooding. What we are discussing here
is a dynamic flooding algorithm, not manual
flooding blocking.
thanks,
Peter
Thx,
R.
On Wed, Nov 27, 2024 at 9:58 AM Peter
Psenak
<[email protected]> wrote:
On 27/11/2024 00:18, Tony Li wrote:
> A distributed algorithm computing a flooding topology must only
> operate upon nodes running the same algorithm (and version). If
> multiple algorithms (and/or versions) are running in the same network,
> then any given algorithm and version defines a subgraph and the
> algorithm can only optimize flooding within its own subgraph. Legacy
> full flooding must be used between subgraphs of different algorithms
> or versions.
This is a new requirement for the
flooding algorithm itself. This does
not exist with the existing leader
based election, as that guarantees
that only one optimized flooding
algorithm is ever present in the area.
Note that combining routers that run
the elected optimized algorithm
with routers that do run the regular
flooding is not a problem.
thanks,
Peter
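To make the quoted requirement concrete (each algorithm and version defines its own subgraph, and legacy full flooding is used between subgraphs), here is a minimal Python sketch that groups nodes by what they run and classifies each link. The names `classify_links` and `algorithm_subgraphs`, the `(algorithm, version)` tuple, and the use of `None` for nodes doing only regular flooding are assumptions for this example; nothing here reflects actual protocol encoding.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical representation: the (algorithm name, version) a node runs,
# or None if the node only performs legacy full flooding.
Algo = Optional[Tuple[str, int]]
Link = Tuple[str, str]


def algorithm_subgraphs(node_algo: Dict[str, Algo]) -> Dict[Tuple[str, int], Set[str]]:
    """Group nodes by the exact algorithm and version they run."""
    groups: Dict[Tuple[str, int], Set[str]] = {}
    for node, algo in node_algo.items():
        if algo is not None:
            groups.setdefault(algo, set()).add(node)
    return groups


def classify_links(node_algo: Dict[str, Algo], links: List[Link]) -> Dict[Link, str]:
    """Mark a link 'optimizable' only if both ends run the same algorithm
    and version; every other link falls back to 'full-flooding'."""
    result: Dict[Link, str] = {}
    for a, b in links:
        same = node_algo[a] is not None and node_algo[a] == node_algo[b]
        result[(a, b)] = "optimizable" if same else "full-flooding"
    return result


# Hypothetical network: A and B run X version 1, C runs X version 2,
# and D runs no optimized algorithm at all.
example_nodes: Dict[str, Algo] = {
    "A": ("X", 1),
    "B": ("X", 1),
    "C": ("X", 2),
    "D": None,
}
example_links: List[Link] = [("A", "B"), ("B", "C"), ("C", "D")]
example_classes = classify_links(example_nodes, example_links)
example_groups = algorithm_subgraphs(example_nodes)
```

In this hypothetical network only the A-B link lies inside a single subgraph. The B-C link crosses a version boundary and the C-D link touches a legacy node, so both would carry full flooding under the quoted rule.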