Hi Tony,
To summarize the multiple-failure discussion, two issues with
draft-li-lsr-dynamic-flooding are under discussion:
1) how to determine that the current flooding topology is split; and
2) how to repair/connect the flooding topology split.
For the first issue, the discussion is still ongoing.
For the second issue, repairing/connecting the flooding topology split through
Hello protocol extensions does not work. When a multi-hop “backup
path”/connection is needed to connect/repair the flooding topology split, a
Hello cannot go beyond one hop and thus cannot repair the split in this case.
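
To make the multi-hop case concrete, here is a minimal sketch in Python. It
assumes that “backup path” means the shortest path between the two ends of a
failed flooding topology link over the links that are still up. The four-node
topology, the router names, and the function shortest_path are hypothetical
and come from neither draft.

```python
from collections import deque

def shortest_path(links, src, dst):
    """Breadth-first search over undirected links; returns a node list or None."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

# Hypothetical physical topology: a square R1-R2-R3-R4-R1.
physical = [("R1", "R2"), ("R2", "R3"), ("R3", "R4"), ("R4", "R1")]
# Hypothetical flooding topology: the chain R1-R2-R3-R4 (R4-R1 is not on it).
flooding_topology = [("R1", "R2"), ("R2", "R3"), ("R3", "R4")]

# The flooding topology link R2-R3 fails, splitting it into {R1,R2} and {R3,R4}.
failed = ("R2", "R3")
remaining = [link for link in physical if link != failed]

backup_path = shortest_path(remaining, "R2", "R3")
backup_hops = len(backup_path) - 1
print(backup_path, backup_hops)
```

In this example the path from R2 to R3 around the failure has three hops, while
a Hello sent by R2 reaches only its direct neighbor R1.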
>From: Tony Li [mailto:[email protected]] On Behalf Of [email protected]
>Sent: Wednesday, March 6, 2019 10:45 AM
>To: Huaimo Chen <[email protected]>
>Cc: Christian Hopps <[email protected]>; [email protected]; [email protected];
>[email protected]
>Subject: Multiple failures in Dynamic Flooding
>
>Hi Huaimo,
>
>>> I’m sorry that you don’t find it useful. Determining the split is trivial:
>>> when you receive an IIH,
>>> it has the system ID of the other system in it. If that other system is not
>>> currently part of the
>>> flooding topology, then it is quite clear that it is disconnected from the
>>> flooding topology.
>>> Repairing the split is done by enabling temporary flooding on the new link.
>>When an adjacency between two nodes is up, the Hello packets exchanged between
>>them do not change the node/system IDs they carry.
>>How do you determine that the other system is not currently part of the
>>flooding topology?
>The IIH includes the system ID. See ISO 10589 v2, section 9.7, field “source
>Id”. The local system will have
>a copy of the flooding topology and can easily see if the neighbor was present
>as of the last FT computation. If not, then it should be
>added (modulo rate limiting). The local system can also examine its own LSDB.
>If there is no LSP for the neighbor, then it would seem
>highly likely that there is a disconnect and the neighbor should again be
>added (modulo rate limiting).
>We are not requiring it, but a system could also do a more extensive
>computation and compare the links between itself and the neighbor
>by tracing the path in the FT and then confirming that each link is up in the
>LSDB.
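
The checks described above can be sketched as follows. This is a minimal
sketch in Python under stated assumptions: the flooding topology is a list of
undirected links between system IDs, the LSDB is reduced to the set of system
IDs that have an LSP plus the list of links it reports as up, and rate
limiting is left out. The names neighbor_missing and reachable_over_ft are
hypothetical.

```python
from collections import deque

def neighbor_missing(neighbor_id, ft_links, lsdb_system_ids):
    """True if the IIH source ID is absent from the FT or has no LSP in the LSDB."""
    ft_nodes = {node for link in ft_links for node in link}
    return neighbor_id not in ft_nodes or neighbor_id not in lsdb_system_ids

def reachable_over_ft(local_id, neighbor_id, ft_links, lsdb_up_links):
    """Optional deeper check: is the neighbor reachable over FT links that the
    LSDB currently reports as up?"""
    up = {frozenset(link) for link in lsdb_up_links}
    adjacency = {}
    for a, b in ft_links:
        if frozenset((a, b)) in up:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    seen = {local_id}
    queue = deque([local_id])
    while queue:
        node = queue.popleft()
        if node == neighbor_id:
            return True
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Hypothetical data: FT is the chain A-B-C; the LSDB holds LSPs for A, B and C.
ft_links = [("A", "B"), ("B", "C")]
lsdb_system_ids = {"A", "B", "C"}

print(neighbor_missing("D", ft_links, lsdb_system_ids))          # D is unknown
print(reachable_over_ft("A", "C", ft_links, [("A", "B")]))       # B-C not up
```

Both functions only reflect what the LSDB contains at the moment they run, so
their answers are only as fresh as the LSDB itself.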
It normally takes a long time, often more than ten minutes, for the LSP/LSA of
the neighbor to age out and be removed from the LSDB, even after the neighbor
has been physically disconnected.
How can you decide quickly, within tens of milliseconds, that the flooding
topology is disconnected?
>>> There is an issue here that we have not yet resolved, which is the rate
>>> that new links should be
>>> temporarily added to the flooding topology. Some believe that adding any
>>> new link is the
>>> correct thing to do as it minimizes the recovery time. Others feel that
>>> enabling too many links
>>> could cause a flooding collapse, so link addition should be highly
>>> constrained. We are still
>>> discussing this and invite the WG’s opinions.
>>The issue is resolved by the solutions in draft-cc-lsr-flooding-reduction.
>>One solution is below, where the given distance can be adjusted/configured.
>>If we want every node to flood on all its links, we set the given
>>distance to a large number. If we want the nodes within 2 hops to a failure
>>to flood on all their links, we set the given distance to 2.
>>  “In one way, when two or more failures on the current flooding
>>  topology occur almost in the same time, each of the nodes within a
>>  given distance (such as 3 hops) to a failure point, floods the link
>>  state (LS) that it receives to all the links (except for the one from
>>  which the LS is received) until a new flooding topology is built.”
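
The per-node rule in the quoted text can be sketched as below. This is a
minimal sketch in Python, not text from the draft: the function
links_to_flood, its parameters, and the link names are hypothetical, and how a
node learns its hop count to a failure point is not shown.

```python
def links_to_flood(all_links, ft_links, incoming_link,
                   hops_to_failure, given_distance, new_ft_built):
    """Return the links on which a received link state (LS) is flooded.

    While no new flooding topology (FT) has been built, a node within
    given_distance hops of a failure point floods on all its links; any
    other node floods only on its FT links. The link the LS arrived on
    is always excluded.
    """
    if not new_ft_built and hops_to_failure <= given_distance:
        candidates = all_links
    else:
        candidates = ft_links
    return [link for link in candidates if link != incoming_link]

# Hypothetical node with three links, two of which are on the FT.
all_links = ["to_R1", "to_R2", "to_R3"]
ft_links = ["to_R1", "to_R2"]

near = links_to_flood(all_links, ft_links, "to_R1", 2, 3, new_ft_built=False)
far = links_to_flood(all_links, ft_links, "to_R1", 4, 3, new_ft_built=False)
done = links_to_flood(all_links, ft_links, "to_R1", 2, 3, new_ft_built=True)
print(near, far, done)
```

Setting given_distance to a very large value makes every node take the first
branch, which corresponds to flooding on all links everywhere.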
>As we have discussed, this is not a solution. In fact, this is more dangerous
>than anything else that has been proposed and
>seems highly likely to trigger a cascade failure. You are enabling full
>flooding for many nodes. In dense topologies, even
>a radius of 3 is very high. For example, in a LS topology, a radius of 3 is
>sufficient to enable full flooding throughout the
>entire topology. If that were stable, we would not need Dynamic Flooding at
>all.
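
The radius argument above can be checked on a small example. This is a minimal
sketch in Python, assuming that “LS topology” means a two-tier leaf-spine
fabric in which every leaf connects to every spine, and that distance is
counted in hops from a single node next to the failure (leaf0 here). The
fabric size and the function names are hypothetical.

```python
from collections import deque

def nodes_within(adjacency, source, radius):
    """Breadth-first search: the set of nodes at most `radius` hops from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand beyond the radius
        for nxt in adjacency[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return set(dist)

def leaf_spine(num_leaves, num_spines):
    """Two-tier fabric: every leaf is connected to every spine."""
    leaves = [f"leaf{i}" for i in range(num_leaves)]
    spines = [f"spine{i}" for i in range(num_spines)]
    adjacency = {leaf: set(spines) for leaf in leaves}
    adjacency.update({spine: set(leaves) for spine in spines})
    return adjacency

fabric = leaf_spine(num_leaves=16, num_spines=4)
coverage = {r: len(nodes_within(fabric, "leaf0", r)) for r in (1, 2, 3)}
print(coverage, "of", len(fabric), "nodes")
```

In this fabric a radius of 1 reaches leaf0 and the four spines, and a radius of
2 already reaches all 20 nodes, so a given distance of 3 would put every node
into full flooding.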
This full flooding is enabled only for a very short time.
Why do you consider this more dangerous than anything else and highly likely
to trigger a cascade failure? Could you explain in detail?
>>Another solution is to temporarily add just the minimum set of links to the
>>flooding topology to repair the split until a new flooding topology
>>is built.
>Agreed. Which links constitute the minimum? In a general topology, with
>arbitrary failures that are not distributed globally,
>how do we make a distributed decision about which links to enable? This is the
>problem that we are trying to solve. And
>we have no oracle to tell us The Right Answer.
We can discuss this after we have discussed the first method.
Best Regards,
Huaimo
>Regards,
>Tony
_______________________________________________
Lsr mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/lsr