Hi Tony,
My explanations are inline below with prefix [HC].
From: Tony Li [mailto:[email protected]] On Behalf Of [email protected]
Sent: Monday, April 15, 2019 11:37 AM
To: Huaimo Chen <[email protected]>
Cc: [email protected]
Subject: Re: [Lsr] Min Links for Multiple Failures on Flooding Topology
Hi Huaimo,
Consider the case that the multiple failures occur at almost the same time and
there is no other failure following these multiple failures shortly. In this
case, the backup paths will connect the live end nodes of the failures on the
FT, and the FT partition is fixed. Note that we assume that the real network
topology is not split. So there will be working backup paths that connect the
FT partitions.
I don’t think you can just assume that the network topology is not damaged in
some way. After all, the FT partitioned because of real failures. Your
algorithm for computing backup paths relies on old information about the
topology of other partitions. That information may be out of date and
incorrect. This can lead to the paths not being viable.
[HC]: In general, using the backup paths for the failures to repair the FT
partition works well to some extend even though the nodes in one partition have
some information about the topology that the nodes in another partition do not
have.
At first, it can survive two or more failures. For example, it can repair the
FT partition created by the failures of any two links on the FT. It can also
repair the FT partition caused by the failures of any two links on the FT and
the failure of any other link if the algorithm for computing backup paths is
enhanced to consider using diverse links (i.e., the backup paths are computed
in such a way that one backup path does not share any link with the other
backup paths if possible).
Secondly, it helps the convergence. The backup paths for a failure are created
by the nodes close (or say local) to the failure and connect the two
partitions. After this local repair of the FT partition, the link states in the
two partitions are synchronized, and the topology is converged. Locally
repairing the FT partition is faster than the repairing by the area leader.
Moreover, if one extra backup path for a failure is computed and enabled for
temporary flooding, the FT partition created by one more failure can be
repaired.
The network topology is damaged. This is considered in the algorithm for
computing backup paths. Every node uses its LSDB for computing backup paths if
needed. The FT is portioned, but the real network topology is not, and is used.
For the information about a failure that is on one partition and not on another
partition, if the part damaged by this failure is not used in any other backup
paths, then there is no problem; otherwise (i.e., it is used in some backup
paths), if the damaged part is used only in one backup path, the backup paths
can survive the damaged part if that one backup path does not share any link
with the other backup paths.
For the case that the multiple failures occur at almost the same time and then
other failures happen following the (old) multiple failures shortly, we can
consider all these failures (i.e., the old multiple failures and the other
failures including destination node failure) as new multiple failures. For
these new multiple failures, the backup paths for all these failures will
connect the live end nodes of the failures on the FT, and the FT partition is
fixed.
Again, how can you consider any new failures in the computation of the backup
paths? You have no way of getting information since the FT partitioned.
[HC]: The information about the new failures will flood through the backup
paths for the old failures if the backup paths are created. Refer to the
explanation above.
It’s true that just adding one link may not heal the partition. This is
exactly why we feel we need to iterate. Each iteration increases the size of
the partition (or decreases the cut set, depending on how you want to look at
it) and because the graph is finite, we are guaranteed to converge.
The convergence in this way may take a long time.
Agreed. However, the alternatives that we have on the table right now are (a)
take several iterations, or (b) turn on many links and risk a cascade failure.
Each iteration takes a time Ti. If we need n (n > 1) iterations, then the total
time for convergence is n times Ti. For centralized flooding reduction, Ti is
the time including three parts: 1) for the link states (i.e., LSPs or LSAs)
representing some of the multiple failures to travel to the area leader in the
network over the FT that is partitioned, 2) the leader to compute and send a
new FT, and 3) the other nodes (i.e., non area leader nodes) receive and build
the new FT.
We agree that it is linear in the time taken to add new nodes to the topology.
However, due to rate limiting, I suspect that the overall time is more
constrained by the rate limiting.
After a multiple failure, all nodes can evaluate the state of their adjacencies
against the current, existing, and now partitioned FT. Information about the
partition has been flooded using the current FT, but only within the
partitions. Any node with an adjacency outside of its partition can enable
temporary flooding to one neighbor and begin rate limiting.
While rate limiting is in progress, a new area leader election and FT
computation can happen within the partition. This may or may not yield LSDB
updates. Similarly, the temporary addition to the flooding topology may or may
not yield LSDB updates.
Whether the area leader election and FT computation is faster than the rate
limiting is unknown, since both are implementation dependent.
[HC]: It seems that using the backup paths for the failures to repair the FT
partition may make the convergence faster. Refer to the second point in the
first explanation above.
It may be better to use both the backup paths for the failures and the rate
limiting. The former repairs the partitions locally, and the latter helps the
former to work better. Thus the convergence will be faster.
Best Regards,
Huaimo
Tony
_______________________________________________
Lsr mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/lsr