Hi Tony,

What is “the existing algorithm” that you have?
Is it the “rate limiting”? If so, can you describe the “rate limiting”
algorithm in detail?

Best Regards,
Huaimo
From: Tony Li [mailto:[email protected]] On Behalf Of [email protected]
Sent: Thursday, April 25, 2019 12:37 PM
To: Huaimo Chen <[email protected]>
Cc: [email protected]
Subject: Re: [Lsr] Min Links for Multiple Failures on Flooding Topology


Hi Huaimo,

Thank you, this is very clear.

It leaves me with one question: how is this better than what we have?

You’ve computed the backup paths: [R4, R3, R6, R7] and [R5, R2, R9, R8].  This 
results in enabling flooding on (R3, R6) and on (R5, R2), (R2, R9), (R9, R8).

Per the existing algorithm, the temporary additions would be (R3, R6) and (R2, 
R9).  As it has less flooding, this seems like a better solution.

Regards,
Tony



On Apr 25, 2019, at 7:26 AM, Huaimo Chen <[email protected]> wrote:

Hi Tony,

To address the different link state information in the different FT
partitions caused by failures, the attachment (5 pages) contains an example
with figures that illustrates and explains almost every step in detail:
computing a unique backup path for each of the two failures on the FT and
enabling temporary flooding on the backup paths. Would you mind indicating if
anything is missing or incorrect?

In this example, the failures of two links R4-R7 and R5-R8 on the FT happen at 
the same time and split the FT into two partitions: partition A and partition 
B. Partition A contains R0, R1, R2, R3, R4, and R5. Partition B has R6, R7, R8, 
R9, R10, R11, and R12. Refer to Fig. 1.

Every node in partition A receives the link state (LS) from R4 indicating that
the link from R4 to R7 is down, and the LS from R5 indicating that the link
from R5 to R8 is down. Each node thus has the FT and the real topology shown in
Fig. 2. There are two uni-directional links: the link from R7 to R4 and the
link from R8 to R5. These uni-directional links will not be used in backup path
computations.

Similarly, in partition B every node receives the LS from R7 indicating that
the link from R7 to R4 is down, and the LS from R8 indicating that the link
from R8 to R5 is down. Each node has the FT and the real topology shown in
Fig. 3. There are two uni-directional links: the link from R4 to R7 and the
link from R5 to R8. These uni-directional links will not be used in backup path
computations. Note that Fig. 2 and Fig. 3 are equivalent if the
uni-directional links are not used.
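The uni-directional link filtering described above can be sketched in Python (a hypothetical illustration only; `link_states` is a simplified stand-in for a node's LSDB, not an actual LSA/LSP encoding, and integer node IDs stand in for router IDs):

```python
def bidirectional_links(link_states):
    """Keep only the links advertised as up in both directions.

    link_states: a set of (reporter, neighbor) tuples, one per adjacency
    still advertised as up in the node's LSDB.
    Returns the set of bi-directional links usable in backup path
    computations, each as a frozenset {a, b}.
    """
    return {frozenset((a, b)) for (a, b) in link_states
            if (b, a) in link_states}


# Partition A's view after the failures: R4 and R5 have withdrawn their
# halves of R4-R7 and R5-R8, but the old LSs from R7 and R8 still show
# the other halves, leaving two uni-directional links to be excluded.
ls = {(7, 4), (8, 5), (3, 4), (4, 3), (2, 5), (5, 2)}
usable = bidirectional_links(ls)
```

Only the links seen as up from both endpoints survive, which matches the "like SPF, any link used must be bi-directional" rule below.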

A unique backup path for link R4-R7 is computed and then enabled for temporary 
flooding.

In this example, the basic part of the algorithm below is used to compute a
backup path: obtain the minimum hop count path from Ri to Rj for link Ri-Rj.
As in SPF, any link used in the path must be bi-directional.

At first, for the failure of a link Ri-Rj on the FT, a unique backup path for
link Ri-Rj (assume Ri’s ID < Rj’s ID) is computed by each node in the following
steps:

  1.  Obtain all the minimum hop count paths from Ri to Rj, where each
minimum hop count path has an HC-FT value.
  2.  If multiple paths have the maximum HC-FT value, select the one whose
links have the smaller or smallest remote node IDs, walking from destination
Rj toward source Ri.
And then, the backup path for link Ri-Rj is enabled for temporary flooding by
each node Rk on the path as follows:
Rk temporarily adds to the FT its local links on the backup path that are not
already on the FT.
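The two-step computation might be sketched as follows (a hypothetical Python illustration; it assumes the HC-FT value of a path counts how many of that path's links are already on the FT, which is one reading of the description above, and it uses integer node IDs):

```python
from collections import deque

def backup_path(graph, ft_links, src, dst):
    """Unique backup path for failed FT link src-dst (src ID < dst ID).

    graph: dict node -> set of neighbors (bi-directional links only).
    ft_links: set of frozensets {a, b}, the links currently on the FT.
    """
    # BFS for the minimum hop count from src to every node.
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    if dst not in dist:
        return None  # no bi-directional path exists

    # Step 1: enumerate all minimum hop count paths along the BFS layers.
    paths = []
    def extend(path):
        u = path[-1]
        if u == dst:
            paths.append(path)
            return
        for v in graph[u]:
            if dist.get(v) == dist[u] + 1:
                extend(path + [v])
    extend([src])

    # Step 2: keep the paths with the maximum HC-FT value (assumed here:
    # the number of path links already on the FT), then tie-break by the
    # smaller remote node IDs walking from dst back toward src.
    def hc_ft(path):
        return sum(frozenset(e) in ft_links
                   for e in zip(path, path[1:]))
    best = max(hc_ft(p) for p in paths)
    candidates = [p for p in paths if hc_ft(p) == best]
    return min(candidates, key=lambda p: p[::-1])
```

Because every node runs the same deterministic tie-break over the same inputs, all nodes arrive at the same unique backup path without any signaling.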



The backup path for link R4-R7 is R4-R3-R6-R7 (refer to Fig. 4). It is computed
by R4, R3, R6, R7, and the other nodes. R3 and R4 in partition A compute the
backup path using the topology in Fig. 2; R6 and R7 in partition B compute it
using the topology in Fig. 3.

The backup path for link R4-R7 is enabled for temporary flooding by R4, R3, R6,
and R7, which are on the backup path. R3 adds link R3-R6, and R6 adds link
R6-R3, to the FT temporarily. Note that link R4-R3 is already on the FT, so R4
and R3 will not add it to the FT again; link R6-R7 is already on the FT, so R6
and R7 will not add it to the FT again. Refer to Fig. 4.
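The enable step for a node on the backup path can be sketched similarly (hypothetical Python; integer node IDs and `frozenset` links as in the earlier sketches):

```python
def links_to_enable(path, ft_links, rk):
    """Links that node rk temporarily adds to the FT for this backup path:
    rk's local links on the path that are not already on the FT."""
    return [frozenset(e) for e in zip(path, path[1:])
            if rk in e and frozenset(e) not in ft_links]


# The R4-R3-R6-R7 example: R4-R3 and R6-R7 are already on the FT,
# so only R3 and R6 add the R3-R6 link; R4 and R7 add nothing.
ft = {frozenset({4, 3}), frozenset({6, 7})}
path = [4, 3, 6, 7]
```

Each node acts only on its own local links, so no coordination is needed beyond the shared path computation.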

Similarly, a unique backup path for link R5-R8 is computed and then enabled for 
temporary flooding.

The backup path for link R5-R8 is R5-R2-R9-R8 (refer to Fig. 5). It is computed
by R5, R2, R9, R8, and the other nodes.

The backup path for link R5-R8 is enabled for temporary flooding by R5, R2, R9,
and R8 (refer to Fig. 5).

A unique backup path for link R4-R7 and a unique backup path for link R5-R8
are thus computed and then enabled for temporary flooding.

The FT split is repaired. As shown in Fig. 6, FT partitions A and B are
connected by the links temporarily added to the FT.

Best Regards,
Huaimo
From: Tony Li [mailto:[email protected]] On Behalf Of [email protected]
Sent: Friday, April 19, 2019 1:20 PM
To: Huaimo Chen <[email protected]>
Cc: [email protected]
Subject: Re: [Lsr] Min Links for Multiple Failures on Flooding Topology


Hi Huaimo,

I don’t think you can just assume that the network topology is not damaged in 
some way.  After all, the FT partitioned because of real failures.  Your 
algorithm for computing backup paths relies on old information about the 
topology of other partitions.  That information may be out of date and 
incorrect. This can lead to the paths not being viable.

[HC]: In general, using the backup paths for the failures to repair the FT
partition works well to some extent, even though the nodes in one partition
have some information about the topology that the nodes in the other partition
do not have.

First, it can survive two or more failures. For example, it can repair the
FT partition created by the failures of any two links on the FT. It can also
repair the FT partition caused by the failures of any two links on the FT plus
the failure of any other link, if the algorithm for computing backup paths is
enhanced to use diverse links (i.e., the backup paths are computed in such a
way that one backup path does not share any link with the other backup paths,
if possible).


Well, I’m still not seeing how this works.  AFAICT, when there is a partition, 
your proposal is to compute a shortest path between the previously connected 
nodes using link state information that is now out of date.  What you compute 
is a multi-link path, though it’s not clear at all how this is enabled.  Since the
FT is down, there is no way to signal to the other half of the partition about 
what links are requested.

At best, it seems like you’re having one partition-edge router enable one link 
towards the other partition.


Second, it helps convergence. The backup paths for a failure are created by the
nodes close (or say local) to the failure and connect the two partitions. After
this local repair of the FT partition, the link states in the two partitions
are synchronized and the topology converges. Locally repairing the FT partition
is faster than repair by the area leader.


The area leader cannot repair the partition, since the FT cannot be propagated 
across the partition.  This is why enabling links across the partition edge is 
necessary.

It is not clear that your backup path computation (an SPF?) is any faster (or 
slower) than determining whether a neighbor is in a separate partition.


The network topology is damaged.  This is considered in the algorithm for 
computing backup paths.


Only partially.  Your algorithm does not know what the damage is in the other 
half of the partition.  There is no way to get it this information. As a 
result, the backup path that’s computed may traverse nodes and links in the 
other half of the partition that are no longer functional. It may also 
completely ignore links and nodes that could heal the partition.




Every node uses its LSDB to compute backup paths when needed. The FT is
partitioned, but the real network topology is not, and the real topology is
what is used. For information about a failure that is in one partition but not
in the other: if the part damaged by this failure is not used in any backup
path, there is no problem; otherwise, if the damaged part is used in only one
backup path, the backup paths can survive the damage provided that backup path
does not share any link with the other backup paths.


It seems to me that this leads to enabling flooding throughout the entire 
topology, leading us back to cascade failure.


Again, how can you consider any new failures in the computation of the backup 
paths? You have no way of getting information since the FT partitioned.

[HC]: The information about the new failures will flood through the backup
paths for the old failures once those backup paths are created. Refer to the
explanation above.


This reasoning is circular.  You have still not shown that the backup paths 
will repair the failures.




[HC]: It seems that using the backup paths for the failures to repair the FT
partition may make convergence faster. Refer to the second point in the
first explanation above.
It may be better to use both the backup paths for the failures and the rate
limiting. The former repairs the partitions locally, and the latter helps the
former work better. Thus convergence will be faster.


You have not shown that the backup paths will provide any benefit at all, much 
less improve convergence.  First and foremost, we need an algorithm that will 
lead to partition repair.

We still need you to provide a specific, detailed proposal that we all agree 
will heal the partition.

Instead of continuing to argue about the merits of your proposal at a high 
level, I suggest we return to specific examples.  I think you’ll find, as we
work through them, that the greedy algorithm described in the draft is very
difficult to improve upon in the general case.

Regards,
Tony


<Backuppath4-FT-split-diff-lsdbs-2failures.pdf>

_______________________________________________
Lsr mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/lsr
