Hi Huaimo,

Thank you, this is very clear.

It leaves me with one question: how is this better than what we have?

You’ve computed the backup paths: [R4, R3, R6, R7] and [R5, R2, R9, R8].  This 
results in enabling flooding on (R3, R6) and on (R5, R2), (R2, R9), (R9, R8).

Per the existing algorithm, the temporary additions would be (R3, R6) and 
(R2, R9).  Since that enables less flooding, it seems like the better 
solution.

Regards,
Tony


> On Apr 25, 2019, at 7:26 AM, Huaimo Chen <[email protected]> wrote:
> 
> Hi Tony,
>  
> Considering the different information about the link states in the 
> different FT partitions caused by failures, the attachment (5 pages) 
> contains an example with figures, which illustrates and explains almost 
> every step in detail for computing a unique backup path for each of the 
> two failures on the FT and enabling temporary flooding on the backup path. 
> Would you mind indicating if anything is missing or incorrect?
>  
> In this example, the failures of two links R4-R7 and R5-R8 on the FT happen 
> at the same time and split the FT into two partitions: partition A and 
> partition B. Partition A contains R0, R1, R2, R3, R4, and R5. Partition B has 
> R6, R7, R8, R9, R10, R11, and R12. Refer to Fig. 1.
>  
> Every node in partition A receives the link state (LS) from R4 indicating 
> that the link from R4 to R7 is down, and the LS from R5 indicating that 
> the link from R5 to R8 is down. Each such node has the FT and the real 
> topology shown in Fig. 2. There are two uni-directional links: the link 
> from R7 to R4 and the link from R8 to R5. These uni-directional links will 
> not be used in backup path computations.
>  
> Similarly, in partition B every node receives the LS from R7 indicating 
> that the link from R7 to R4 is down, and the LS from R8 indicating that 
> the link from R8 to R5 is down. Each such node has the FT and the real 
> topology shown in Fig. 3. There are two uni-directional links: the link 
> from R4 to R7 and the link from R5 to R8. These uni-directional links will 
> not be used in backup path computations. Note that Fig. 2 and Fig. 3 are 
> equivalent if the uni-directional links are not used.
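> The split can be reproduced with a small sketch: remove the two failed 
> links and take the connected components of the FT. Only the node sets and 
> the two failed links below come from this example; the FT link set itself 
> is a made-up stand-in for the one in Fig. 1.

```python
from collections import deque

def partitions(nodes, links, failed):
    """Connected components of the FT after removing the failed links."""
    adj = {n: set() for n in nodes}
    for a, b in links:
        if (a, b) in failed or (b, a) in failed:
            continue  # drop a failed link named in either direction
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for n in nodes:
        if n in seen:
            continue
        comp, q = {n}, deque([n])
        seen.add(n)
        while q:                      # plain BFS flood fill
            u = q.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    q.append(v)
        comps.append(comp)
    return comps

# Node sets and failed links from the example; the FT links are assumed.
NODES = ["R%d" % i for i in range(13)]
FT = [("R0", "R1"), ("R1", "R2"), ("R2", "R3"), ("R3", "R4"),
      ("R0", "R5"), ("R6", "R7"), ("R7", "R8"), ("R8", "R9"),
      ("R9", "R10"), ("R10", "R11"), ("R11", "R12"),
      ("R4", "R7"), ("R5", "R8")]
FAILED = {("R4", "R7"), ("R5", "R8")}
```

> With this stand-in FT, removing R4-R7 and R5-R8 yields exactly the two 
> partitions described above: {R0..R5} and {R6..R12}.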
>  
> A unique backup path for link R4-R7 is computed and then enabled for 
> temporary flooding.
>  
> In this example, the basic part of the algorithm below for computing a backup 
> path is used, which is to obtain the minimum hop count path from Ri to Rj for 
> link Ri-Rj. Like SPF, any link used in the path must be bi-directional.
>  
> First, for the failure of a link Ri-Rj on the FT, a unique backup path for 
> link Ri-Rj (assume Ri’s ID < Rj’s ID) is computed by each node in the 
> following steps:
> 1. Obtain all the minimum hop count paths from Ri to Rj, where each 
> minimum hop count path has an HC-FT value.
> 2. From among the paths that have the maximum HC-FT value (if there are 
> several), select the one whose links have the smaller or smallest remote 
> node IDs, walking from the destination Rj back toward the source Ri.
> Then the backup path for link Ri-Rj is enabled for temporary flooding by 
> each node Rk on the path as follows: Rk adds its local links on the backup 
> path to the FT temporarily if they are not already on the FT.
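> A rough sketch of the two computation steps, with integer node IDs so the 
> ID comparison is direct. HC-FT is read here as "number of path links 
> already on the FT" -- that reading is an assumption, since the value is 
> only defined in the attached pages.

```python
from collections import deque

def backup_path(adj, ft_links, src, dst):
    """Minimum-hop path from src to dst, preferring the largest
    HC-FT value, then smaller node IDs from dst back toward src."""
    # BFS distances, so minimum-hop paths can be enumerated.
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    if dst not in dist:
        return None  # no bi-directional path: the split remains
    # Step 1: all minimum-hop paths (each hop descends one BFS level).
    paths = []
    def walk(u, path):
        if u == dst:
            paths.append(path)
            return
        for v in adj[u]:
            if dist.get(v) == dist[u] + 1:
                walk(v, path + [v])
    walk(src, [src])
    # Step 2: keep the paths with the maximum HC-FT value, then
    # break ties by smaller node IDs walking from dst back to src.
    ft = {frozenset(e) for e in ft_links}
    hcft = lambda p: sum(frozenset(e) in ft for e in zip(p, p[1:]))
    best = max(hcft(p) for p in paths)
    return min((p for p in paths if hcft(p) == best),
               key=lambda p: list(reversed(p)))
```

> On a square 0-1-3 / 0-2-3, the FT link steers the choice between the two 
> equal-length paths, and with no FT links on either path the ID tie-break 
> decides.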
>  
> The backup path for link R4-R7 is R4-R3-R6-R7 (refer to Fig. 4). It is 
> computed by R4, R3, R6, R7, and the other nodes. R3 and R4 in partition A 
> compute the backup path using the topology in Fig. 2. R6 and R7 in 
> partition B compute the backup path using the topology in Fig. 3.
>  
> The backup path for link R4-R7 is enabled for temporary flooding by R4, 
> R3, R6, and R7, the nodes on the backup path. R3 adds link R3-R6, and R6 
> adds link R6-R3, to the FT temporarily. Note that link R4-R3 is already on 
> the FT, so R4 and R3 will not add it again; likewise, link R6-R7 is 
> already on the FT, so R6 and R7 will not add it again. Refer to Fig. 4.
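> The enabling step reduces to a set difference: the path's links that are 
> not already on the FT. A minimal sketch, with the path and on-FT links 
> taken from the R4-R7 example:

```python
def links_to_enable(path, ft_links):
    """The path's links not already on the FT; each on-path node
    adds the ones it terminates (illustrative sketch)."""
    ft = {frozenset(e) for e in ft_links}
    return [e for e in zip(path, path[1:]) if frozenset(e) not in ft]
```

> For path R4-R3-R6-R7 with R4-R3 and R6-R7 already on the FT, only R3-R6 
> comes back, matching Fig. 4.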
>  
> Similarly, a unique backup path for link R5-R8 is computed and then enabled 
> for temporary flooding.
>  
> The backup path for link R5-R8 is R5-R2-R9-R8 (refer to Fig. 5). It is 
> computed by R5, R2, R9, R8, and the other nodes.
>  
> The backup path for link R5-R8 is enabled for temporary flooding by R5, 
> R2, R9, and R8 (refer to Fig. 5).
>  
> In this way, a unique backup path for link R4-R7 and a unique backup path 
> for link R5-R8 are computed and then enabled for temporary flooding.
>  
> The FT split is repaired. Refer to Fig. 6: FT partitions A and B are 
> connected by the links added to the FT temporarily.
>  
> Best Regards,
> Huaimo
> From: Tony Li [mailto:[email protected]] On Behalf Of [email protected]
> Sent: Friday, April 19, 2019 1:20 PM
> To: Huaimo Chen <[email protected]>
> Cc: [email protected]
> Subject: Re: [Lsr] Min Links for Multiple Failures on Flooding Topology
>  
>  
> Hi Huaimo,
>  
> I don’t think you can just assume that the network topology is not damaged in 
> some way.  After all, the FT partitioned because of real failures.  Your 
> algorithm for computing backup paths relies on old information about the 
> topology of other partitions.  That information may be out of date and 
> incorrect. This can lead to the paths not being viable.
>  
> [HC]: In general, using the backup paths for the failures to repair the FT 
> partition works well to some extent, even though the nodes in one 
> partition have some information about the topology that the nodes in the 
> other partition do not have. 
>  
> First, it can survive two or more failures. For example, it can repair the 
> FT partition created by the failures of any two links on the FT. It can 
> also repair the FT partition caused by the failures of any two links on 
> the FT plus the failure of any other link, if the algorithm for computing 
> backup paths is enhanced to use diverse links (i.e., the backup paths are 
> computed in such a way that one backup path does not share any link with 
> the other backup paths, if possible). 
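> The "diverse links" idea can be sketched as: compute the second backup 
> path with the first path's links removed, and fall back to sharing links 
> only when no disjoint path exists. A BFS sketch under that assumption; the 
> actual enhancement is not specified in this thread.

```python
from collections import deque

def min_hop_path(adj, src, dst, banned=frozenset()):
    """BFS shortest path from src to dst that avoids the banned links."""
    prev = {src: None}
    q = deque([src])
    while q:
        u = q.popleft()
        if u == dst:                  # reconstruct by walking prev links
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj[u]:
            if v not in prev and frozenset((u, v)) not in banned:
                prev[v] = u
                q.append(v)
    return None  # no path avoiding the banned links

def diverse_backup(adj, first_path):
    """A second backup path sharing no link with the first, if one exists."""
    banned = {frozenset(e) for e in zip(first_path, first_path[1:])}
    return min_hop_path(adj, first_path[0], first_path[-1], banned)
```

> When the topology offers no link-disjoint alternative, the function 
> returns None, which is exactly the "if possible" caveat above.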
>  
>  
> Well, I’m still not seeing how this works.  AFAICT, when there is a 
> partition, your proposal is to compute a shortest path between the 
> previously connected nodes using link state information that is now out of 
> date.  What you compute is a multi-link path, though it’s not clear at all 
> how this is enabled.  Since the FT is down, there is no way to signal to 
> the other half of the partition about which links are requested.  
>  
> At best, it seems like you’re having one partition-edge router enable one 
> link towards the other partition.
>  
>  
> Second, it helps convergence. The backup paths for a failure are created 
> by the nodes close (or, say, local) to the failure, and they connect the 
> two partitions. After this local repair of the FT partition, the link 
> states in the two partitions are synchronized, and the topology converges. 
> Locally repairing the FT partition is faster than repair by the area 
> leader.
>  
>  
> The area leader cannot repair the partition, since the FT cannot be 
> propagated across the partition.  This is why enabling links across the 
> partition edge is necessary.
>  
> It is not clear that your backup path computation (an SPF?) is any faster (or 
> slower) than determining whether a neighbor is in a separate partition.
>  
>  
> The network topology is damaged.  This is considered in the algorithm for 
> computing backup paths.
>  
>  
> Only partially.  Your algorithm does not know what the damage is in the other 
> half of the partition.  There is no way to get it this information. As a 
> result, the backup path that’s computed may traverse nodes and links in the 
> other half of the partition that are no longer functional. It may also 
> completely ignore links and nodes that could heal the partition.
>  
> 
> 
> Every node uses its LSDB for computing backup paths if needed. The FT is 
> partitioned, but the real network topology is not, and it is the real 
> topology that is used. As for information about a failure that is known in 
> one partition but not in the other: if the part damaged by that failure is 
> not used in any backup path, there is no problem; otherwise, if the 
> damaged part is used in only one backup path, the backup paths can survive 
> the damage provided that backup path does not share any link with the 
> other backup paths. 
>  
>  
> It seems to me that this leads to enabling flooding throughout the entire 
> topology, leading us back to cascade failure.
>  
>  
> Again, how can you consider any new failures in the computation of the backup 
> paths? You have no way of getting information since the FT partitioned.
>  
> [HC]: The information about the new failures will flood through the backup 
> paths for the old failures if the backup paths are created. Refer to the 
> explanation above.
>  
>  
> This reasoning is circular.  You have still not shown that the backup paths 
> will repair the failures.
>  
> 
> 
> [HC]: It seems that using the backup paths for the failures to repair the 
> FT partition may make convergence faster. Refer to the second point in the 
> first explanation above.
> It may be better to use both the backup paths for the failures and rate 
> limiting. The former repairs the partitions locally, and the latter helps 
> the former work better. Thus convergence will be faster.
>  
>  
> You have not shown that the backup paths will provide any benefit at all, 
> much less improve convergence.  First and foremost, we need an algorithm that 
> will lead to partition repair.  
>  
> We still need you to provide a specific, detailed proposal that we all agree 
> will heal the partition.
>  
> Instead of continuing to argue about the merits of your proposal at a high 
> level, I suggest we return to specific examples.  I think you’ll find, as 
> we work through them, that the greedy algorithm described in the draft is 
> very difficult to improve upon in the general case.
>  
> Regards,
> Tony
>  
>  
> <Backuppath4-FT-split-diff-lsdbs-2failures.pdf>

_______________________________________________
Lsr mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/lsr
