Hi Huaimo,

> Consider the case where multiple failures occur at almost the same time 
> and no other failure follows them shortly afterwards. In this case, the 
> backup paths will connect the live end nodes of the failures on the FT, 
> and the FT partition is fixed. Note that we assume that the real 
> network topology is not split, so there will be working backup paths that 
> connect the FT partitions.


I don’t think you can just assume that the network topology is not damaged in 
some way.  After all, the FT partitioned because of real failures.  Your 
algorithm for computing backup paths relies on old information about the 
topology of the other partitions.  That information may be out of date and 
incorrect, which can leave the computed backup paths unviable.


> For the case where multiple failures occur at almost the same time and 
> other failures then follow the (old) multiple failures shortly afterwards, we 
> can consider all of these failures (i.e., the old multiple failures and the 
> other failures, including a destination node failure) as a new set of multiple 
> failures. For this new set of failures, the backup paths will 
> connect the live end nodes of the failures on the FT, and the FT partition is 
> fixed.


Again, how can you consider any new failures in the computation of the backup 
paths?  You have no way of getting that information, since the FT is partitioned.


>> It’s true that just adding one link may not heal the partition.  This is 
>> exactly why we feel we need to iterate.  Each iteration increases the size of 
>> the partition (or decreases the cut set, depending on how you want to look at 
>> it) and because the graph is finite, we are guaranteed to converge.
>  
> The convergence in this way may take a long time.


Agreed.  However, the alternatives that we have on the table right now are (a) 
take several iterations, or (b) turn on many links and risk a cascade failure.
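
For concreteness, each iteration amounts to something like the sketch below.  
The names and data structures are made up purely for illustration (this is not 
the draft's pseudocode), but it shows why convergence is guaranteed on a finite 
graph: every pass that finds a crossing link merges two FT partitions, so the 
number of partitions strictly decreases.

    # Rough sketch only; names and data structures are made up for illustration.
    def components(nodes, links):
        """Map each node to a representative of its connected FT component."""
        comp = {}
        for seed in nodes:
            if seed in comp:
                continue
            stack = [seed]
            while stack:
                n = stack.pop()
                if n in comp:
                    continue
                comp[n] = seed
                stack.extend(b if a == n else a
                             for (a, b) in links if n in (a, b))
        return comp

    def heal_ft(nodes, ft_links, real_links):
        """Enable one surviving real link per iteration until the FT reconnects."""
        ft = set(ft_links)
        while True:
            comp = components(nodes, ft)
            crossing = [(a, b) for (a, b) in real_links if comp[a] != comp[b]]
            if not crossing:
                return ft        # FT is whole again, or the real topology is split
            ft.add(crossing[0])  # turn on one link; in reality, re-flood and repeat

    # Example: FT split into {A, B} and {C, D}, but the real link B-C survived.
    nodes = {"A", "B", "C", "D"}
    print(heal_ft(nodes, {("A", "B"), ("C", "D")},
                  {("A", "B"), ("B", "C"), ("C", "D")}))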


> Each iteration takes a time Ti. If we need n (n > 1) iterations, then the 
> total time for convergence is n times Ti. For centralized flooding reduction, 
> Ti comprises three parts: 1) the time for the link states (i.e., LSPs or 
> LSAs) representing some of the multiple failures to travel to the area leader 
> over the partitioned FT, 2) the time for the leader to compute and 
> send a new FT, and 3) the time for the other nodes (i.e., non-area-leader 
> nodes) to receive and build the new FT. 


We agree that it is linear in the time taken to add new nodes to the topology.  
However, I suspect that the overall time is constrained more by the rate 
limiting than by the Ti components above.
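
To put the two effects side by side, here is a toy timing model.  All of the 
numbers are hypothetical and chosen only for illustration; none of them come 
from the draft or from any measurement.

    # Toy timing model with hypothetical values, for illustration only.
    t_flood_to_leader = 0.05   # LSPs/LSAs reach the area leader over the partitioned FT
    t_leader_compute  = 0.10   # leader computes and sends the new FT
    t_build_new_ft    = 0.05   # other nodes receive and build the new FT
    t_iteration = t_flood_to_leader + t_leader_compute + t_build_new_ft   # Ti = 0.2 s

    n_iterations = 3           # hypothetical number of healing iterations
    t_rate_limit = 1.0         # hypothetical rate-limiting delay per iteration

    total_without_rate_limit = n_iterations * t_iteration                     # 0.6 s
    total_with_rate_limit    = n_iterations * max(t_iteration, t_rate_limit)  # 3.0 s

If the rate-limiting interval is larger than Ti, then it is the rate limiting, 
rather than the n repetitions of flooding and FT computation, that dominates 
the overall convergence time.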

After a multiple failure, all nodes can evaluate the state of their adjacencies 
against the current, now partitioned, FT.  Information about the partition has 
been flooded using the current FT, but only within the partitions.  Any node 
with an adjacency outside of its partition can enable temporary flooding to one 
neighbor and begin rate limiting.
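
As a sketch of that per-node check (class and field names are made up; this is 
not any particular implementation):

    # Illustration only; the real trigger would be driven by the node's own LSDB.
    from dataclasses import dataclass, field
    from typing import List, Optional, Tuple

    @dataclass
    class Node:
        name: str
        ft_partition: int          # FT partition this node computes from its LSDB
        adjacencies: List[Tuple["Node", bool]] = field(default_factory=list)
        temp_flooding_to: Optional[str] = None

    def maybe_enable_temporary_flooding(node: Node) -> None:
        """If a live adjacency leads outside our FT partition, flood to one neighbor."""
        for nbr, is_up in node.adjacencies:
            if is_up and nbr.ft_partition != node.ft_partition:
                node.temp_flooding_to = nbr.name   # exactly one such neighbor
                # ...and start rate limiting on this adjacency here...
                return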

While rate limiting is in progress, a new area leader election and FT 
computation can happen within the partition.  This may or may not yield LSDB 
updates.  Similarly, the temporary addition to the flooding topology may or may 
not yield LSDB updates. 

Whether the area leader election and FT computation is faster than the rate 
limiting is unknown, since both are implementation dependent.

Tony

