Hi Everyone,
Our answers/explanations to Tony's comments/questions are inline below with
the prefix [HC] and in red.
Best Regards,
Huaimo
Hi all,
I wanted to address some of the points that Huaimo has raised. I’ve taken the
editorial liberty of reformatting some of Huaimo’s comments.
> 1. Flooding topology encoding
>
> The encoding in "draft-li-lsr-dynamic-flooding" comprises two parts. One part
> is the encoding of the flooding topology using the TLVs of the paths
> constituting the flooding topology; the second part is the encoding of the
> mapping from the nodes in the area to their indexes. The TLVs use the indexes
> of the nodes in the paths.
> There are two encodings in "draft-cc-lsr-flooding-reduction", each of
> which can be used to encode the flooding topology. One encoding is
> called "Links Encoding". The other is called "Block Encoding”.
Not distributing the encoding of the indices is an issue that needs to be
addressed. Without an explicit encoding, all systems are forced to rely on the
complete and area-wide synchronization of the entire LSDB. This seems like a
significant risk. What happens if there is an expired LSP and one system
has expunged it from the database but another has not? This would perturb the
indices. If a new node has been added to the topology and its presence has not
completed flooding, then the indices would also be perturbed.
[HC] Area-wide synchronization is a prerequisite for an IGP to work correctly
in the first place. When some nodes in the network have their LSDBs out of
synchronization, the problem with the indexes is in general even bigger if the
mapping from the nodes to their indexes is generated by the leader of the area
from its LSDB and flooded to all the other nodes.
> The former encodes the links between a local node and its adjacent
> nodes using the compact indexes of these nodes. The whole flooding
> topology is represented by the links and nodes on it. This is like the normal
> topology representation in an IGP, in which the topology is represented by
> the links and nodes in an IGP area. No mapping encoding and flooding is
> needed. Every node creates and maintains the mapping from the nodes to their
> indexes. It is simpler and faster for the node to have the mapping in this
> way.
The use of two different encodings for the topology is highly unusual.
It is not at all clear how this should be done. Ambiguity in a
specification results in differing interpretations, which leads to complexity
and interoperability errors. In this specific case, it would also seem to
result in redundancy. It would be better to have a single encoding.
[HC] At this stage, we propose two options in our draft. Ultimately, a single
encoding will be used.
The use of the “compact index” is not clear. At 14 bits, it is clearly not
large enough to support large areas. While we are generally not exceeding 16k
systems today, that may grow. It wasn’t long ago that areas were practically
confined to 1k nodes. That’s exceeded now.
[HC] How did you get 14 bits?
The idea of compact node indexes, or say a variable size (or length) of node
indexes, is used in both our links encoding and our block encoding.
For the links from a local node to its adjacent nodes, the size (or length) of
the indexes of these nodes (i.e., the local node and the adjacent nodes) is set
to the maximum among the sizes of these node indexes. For example, suppose that
these nodes have indexes 1, 126, 245, and 511. The sizes of their indexes are
1, 7, 8, and 9 bits respectively. The maximum among 1, 7, 8, and 9 is 9 (bits).
This maximum size (i.e., 9) is represented by an element, which is a 4-bit
field. This element could instead be a byte, a word, or even a TLV if you want.
After this element, the four node indexes (i.e., 1, 126, 245, and 511) are
represented by four 9-bit fields, which contain 1, 126, 245, and 511
respectively.
When a block of the flooding topology is encoded in our block encoding, the
size (or length) of the indexes of the nodes in this block is set to the
maximum among the sizes of these node indexes.
When the value of the 4 bits (from 0 to 15) is used directly to indicate the
size of the node indexes, it is big enough to support 32k (i.e., 2^15) nodes.
When the links from a local node to its adjacent nodes are encoded, the size of
these node indexes is indicated by the value of these 4 bits, which is the
maximum among the sizes of these node indexes.
From section 6.2.1.1 in our draft, the size of the node indexes is the value of
a 4-bit field (called ENSI) plus 8. The range of a 4-bit value is from 0 to 15.
Thus the range of the size of the node indexes is from 8 bits to 23 bits, and
any node index from 0 to 2^23 - 1 can be represented. When (the value of the
4-bit field plus 8) is used to indicate the size of these node indexes, it is
the maximum among the sizes of these node indexes, or 8 in the case where the
maximum is less than 8.
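To make the width calculation concrete, here is a small Python sketch of the
scheme described above (illustrative only; the function names are ours, and
the bit string merely stands in for the on-wire fields):

```python
def index_width(indexes):
    """Field width = max bit-length among the indexes, but at least 8,
    since the 4-bit ENSI field encodes (width - 8) per section 6.2.1.1."""
    max_bits = max(i.bit_length() for i in indexes)
    return max(max_bits, 8)

def encode_indexes(indexes):
    """Pack node indexes into fixed-width fields preceded by a 4-bit ENSI.
    Returns the ENSI value and the resulting bit string."""
    width = index_width(indexes)
    ensi = width - 8            # 4-bit value, 0..15 -> widths 8..23 bits
    assert 0 <= ensi <= 15
    bits = format(ensi, '04b')
    for i in indexes:
        bits += format(i, '0{}b'.format(width))
    return ensi, bits

# Example from the text: indexes 1, 126, 245, 511 -> max bit-length is 9,
# so ENSI = 9 - 8 = 1 and each index occupies a 9-bit field.
ensi, bits = encode_indexes([1, 126, 245, 511])
```

With four indexes at 9 bits each plus the 4-bit ENSI, the encoded block is
40 bits, versus 4 x 23 = 92 bits if the maximum width were always used.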
A better way of encoding the indices is to make them variable length, depending
on the number of nodes in the area (credit: Tony P.). In our draft, the
explicit encoding of the mapping from system to index makes it very clear how
many systems there are, and thus how many bits should be used in this encoding.
Please note that this optimization is not yet reflected in our draft.
[HC] We see that your better way is an application of our idea of a "variable
size (or length) of node indexes".
Our idea can be used with the encoding in your draft in a few different ways.
In one way, a variable size (or length) of node indexes is used within a Path
TLV. For all the nodes in this Path TLV, the size (or length) of the indexes of
these nodes is set to the maximum among the sizes of these node indexes. The
size (or length) of the indexes is represented by an element, which can be a
4-bit field, a byte, a word, or even a TLV.
In another way, a variable size (or length) of node indexes is used in an
LSP/LSA for all the nodes contained in the Path TLVs in that LSP/LSA.
In a third way, a variable size (or length) of node indexes is used in the
mapping for all the nodes in an area. The size (or length) of the indexes is
represented by an element. After this element, all the node indexes, including
those in Path TLVs, are represented by fields of the size indicated by the
element.
> For example, one issue is that different types of messages (i.e., the
> messages for the mapping, and the messages for the TLVs of the paths in the
> flooding topology, depending on the mapping) for the flooding topology may be
> out of order on a receiving router. This may cause incorrect/corrupted
> flooding topology.
As I’ve explained, the ordering is irrelevant.
[HC] The ordering is critical. The TLVs for paths contain the indexes of the
nodes in the paths. The indexes of these nodes are generated by the leader of
the area, and the mapping from the nodes to the indexes is created by the
leader. We can see that the TLVs (i.e., the indexes of the nodes in the TLVs
for paths) depend on the mapping.
The ordering of the TLVs for the mapping from node to index is self-describing.
Each TLV carries the starting index for its contents.
[HC] The ordering that you describe is not the ordering we discussed. The
former is the ordering of the TLVs within the mapping, i.e., the ordering among
TLVs of the same type. The latter is the ordering between different types of
TLVs. One type is the TLVs for the paths, which contain the indexes of the
nodes in the paths. The second type is the TLVs for the mapping from nodes to
indexes.
The ordering of arrival of various LSP segments is also irrelevant as the
computation need only be deferred until all segments have arrived. This is
already commonly done as part of SPF scheduling. This is also a necessity for
all proposals, as any computation with inconsistent segments is going to yield
dubious results.
> 2. Fault tolerance to multiple failures
>
> An efficient method for fault tolerance to multiple failures on the
> flooding topology is very important to centralized flooding reduction.
> Without it, the network convergence will slow down significantly. In general,
> the convergence time will be more than doubled comparing to normal
> convergence when the flooding topology is split by multiple failures.
Multiple simultaneous failures are an extremely unlikely event. Routers are not
designed to survive multiple concurrent hardware failures. Most topologies run
at commercially viable loading levels and cannot survive multiple critical link
or node failures without manual intervention. What is critical is to survive a
single failure efficiently. Our proposal is to compute a biconnected flooding
topology. This protects against single failures, retaining a functioning
flooding topology in the event of a failure. At that point, the system computes
a new biconnected flooding topology, protecting against further failures.
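For illustration, whether a connected flooding topology is biconnected can be
checked by looking for articulation (cut) vertices; this sketch is our own, not
taken from either draft:

```python
def articulation_points(adj):
    """Return the set of articulation (cut) vertices of an undirected graph
    given as {node: set(neighbors)}. A connected flooding topology with no
    articulation points remains connected after any single node failure."""
    disc, low, aps = {}, {}, set()
    timer = [0]

    def dfs(u, parent):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        children = 0
        for v in adj[u]:
            if v == parent:
                continue
            if v in disc:                       # back edge
                low[u] = min(low[u], disc[v])
            else:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # Non-root u is a cut vertex if no back edge climbs past it.
                if parent is not None and low[v] >= disc[u]:
                    aps.add(u)
        if parent is None and children > 1:     # root with >1 DFS children
            aps.add(u)

    for s in adj:
        if s not in disc:
            dfs(s, None)
    return aps

# A ring is biconnected; a chain has a cut vertex in the middle.
ring = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
chain = {0: {1}, 1: {0, 2}, 2: {1}}
```

A leader could run such a check on a candidate flooding topology before
distributing it; the recursive DFS is for clarity only and would be made
iterative for large areas.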
[HC] A network without flooding reduction deployed may survive multiple
failures thanks to "reliable flooding", as we know that flooding occurs on
every link. After flooding reduction is deployed, if failures occur on more
than one link and/or node on the flooding topology, the network cannot survive
at all.
Carrying arbitrary alternate information to protect against multiple failures
negates your point about encoding efficiency. Backup links are not a tractable
solution.
[HC] No arbitrary alternate information to protect against multiple failures
is encoded or flooded in our proposal.
> The more time the network convergence takes, the more customers'
> traffic in the network gets lost in general. Thus after the centralized
> flooding reduction without fault tolerance to multiple failures is deployed
> in a network, some customers' traffic lose will be more than doubled when
> multiple failures splitting the flooding topology occur comparing to the same
> network without the flooding reduction.
If there are multiple failures, in our proposal, this can easily be detected.
The flooding topology would clearly be partitioned. Systems on the edge of the
partition can explicitly request temporary changes to the flooding topology,
immediately restoring the connectivity of the flooding topology if that is
possible. Thus, the convergence time is going to be on the order of two LSP
propagation times: one to detect the failure and another to repair it and
converge.
[HC] From your description above, we cannot see any details on how to restore
the connectivity of the flooding topology.
The convergence time is the total of the times to finish the following:
1) the failures are flooded to the leader of the area through updated
LSAs/LSPs,
2) the leader computes a new flooding topology after receiving the
LSAs/LSPs,
3) the leader floods the new flooding topology to every other node in the
area,
4) every other node receives, decodes, and installs the new flooding
topology, and
5) some nodes resynchronize their LSDBs with their neighbors.
The time for the leader to compute a new flooding topology depends on the
complexity of the algorithm. An algorithm that computes, from any network
topology, a flooding topology that survives any single failure is in general
very complex in terms of time.
Tree topologies are particularly vulnerable to failures since each and every
link failure creates a partition of the flooding topology.
Repairing this in a distributed fashion is also not perceptibly better. After
a partition, it will require at least one LSP propagation time to detect the
failure. Then execution of the algorithm, followed by another LSP propagation
time before a complete SPF can be run.
[HC] The tree topologies we mentioned are just examples.
> "draft-cc-lsr-flooding-reduction" has a light-weight method for providing
> fault tolerance to multiple failures on the flooding topology. This method
> makes sure that the network still converges fast when the multiple failures
> happen. Thus the failures have almost minimal impact on the network
> convergence in the network with the flooding reduction deployment. The
> customers' traffic lose is significantly reduced (or almost minimized) when
> the failures occur in the network with the flooding reduction.
Your proposal is to revert to almost full flooding. You bound the flooding to
three hops (maybe; this is not defined), but in a link-state topology, that is
the full topology. This is not a solution. If we could support full flooding,
we would not have the problem in the first place.
[HC] This is one optional solution. When multiple failures on the flooding
topology happen, the nodes around the failure points revert to normal (or full)
flooding temporarily until a new flooding topology is built. This normal (or
full) flooding on some nodes is only a temporary measure for the emergency in
which multiple failures occur. After the new flooding topology is built on
these nodes, they revert to flooding on the links of the flooding topology.
There is another solution in our draft, which is a light-weight method for
providing fault tolerance to multiple failures on the flooding topology.
> "draft-li-lsr-dynamic-flooding" just mentions that the edges of split
> parts will do something after the split of the flooding topology happens.
> However, it needs time to determine if a split of the flooding topology
> occurs and to find backup paths to connect split parts. There is no
> description on how to determine a split or find backup paths. This is not
> useful for implementors or users.
It is described. Please see section 6.7.11. After a topology change, all nodes
can examine the existing flooding topology and the faults that have occurred.
If it is clear that the flooding topology has partitioned, then nodes that have
adjacencies that are not reachable on the flooding topology make a temporary
request to their adjacencies to add to the flooding topology. This creates a
temporary repair until the next flooding topology is distributed.
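As a rough illustration of that text (this is our reading of section 6.7.11,
not normative, and the data structures are assumptions), detecting the
partition and picking candidate repair links amounts to a reachability check
over the post-failure flooding topology:

```python
from collections import deque

def reachable_on_ft(ft_adj, start):
    """Nodes reachable from `start` over the flooding topology, given as
    {node: set(neighbors)} with failed links/nodes already removed."""
    seen = {start}
    q = deque([start])
    while q:
        u = q.popleft()
        for v in ft_adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                q.append(v)
    return seen

def links_to_enable(node, phys_neighbors, ft_adj):
    """Adjacencies of `node` whose far end is unreachable on the flooding
    topology: candidates for a temporary-flooding request to the neighbor."""
    reach = reachable_on_ft(ft_adj, node)
    return {n for n in phys_neighbors if n not in reach}
```

For example, if the flooding topology partitions into {A, B} and {C, D} while
B retains a physical adjacency to C, B would request temporary flooding toward
C.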
[HC] From both your draft and your description above, we cannot see any
details on how to determine that a node has adjacencies that are not reachable
on the flooding topology, or on how the node makes a temporary request to
those adjacencies to be added to the flooding topology.
If the TLV with the R bit in an IS-IS Hello (or the FR-bit in an OSPF Hello)
is used to request adding a link to the flooding topology temporarily (or, say,
enabling temporary flooding over the link), then there is a big issue. For
example, if the first Hello packet containing the TLV with the R bit in IS-IS
(or the FR-bit in OSPF) gets lost, then adding/enabling "temporary flooding" on
the link will take more than a Hello interval (typically 10 seconds in OSPF).
This is too long to accept!
> In addition, the algorithm for computing a flooding topology that can survive
> any one link failure on the flooding topology (i.e., any one link failure on
> the flooding topology does not split the flooding topology) will be very
> complex, at least in terms of time. With a good method for fault tolerance,
> the algorithm is not required to compute a flooding topology that can survive
> any one link failure on the flooding topology.
Collapsing the network is not a 'good method'.
[HC] The above paragraph was just proposed as an optional solution, which works
and reduces the complexity of the algorithm for computing a flooding topology,
but is not mandatory for implementation.
> 3. A New Node and Link Addition
>
> Two different procedures for handling a new node and link addition to the
> topology are described in the two drafts ("draft-li-lsr-dynamic-flooding" and
> "draft-cc-lsr-flooding-reduction").
>
> The procedure in "draft-cc-lsr-flooding-reduction": After the new node (say
> node A) and an existing node (say node B) establishes the adjacency over the
> link, A and B add the link to the flooding topology temporarily until a new
> flooding topology is built.
If A and B are already on the flooding topology, then this is redundant and
harmful.
[HC] It is impossible for a new node A and a new link to be on the flooding
topology already.
> The procedure in "draft-li-lsr-dynamic-flooding": A new TLV with R bit in IIH
> (IS-IS Hello), and FR-bit in OSPF Hello LLS Extended Options are defined. The
> new node (say node A) and an existing node (say node B), send each other the
> TLV with R bit in IS-IS Hello or FR-bit in OSPF Hello to request for adding
> the link to flooding topology temporarily (or say enabling temporary flooding
> over the link).
This only happens if the other node does not appear on the flooding topology.
[HC] The case we discussed is "A New Node and Link Addition", per the title
above. The new node is not on the flooding topology, so this case applies to
your procedure.
> The procedure in "draft-cc-lsr-flooding-reduction" is simpler and more
> efficient. We can see that the two procedures try to achieve the same result
> ("temporary flooding" on the link) in different approaches. When a new
> node and link is added to the topology/network, it is deterministic
> that the link needs to be added to the flooding topology temporarily
> by the two end nodes of the link. There is no need for the end nodes
> to add/enable "temporary flooding" on the link through (new TLV in IIH
> or FR-Bit in OSPF Hello LLS) signaling over the link.
Automatically adding to the flooding topology risks collapsing the network,
again contradicting the entire goal of reducing the flooding topology.
[HC] The results produced by the two procedures in the two drafts are the same.
The difference between the two procedures is that one is simpler and more
efficient than the other.
There is no risk of collapsing the network, and it does not contradict the
entire goal of reducing the flooding topology. The new link is added to the
flooding topology "temporarily" by the two end nodes of the link, locally.
This link, while "temporarily" on the flooding topology, is not flooded to the
other nodes. It exists only so that link states can be flooded "temporarily"
over this link before a new flooding topology is built.
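The local-only procedure described above might be sketched as follows (purely
illustrative; the class and method names are ours, not from the draft):

```python
class FloodingState:
    """Minimal sketch of per-node local state: links on the leader-computed
    flooding topology, plus links temporarily enabled by this node alone."""

    def __init__(self, ft_links):
        self.ft_links = set(ft_links)   # from the distributed flooding topology
        self.temporary = set()          # local-only, never flooded to others

    def on_new_adjacency(self, link):
        # New node/link came up: enable temporary flooding locally.
        # Nothing is signaled to the peer and nothing is flooded area-wide.
        self.temporary.add(link)

    def on_new_flooding_topology(self, ft_links):
        # A new flooding topology covering the new node replaces the old one
        # and cancels any temporary state.
        self.ft_links = set(ft_links)
        self.temporary.clear()

    def floods_on(self, link):
        """A link carries flooding if it is on the flooding topology or is
        temporarily enabled."""
        return link in self.ft_links or link in self.temporary
```

The design point being argued is that both ends can deduce the same decision
deterministically, so no Hello-based signaling is needed to agree on it.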
> During internal review of "draft-li-lsr-dynamic-flooding" we observed
> significant procedure issues, which are big issues for implementations
> and users. For example in one issue, if the first Hello packet
> containing the TLV with R bit in IS-IS (or FR-bit in OSPF) gets lost,
> then to add/enable "temporary flooding" on the link will take more
> than a Hello interval (typically
> 10 seconds in OSPF). This is too long to accept!
Loss of a hello also implies that the adjacency will be delayed in coming up,
so this is the same in both situations.
[HC] The case mentioned above applies only to your draft. It appears under the
title "2. Fault tolerance to multiple failures".
> 4. An Existing Link Addition
>
> The approach in "draft-cc-lsr-flooding-reduction": The end node sends
> its adjacent node over the adjacency the link states with changes it received
> or originated during the period of time in which the flooding topology may
> split.
This requires the implementation to retain state about every LSP on every link,
even when it is down.
[HC] There is no such requirement in our approach.
Our approach defines a variable, TimeOf1stLinkDown, which stores the time at
which the first link on the flooding topology goes down after a new flooding
topology is built.
It uses the time information of an LSA/LSP in the LSDB: the time at which a
changed LSA/LSP was received or originated. If a current OSPF/IS-IS
implementation does not have this time information, it can easily be added.
When an existing link/adjacency not on the flooding topology between two end
nodes is added to the flooding topology after another new flooding topology is
built, each end node of the link sends its adjacent node, over the adjacency,
the LSAs/LSPs changed since TimeOf1stLinkDown, if a link went down (i.e.,
TimeOf1stLinkDown != 0).
As we explained, this approach is more efficient.
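A minimal sketch of this, assuming an LSDB keyed by LSA/LSP id with a
receive/origination timestamp (the names and shapes are ours, for
illustration only):

```python
def lsas_to_send(lsdb, time_of_first_link_down):
    """Changed LSAs/LSPs to send over a link newly added to the flooding
    topology. `lsdb` maps id -> (lsa, time_received_or_originated). If no
    flooding-topology link went down (time == 0), nothing needs to be sent,
    and in no case is a full LSDB resynchronization performed."""
    if time_of_first_link_down == 0:
        return []
    return [lsa for lsa, t in lsdb.values() if t >= time_of_first_link_down]
```

Only the LSAs/LSPs changed during the window in which the flooding topology
may have been split are exchanged, rather than the whole database.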
> The approach in "draft-li-lsr-dynamic-flooding": A full LSDB
> resynchronization is requested over the existing adjacency/link between two
> end nodes for possible flooding topology split which may cause LSDB out of
> synchronization.
Resynchronization is a normal part of the protocol process at link up anyway.
[HC] This case is not about link up. It is about adding an existing
link/adjacency not on the flooding topology between two end nodes to the
flooding topology. The following is the original text for your reference:
“Two different approaches for adding an existing link/adjacency not on the
flooding topology between two end nodes to the flooding topology are described
in the two drafts (“draft-li-lsr-dynamic-flooding” and
“draft-cc-lsr-flooding-reduction”).”
Even though the adjacency between the two end nodes is already there and their
LSDBs are the same in most situations, your approach still requests a full
LSDB resynchronization, which consumes (or wastes) processing power for IGP
messages and link bandwidth for transferring IGP messages. This contradicts
the entire goal of flooding reduction.
Regards,
Tony
_______________________________________________
Lsr mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/lsr