Hi Everyone,
Our answers/explanations to Tony's comments/questions are inline below with
the prefix [HC] and in red.
Best Regards,
Huaimo
Hi all,
I wanted to address some of the points that Huaimo has raised. I’ve taken the
editorial liberty of reformatting some of Huaimo’s comments.
> 1. Flooding topology encoding
>
> The encoding in "draft-li-lsr-dynamic-flooding" comprises two parts. One part
> is the encoding of the flooding topology using the TLVs of the paths
> constituting the flooding topology; the second part is the encoding of the
> mapping from the nodes in the area to their indexes. The TLVs use the indexes
> of the nodes in the paths.
> There are two encodings in "draft-cc-lsr-flooding-reduction", each of
> which can be used to encode the flooding topology. One encoding is
> called "Links Encoding". The other is called "Block Encoding”.
Not distributing the encoding of the indices is an issue that needs to be
addressed. Without an explicit encoding, all systems are forced to rely on the
complete and area-wide synchronization of the entire LSDB. This seems like a
significant risk. What happens if there is an expired LSP and one system
has expunged it from the database but another has not? This would perturb the
indices. If a new node has been added to the topology and its presence has not
completed flooding, then the indices would also be perturbed.
[HC] Area-wide synchronization is a prerequisite for an IGP to work correctly
in the first place. When some nodes in the network have their LSDBs out of
synchronization, the problem with the indexes is in general even bigger if the
mapping from the nodes to their indexes is generated by the leader of the area
from its LSDB and flooded to all the other nodes.
> The former encodes the links between a local node and its adjacent
> nodes using the compact indexes of these nodes. The whole flooding
> topology is represented by the links and nodes on it. This is like the normal
> topology representation in an IGP, in which the topology is represented by
> the links and nodes in an IGP area. No mapping encoding and flooding is
> needed. Every node creates and maintains the mapping from the nodes to their
> indexes. It is simpler and faster for the node to have the mapping in this
> way.
The use of two different encodings for the topology is highly unusual.
It is not at all clear how this should be done. Ambiguity in a
specification results in differing interpretations, which leads to complexity
and interoperability errors. In this specific case, it would also seem to
result in redundancy. It would be better to have a single encoding.
[HC] At this stage, we propose two options in our draft. Ultimately, a single
encoding will be used.
The use of the “compact index” is not clear. At 14 bits, it is clearly not
large enough to support large areas. While we are generally not exceeding 16k
systems today, that may grow. It wasn’t long ago that areas were practically
confined to 1k nodes. That’s exceeded now.
[HC] How did you get 14 bits?
The idea of compact node indexes, or say a variable size (or length) of node
indexes, is used in both our links encoding and our block encoding.
For the links from a local node to its adjacent nodes, the size (or length) of
the indexes of these nodes (i.e., the local node and the adjacent nodes) is set
to the maximum among the sizes of these node indexes. For example, suppose that
these nodes have indexes 1, 126, 245, and 511. The sizes of their indexes are
1, 7, 8, and 9 bits respectively. The maximum among 1, 7, 8, and 9 is 9 (bits).
This maximum size (i.e., 9) is represented by an element, which is a 4-bit
field. This element could instead be a byte, a word, or even a TLV if you want.
After this element, the four node indexes (i.e., 1, 126, 245, and 511) are
represented by four 9-bit fields, which contain 1, 126, 245, and 511
respectively.
When a block of the flooding topology is encoded in our block encoding, the
size (or length) of the indexes of the nodes in this block is set to the
maximum among the sizes of these node indexes.
When the value of the 4 bits (from 0 to 15) is used directly to indicate the
size of the node indexes, it is big enough to support 32k (i.e., 2^15) nodes.
When the links from a local node to its adjacent nodes are encoded, the size of
these node indexes is indicated by the value of these 4 bits, which is the
maximum among the sizes of these node indexes.
From section 6.2.1.1 in our draft, the size of the node indexes is the value of
a 4-bit field (called ENSI) plus 8. The range of a 4-bit value is from 0 to 15.
Thus the range of the size of the node indexes is from 8 bits to 23 bits, and
any node index from 0 to 2^23 - 1 can be represented. When (the value of the
4-bit field plus 8) is used to indicate the size of these node indexes, it is
the maximum among the sizes of these node indexes, or 8 in the case where the
maximum is less than 8.
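To make the width calculation concrete, here is a small Python sketch of the
scheme described above (illustrative only; the function names are ours, and
the bit string merely stands in for the on-wire fields):

```python
def index_width(indexes):
    """Field width = max bit-length among the indexes, but at least 8,
    since the 4-bit ENSI field encodes (width - 8) per section 6.2.1.1."""
    max_bits = max(i.bit_length() for i in indexes)
    return max(max_bits, 8)

def encode_indexes(indexes):
    """Pack node indexes into fixed-width fields preceded by a 4-bit ENSI.
    Returns the ENSI value and the resulting bit string."""
    width = index_width(indexes)
    ensi = width - 8            # 4-bit value, 0..15 -> widths 8..23 bits
    assert 0 <= ensi <= 15
    bits = format(ensi, '04b')
    for i in indexes:
        bits += format(i, '0{}b'.format(width))
    return ensi, bits

# Example from the text: indexes 1, 126, 245, 511 -> max bit-length is 9,
# so ENSI = 9 - 8 = 1 and each index occupies a 9-bit field.
ensi, bits = encode_indexes([1, 126, 245, 511])
```

With four indexes at 9 bits each plus the 4-bit ENSI, the encoded block is
40 bits, versus 4 x 23 = 92 bits if the maximum width were always used.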
A better way of encoding the indices is to make them variable length, depending
on the number of nodes in the area (credit: Tony P.). In our draft, the
explicit encoding of the mapping from system to index makes it very clear how
many systems there are, and thus how many bits should be used in this encoding.
Please note that this optimization is not yet reflected in our draft.
[HC] We see that your better way is an application of our idea of a "variable
size (or length) of node indexes".
Our idea can be used with the encoding in your draft in a few different ways.
In one way, a variable size (or length) of node indexes is used within a Path
TLV. For all the nodes in this Path TLV, the size (or length) of the indexes of
these nodes is set to the maximum among the sizes of these node indexes. The
size (or length) of the indexes is represented by an element, which can be a
4-bit field, a byte, a word, or even a TLV.
In another way, a variable size (or length) of node indexes is used in an
LSP/LSA for all the nodes contained in the Path TLVs in that LSP/LSA.
In a third way, a variable size (or length) of node indexes is used in the
mapping for all the nodes in an area. The size (or length) of the indexes is
represented by an element. After this element, all the node indexes, including
those in Path TLVs, are represented by fields of the size indicated by the
element.
> For example, one issue is that different types of messages (i.e., the
> messages for the mapping, and the messages for the TLVs of the paths in the
> flooding topology, depending on the mapping) for the flooding topology may be
> out of order on a receiving router. This may cause incorrect/corrupted
> flooding topology.
As I’ve explained, the ordering is irrelevant.
[HC] The ordering is critical. The TLVs for paths contain the indexes of the
nodes in the paths. The indexes of these nodes are generated by the leader of
the area, and the mapping from the nodes to the indexes is created by the
leader. We can see that the TLVs (i.e., the indexes of the nodes in the TLVs
for paths) depend on the mapping.
The ordering of the TLVs for the mapping from node to index is self-describing.
Each TLV carries the starting index for its contents.
[HC] The ordering that you describe is not the ordering we discussed. The
former is the ordering of the TLVs within the mapping, i.e., the ordering among
TLVs of the same type. The latter is the ordering between different types of
TLVs. One type is the TLVs for the paths, which contain the indexes of the
nodes in the paths. The second type is the TLVs for the mapping from nodes to
indexes.
The ordering of arrival of various LSP segments is also irrelevant as the
computation need only be deferred until all segments have arrived. This is
already commonly done as part of SPF scheduling. This is also a necessity for
all proposals, as any computation with inconsistent segments is going to yield
dubious results.
> 2. Fault tolerance to multiple failures
>
> An efficient method for fault tolerance to multiple failures on the
> flooding topology is very important to centralized flooding reduction.
> Without it, the network convergence will slow down significantly. In general,
> the convergence time will be more than doubled comparing to normal
> convergence when the flooding topology is split by multiple failures.
Multiple simultaneous failures are an extremely unlikely event. Routers are not
designed to survive multiple concurrent hardware failures. Most topologies run
at commercially viable loading levels and cannot survive multiple critical link
or node failures without manual intervention. What is critical is to survive a
single failure efficiently. Our proposal is to compute a biconnected flooding
topology. This protects against single failures, retaining a functioning
flooding topology in the event of a failure. At that point, the system computes
a new biconnected flooding topology, protecting against further failures.
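For illustration, whether a connected flooding topology is biconnected can be
checked by looking for articulation (cut) vertices; this sketch is our own, not
taken from either draft:

```python
def articulation_points(adj):
    """Return the set of articulation (cut) vertices of an undirected graph
    given as {node: set(neighbors)}. A connected flooding topology with no
    articulation points remains connected after any single node failure."""
    disc, low, aps = {}, {}, set()
    timer = [0]

    def dfs(u, parent):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        children = 0
        for v in adj[u]:
            if v == parent:
                continue
            if v in disc:                       # back edge
                low[u] = min(low[u], disc[v])
            else:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # Non-root u is a cut vertex if no back edge climbs past it.
                if parent is not None and low[v] >= disc[u]:
                    aps.add(u)
        if parent is None and children > 1:     # root with >1 DFS children
            aps.add(u)

    for s in adj:
        if s not in disc:
            dfs(s, None)
    return aps

# A ring is biconnected; a chain has a cut vertex in the middle.
ring = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
chain = {0: {1}, 1: {0, 2}, 2: {1}}
```

A leader could run such a check on a candidate flooding topology before
distributing it; the recursive DFS is for clarity only and would be made
iterative for large areas.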
[HC] A network without flooding reduction deployed may survive multiple
failures thanks to "reliable flooding", as we know that flooding occurs on
every link. After flooding reduction is deployed, if failures occur on more
than one link and/or node on the flooding topology, the network cannot survive
at all.
Carrying arbitrary alternate information to protect against multiple failures
negates your point about encoding efficiency. Backup links are not a tractable
solution.
[HC] No arbitrary alternate information to protect against multiple failures
is encoded or flooded in our proposal.
> The more time the network convergence takes, the more customers'
> traffic in the network gets lost in general. Thus after the centralized
> flooding reduction without fault tolerance to multiple failures is deployed
> in a network, some customers' traffic lose will be more than doubled when
> multiple failures splitting the flooding topology occur comparing to the same
> network without the flooding reduction.
If there are multiple failures, in our proposal, this can easily be detected.
The flooding topology would clearly be partitioned. Systems on the edge of the
partition can explicitly request temporary changes to the flooding topology,
immediately restoring the connectivity of the flooding topology if that is
possible. Thus, the convergence time is going to be on the order of two LSP
propagation times: one to detect the failure and another to repair it and
converge.
[HC] From your description above, we cannot see any details on how to restore
the connectivity of the flooding topology.
The convergence time is the total of the times to finish the following:
1) the failures are flooded to the leader of the area through updated
LSAs/LSPs,
2) the leader computes a new flooding topology after receiving the
LSAs/LSPs,
3) the leader floods the new flooding topology to every other node in the
area,
4) every other node receives, decodes, and installs the new flooding
topology, and
5) some nodes resynchronize their LSDBs with their neighbors.
The time for the leader to compute a new flooding topology depends on the
complexity of the algorithm. An algorithm that computes, from any network
topology, a flooding topology that survives any single failure is in general
very complex in terms of time.
Tree topologies are particularly vulnerable to failures since each and every
link failure creates a partition of the flooding topology.
Repairing this in a distributed fashion is also not perceptibly better. After
a partition, it will require at least one LSP propagation time to detect the
failure. Then execution of the algorithm, followed by another LSP propagation
time before a complete SPF can be run.
[HC] The tree topologies we mentioned are just examples.
> "draft-cc-lsr-flooding-reduction" has a light-weight method for providing
> fault tolerance to multiple failures on the flooding topology. This method
> makes sure that the network still converges fast when the multiple failures
> happen. Thus the failures have almost minimal impact on the network
> convergence in the network with the flooding reduction deployment. The
> customers' traffic lose is significantly reduced (or almost minimized) when
> the failures occur in the network with the flooding reduction.
Your proposal is to revert to almost full flooding. You bound the flooding to
three hops (maybe; this is not defined), but in a link-state topology, that is
the full topology. This is not a solution. If we could support full flooding,
we would not have the problem in the first place.
[HC] This is one optional solution. When multiple failures on the flooding
topology happen, the nodes around the failure points revert to normal (or full)
flooding temporarily until a new flooding topology is built. This normal (or
full) flooding on some nodes is only a temporary measure for the emergency in
which multiple failures occur. After the new flooding topology is built on
these nodes, they revert to flooding on the links of the flooding topology.
There is another solution in our draft, which is a light-weight method for
providing fault tolerance to multiple failures on the flooding topology.
> "draft-li-lsr-dynamic-flooding" just mentions that the edges of split
> parts will do something after the split of the flooding topology happens.
> However, it needs time to determine if a split of the flooding topology
> occurs and to find backup paths to connect split parts. There is no
> description on how to determine a split or find backup paths. This is not
> useful for implementors or users.
It is described. Please see section 6.7.11. After a topology change, all nodes
can examine the existing flooding topology and the faults that have occurred.
If it is clear that the flooding topology has partitioned, then nodes that have
adjacencies that are not reachable on the flooding topology make a temporary
request to their adjacencies to add to the flooding topology. This creates a
temporary repair until the next flooding topology is distributed.
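As a rough illustration of that text (this is our reading of section 6.7.11,
not normative, and the data structures are assumptions), detecting the
partition and picking candidate repair links amounts to a reachability check
over the post-failure flooding topology:

```python
from collections import deque

def reachable_on_ft(ft_adj, start):
    """Nodes reachable from `start` over the flooding topology, given as
    {node: set(neighbors)} with failed links/nodes already removed."""
    seen = {start}
    q = deque([start])
    while q:
        u = q.popleft()
        for v in ft_adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                q.append(v)
    return seen

def links_to_enable(node, phys_neighbors, ft_adj):
    """Adjacencies of `node` whose far end is unreachable on the flooding
    topology: candidates for a temporary-flooding request to the neighbor."""
    reach = reachable_on_ft(ft_adj, node)
    return {n for n in phys_neighbors if n not in reach}
```

For example, if the flooding topology partitions into {A, B} and {C, D} while
B retains a physical adjacency to C, B would request temporary flooding toward
C.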
[HC] From both your draft and your description above, we cannot see any
details on how to determine that a node has adjacencies that are not reachable
on the flooding topology, or on how the node makes a temporary request to
those adjacencies to be added to the flooding topology.
If the TLV with the R bit in an IS-IS Hello (or the FR-bit in an OSPF Hello)
is used to request adding a link to the flooding topology temporarily (or, say,
enabling temporary flooding over the link), then there is a big issue. For
example, if the first Hello packet containing the TLV with the R bit in IS-IS
(or the FR-bit in OSPF) gets lost, then adding/enabling "temporary flooding" on
the link will take more than a Hello interval (typically 10 seconds in OSPF).
This is too long to accept!
> In addition, the algorithm for computing a flooding topology that can survive
> any one link failure on the flooding topology (i.e., any one link failure on
> the flooding topology does not split the flooding topology) will be very
> complex, at least in terms of time. With a good method for fault tolerance,
> the algorithm is not required to compute a flooding topology that can survive
> any one link failure on the flooding topology.
Collapsing the network is not a 'good method'.
[HC] The above paragraph was just proposed as an optional solution, which works
and reduces the complexity of the algorithm for computing a flooding topology,
but is not mandatory for implementation.
> 3. A New Node and Link Addition
>
> Two different procedures for handling a new node and link addition to the
> topology are described in the two drafts ("draft-li-lsr-dynamic-flooding" and
> "draft-cc-lsr-flooding-reduction").
>
> The procedure in "draft-cc-lsr-flooding-reduction": After the new node (say
> node A) and an existing node (say node B) establishes the adjacency over the
> link, A and B add the link to the flooding topology temporarily until a new
> flooding topology is built.
If A and B are already on the flooding topology, then this is redundant and
harmful.
[HC] It is impossible for a new node A and a new link to be on the flooding
topology already.
> The procedure in "draft-li-lsr-dynamic-flooding": A new TLV with R bit in IIH
> (IS-IS Hello), and FR-bit in OSPF Hello LLS Extended Options are defined. The
> new node (say node A) and an existing node (say node B), send each other the
> TLV with R bit in IS-IS Hello or FR-bit in OSPF Hello to request for adding
> the link to flooding topology temporarily (or say enabling temporary flooding
> over the link).
This only happens if the other node does not appear on the flooding topology.
[HC] The case we discussed is "A New Node and Link Addition", per the title
above. The new node is not on the flooding topology, so this case applies to
your procedure.
> The procedure in "draft-cc-lsr-flooding-reduction" is simpler and more
> efficient. We can see that the two procedures try to achieve the same result
> ("temporary flooding" on the link) in different approaches. When a new
> node and link is added to the topology/network, it is deterministic
> that the link needs to be added to the flooding topology temporarily
> by the two end nodes of the link. There is no need for the end nodes
> to add/enable "temporary flooding" on the link through (new TLV in IIH
> or FR-Bit in OSPF Hello LLS) signaling over the link.
Automatically adding to the flooding topology risks collapsing the network,
again contradicting the entire goal of reducing the flooding topology.
[HC] The results produced by the two procedures in the two drafts are the same.
The difference between the two procedures is that one is simpler and more
efficient than the other.
There is no risk of collapsing the network, and it does not contradict the
entire goal of reducing the flooding topology. The new link is added to the
flooding topology "temporarily" by the two end nodes of the link, locally.
This link, while "temporarily" on the flooding topology, is not flooded to the
other nodes. It exists only so that link states can be flooded "temporarily"
over this link before a new flooding topology is built.
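The local-only procedure described above might be sketched as follows (purely
illustrative; the class and method names are ours, not from the draft):

```python
class FloodingState:
    """Minimal sketch of per-node local state: links on the leader-computed
    flooding topology, plus links temporarily enabled by this node alone."""

    def __init__(self, ft_links):
        self.ft_links = set(ft_links)   # from the distributed flooding topology
        self.temporary = set()          # local-only, never flooded to others

    def on_new_adjacency(self, link):
        # New node/link came up: enable temporary flooding locally.
        # Nothing is signaled to the peer and nothing is flooded area-wide.
        self.temporary.add(link)

    def on_new_flooding_topology(self, ft_links):
        # A new flooding topology covering the new node replaces the old one
        # and cancels any temporary state.
        self.ft_links = set(ft_links)
        self.temporary.clear()

    def floods_on(self, link):
        """A link carries flooding if it is on the flooding topology or is
        temporarily enabled."""
        return link in self.ft_links or link in self.temporary
```

The design point being argued is that both ends can deduce the same decision
deterministically, so no Hello-based signaling is needed to agree on it.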
> During internal review of "draft-li-lsr-dynamic-flooding" we observed
> significant procedure issues, which are big issues for implementations
> and users. For example in one issue, if the first Hello packet
> containing the TLV with R bit in IS-IS (or FR-bit in OSPF) gets lost,
> then to add/enable "temporary flooding" on the link will take more
> than a Hello interval (typically
> 10 seconds in OSPF). This is too long to accept!
Loss of a hello also implies that the adjacency will be delayed in coming up,
so this is the same in both situations.
[HC] The case mentioned above applies only to your draft. It appears under the
title "2. Fault tolerance to multiple failures".
> 4. An Existing Link Addition
>
> The approach in "draft-cc-lsr-flooding-reduction": The end node sends
> its adjacent node over the adjacency the link states with changes it received
> or originated during the period of time in which the flooding topology may
> split.
This requires the implementation to retain state about every LSP on every link,
even when it is down.
[HC] There is no such requirement in our approach.
Our approach defines a variable, TimeOf1stLinkDown, which stores the time at
which the first link on the flooding topology goes down after a new flooding
topology is built.
It uses the time information of an LSA/LSP in the LSDB: the time at which a
changed LSA/LSP was received or originated. If a current OSPF/IS-IS
implementation does not have this time information, it can easily be added.
When an existing link/adjacency not on the flooding topology between two end
nodes is added to the flooding topology after another new flooding topology is
built, each end node of the link sends its adjacent node, over the adjacency,
the LSAs/LSPs changed since TimeOf1stLinkDown, if a link went down (i.e.,
TimeOf1stLinkDown != 0).
As we explained, this approach is more efficient.
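A minimal sketch of this, assuming an LSDB keyed by LSA/LSP id with a
receive/origination timestamp (the names and shapes are ours, for
illustration only):

```python
def lsas_to_send(lsdb, time_of_first_link_down):
    """Changed LSAs/LSPs to send over a link newly added to the flooding
    topology. `lsdb` maps id -> (lsa, time_received_or_originated). If no
    flooding-topology link went down (time == 0), nothing needs to be sent,
    and in no case is a full LSDB resynchronization performed."""
    if time_of_first_link_down == 0:
        return []
    return [lsa for lsa, t in lsdb.values() if t >= time_of_first_link_down]
```

Only the LSAs/LSPs changed during the window in which the flooding topology
may have been split are exchanged, rather than the whole database.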
> The approach in "draft-li-lsr-dynamic-flooding": A full LSDB
> resynchronization is requested over the existing adjacency/link between two
> end nodes for possible flooding topology split which may cause LSDB out of
> synchronization.
Resynchronization is a normal part of the protocol process at link up anyway.
[HC] This case is not about link up. It is about adding an existing
link/adjacency not on the flooding topology between two end nodes to the
flooding topology. The following is the original text for your reference:
“Two different approaches for adding an existing link/adjacency not on the
flooding topology between two end nodes to the flooding topology are described
in the two drafts (“draft-li-lsr-dynamic-flooding” and
“draft-cc-lsr-flooding-reduction”).”
Even though the adjacency between the two end nodes is already there and their
LSDBs are the same in most situations, your approach still requests a full
LSDB resynchronization, which consumes (or wastes) processing power for IGP
messages and link bandwidth for transferring IGP messages. This contradicts
the entire goal of flooding reduction.
Regards,
Tony
_______________________________________________
Lsr mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/lsr