Hi Huaimo,
> > I’m sorry that you don’t find it useful. Determining the split is trivial:
> > when you receive an IIH, it has a system ID of the another system in it.
> > If that other system is not currently part of the flooding topology, then
> > it is quite clear that it is disconnected from the flooding topology.
> > Repairing the split is done by enabling temporary flooding on the new link.
>
> For an adjacency between two nodes is up, the Hello packets exchanged between
> them will not change node/system IDs in them.
> How do you determine that other system is not currently part of the flooding
> topology?

The IIH includes the system ID. See ISO 10589 v2, section 9.7, field
“source Id”.

The local system will have a copy of the flooding topology and can easily see
if the neighbor was present as of the last FT computation. If not, then it
should be added (modulo rate limiting).

The local system can also examine its own LSDB. If there is no LSP for the
neighbor, then it would seem highly likely that there is a disconnect and the
neighbor should again be added (modulo rate limiting).

We are not requiring it, but a system could also do a more extensive
computation and compare the links between itself and the neighbor by tracing
the path in the FT and then confirming that each link is up in the LSDB.

> > There is an issue here that we have not yet resolved, which is the rate
> > that new links should be temporarily added to the flooding topology. Some
> > believe that adding any new link is the correct thing to do as it minimizes
> > the recovery time. Others feel that enabling too many links could cause a
> > flooding collapse, so link addition should be highly constrained. We are
> > still discussing this and invite the WG’s opinions.
>
> The issue is resolved by the solutions in draft-cc-lsr-flooding-reduction.
> One solution is below, where the given distance can be adjusted/configured.
> If we want every node to flood on all its links, we let the given
> distance to a big number. If we want the nodes within 2 hops to a failure
> to flood on all their links, we set the given distance to 2.
>
> “In one way, when two or more failures on the current flooding
> topology occur almost in the same time, each of the nodes within a
> given distance (such as 3 hops) to a failure point, floods the link
> state (LS) that it receives to all the links (except for the one from
> which the LS is received) until a new flooding topology is built.”

As we have discussed, this is not a solution. In fact, this is more dangerous
than anything else that has been proposed and seems highly likely to trigger a
cascade failure. You are enabling full flooding for many nodes. In dense
topologies, even a radius of 3 is very high. For example, in a LS topology, a
radius of 3 is sufficient to enable full flooding throughout the entire
topology. If that were stable, we would not need Dynamic Flooding at all.

> Another solution is just adding minimum links temporarily on the flooding
> topology to repair the split flooding topology until a new flooding topology
> is built.

Agreed. Which links constitute the minimum? In a general topology, with
arbitrary failures that are not distributed globally, how do we make a
distributed decision about which links to enable?

This is the problem that we are trying to solve. And we have no oracle to tell
us The Right Answer.

> The link can be enabled for “temporary flooding” by the node without using
> any TLV or Hello with the TLV.

There are cases where it is far easier for the neighbor to realize that it is
disconnected than for the local system to realize that the neighbor is
disconnected. Thus, it is easier to allow one system to request temporary
addition.

> The TLV in Hello packet just requests for adding “temporary flooding” on the
> link. The other information is accessed by the node locally.
> The TLV in Hello packet does not help for corner case. In the case where a
> node is rebooted, a new link attached to a new node may apply.

If the node that rebooted has 1000 interfaces, which interfaces should be
temporarily added? Adding all of them is likely to trigger a cascade failure.
The TLV allows us to signal which ones should be enabled.

> > All adjacencies are a single hop in both IS-IS and OSPF. Yes, Hello
> > packets may be lost. Fortunately, they are periodically transmitted, thus
> > the next transmission will also contain the TLV. If IIH’s are getting lost
> > at a significant rate, then the adjacency will not (and should not) come
> > up. Thus, the request for temporary flooding will propagate to the
> > neighbor in all cases that matter.
>
> It takes too long when Hello packet is lost. Repairing split flooding
> topology needs to be fast.

Fortunately, lost hello packets are a relatively rare occurrence. While
repairing the flooding topology needs to be done expediently, attempting to do
so and triggering a cascade failure of the network is counter-productive.
Given this alternative, a bit of extra delay when adding a new system to the
network, or trying to recover from multiple failures seems wise. Rushing and
making things worse does not. The first priority must remain network
stability.

> It does not mean that a user/operator configures/select an area leader. It
> means that a user/operator configures other things such as indicating an
> algorithm or selecting the centralized mode on the area leader.

In an implementation, centralized mode and algorithm selection can be the
defaults. In fact, in our implementation, the only required configuration is
to enable dynamic flooding. Everything else is automatic.

Regards,
Tony
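
For illustration only, the neighbor check described earlier in this message
(compare the source ID from the IIH against the last computed flooding
topology, then against the LSDB, and only then add the link, subject to rate
limiting) might be sketched as below. This is a minimal sketch under stated
assumptions, not code from any implementation or from the draft: the class
name, the field names, and the fixed-count rate limit are all hypothetical,
and temporary flooding is tracked per neighbor system ID rather than per link
to keep the example short.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LocalSystem:
    """Hypothetical view of the state a node consults when an IIH arrives."""

    # System IDs present as of the last flooding topology (FT) computation.
    flooding_topology: set[str]
    # System IDs for which the local LSDB currently holds an LSP.
    lsdb: set[str]
    # Assumed rate limit: cap on concurrently enabled temporary neighbors.
    max_temporary: int = 2
    # Neighbors on which temporary flooding is currently enabled.
    temporary: set[str] = field(default_factory=set)

    def looks_disconnected(self, source_id: str) -> bool:
        # Either test alone suggests the neighbor is cut off from the FT.
        return (source_id not in self.flooding_topology
                or source_id not in self.lsdb)

    def on_iih(self, source_id: str) -> bool:
        """Return True if temporary flooding is enabled toward source_id."""
        if source_id in self.temporary:
            return True  # already enabled by an earlier IIH
        if not self.looks_disconnected(source_id):
            return False  # neighbor is already reachable via the FT
        if len(self.temporary) >= self.max_temporary:
            return False  # rate limited; a later IIH may succeed
        self.temporary.add(source_id)
        return True
```

Because IIHs are sent periodically, a neighbor refused by the rate limit is
simply re-evaluated when its next IIH arrives.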
_______________________________________________
Lsr mailing list
https://www.ietf.org/mailman/listinfo/lsr
