[Lsr] Re: Review comments for draft-prz-lsr-hierarchical-snps-00: High Level Concerns

Les Ginsberg (ginsberg) Fri, 18 Jul 2025 10:14:23 -0700

Tony –

Thanks for the quick response.
Please see inline.

From: Tony Li <[email protected]>
Sent: Friday, July 18, 2025 12:45 AM
To: Les Ginsberg (ginsberg) <[email protected]>
Cc: [email protected]; lsr <[email protected]>
Subject: Re: [Lsr] Review comments for draft-prz-lsr-hierarchical-snps-00: High Level Concerns

Hi Les,

1) The uniqueness of the calculated hash is an essential component for this to
work. Given that you are using a simple XOR on a 64 bit number - and then 
"compressing" it to 32 bits for advertisement - uniqueness is NOT guaranteed. 
The danger of false positives (i.e., hashes that match when they should not) 
would compromise the solution. Can you provide more detail on the efficacy of 
the hash?

I’m sorry, you’re a bit confused here. We do NOT need uniqueness of the hash.  
In fact, one of the essential properties of all hashes is that they are not 
unique. Multiple inputs will always produce hash collisions.  This is 
necessarily true: the size of the input is larger than the size of the output. 
Information is necessarily lost.

This is already true for the Fletcher checksum that is used as part of CSNPs.

What we do want is to ensure that the hashing function is sensitive to the 
inputs. That is, for a small change in the input, there is a change in the hash 
value.

Since we are not doing security here, we do NOT care about the ability to 
compute a hash collision.

That said, I don’t think that we are particularly sensitive to the specific 
hashing function. My personal preference would be to continue to use the 
Fletcher checksum just because the code is already there in all 
implementations. One could also reasonably use CRC-16, CRC-32, etc.

[LES:] Let’s use a very simple example.

A and B are neighbors
For LSPs originated by Node C here is the current state of the LSPDB:

A has (C.00-00 (Seq 10), C.00-01 (Seq 8), C.00-02 (Seq 7)); Merkle hash: 0xABCD
B has (C.00-00 (Seq 10), C.00-01 (Seq 9), C.00-02 (Seq 6)); Merkle hash: 0xABCD
(unlikely that the hashes match, but possible)

When A and B exchange hash TLVs they will think they have the same set of LSPs 
originated by C even though they don’t.
They would clear any SRM bits currently set to send updated LSPs received from 
C on the interface connecting A-B.
We have just broken the reliability of the update process.
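The failure mode in this example can be sketched in Python. The sketch assumes a hypothetical 64-bit fingerprint per LSP, aggregation by XOR, and compression to 32 bits by XORing the upper and lower halves; the draft's actual compression step may differ, and all fingerprint values below are invented for illustration.

```python
MASK64 = (1 << 64) - 1


def xor_aggregate(fingerprints):
    """XOR all 64-bit fingerprints together (order does not matter)."""
    acc = 0
    for fp in fingerprints:
        acc ^= fp & MASK64
    return acc


def fold_to_32(value64):
    """Assumed compression: XOR the upper 32 bits into the lower 32 bits."""
    return ((value64 >> 32) ^ value64) & 0xFFFFFFFF


def digest(fingerprints):
    return fold_to_32(xor_aggregate(fingerprints))


# Hypothetical fingerprints for the fragments both neighbors agree on.
shared = [0x1111222233334444, 0x5555666677778888]

# Hypothetical fingerprints for the one fragment where A and B differ.
a_view = shared + [0x0000000100000000]
b_view = shared + [0x0000000000000001]

print(a_view != b_view)                  # the databases differ
print(digest(a_view) == digest(b_view))  # yet the 32-bit digests match
```

Because XOR and the fold are both linear, any pair of differing fingerprints whose folded values are equal produces the same digest, which is the false positive described above.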

The analogy of the use of Fletcher checksum on PDU contents is not a good one. 
The checksum allows a receiver to determine whether any bit errors occurred in 
the transmission. If a bit error occurs and is undetected by the checksum, that 
is bad – but it just means that a few bits in the data are wrong – not that we 
are missing the entire LSP.

I appreciate there is no magic here – but I think we can easily agree that 
improving scalability at the expense of reliability is not a tradeoff we can 
accept.

2) Do we need a more sophisticated hash calculation in order to guarantee
uniqueness? If the argument is the update process is already reliable even 
without CSNPs/HSNPs - that HSNPs are simply an optimization and don't have to 
be 100% reliable, then I think this implies that periodic CSNPs are not needed 
at all. And if the hash has a significant possibility of being non-unique, 
relying on HSNPs during adjacency bringup might actually be a hindrance, not a 
help.

Periodic CSNPs are not needed.  A periodic HSNP is sufficient, and if there are 
inconsistencies, then they will devolve into CSNPs to isolate the exact portion 
of the database that is inconsistent.  We intentionally re-use the CSNP and 
PSNP mechanisms as we saw no point in re-inventing them.
[LES:] My argument is that periodic xSNPs (be that CSNPs or HSNPs) may not be 
needed at all.

3) I would like to raise the question as to whether we should prioritize a
solution that aids initial LSPDB sync on adjacency bringup over a solution 
which works well after LSPDB synchronization (periodic CSNPs).

Our solution works well in both cases.  In the case of initial bringup, our 
mechanism exchanges a logarithmic number of packets to isolate the exact LSPs 
that are inconsistent.  In the case where databases are already synchronized, 
this means that only a single top-level HSNP is required.

This is also true in the case of continuing verification of synchronized 
databases.
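The "logarithmic number of packets" claim can be sketched with a simple top-down comparison. This is only an illustration under strong assumptions: both neighbors hold the same LSP IDs in the same sorted order, the hierarchy is a binary split (the draft's actual ranges and fan-out are not described in this thread), and `zlib.crc32` stands in for whatever hash is used. The names `range_hash` and `find_mismatches` are hypothetical.

```python
import zlib


def range_hash(entries):
    """Hash a contiguous range of (lsp_id, sequence_number) entries."""
    data = "|".join(f"{lsp_id}:{seq}" for lsp_id, seq in entries).encode()
    return zlib.crc32(data)


def find_mismatches(mine, theirs):
    """Compare range hashes top-down; descend only into ranges that differ.

    Returns the mismatched LSP IDs and the number of hash comparisons made.
    """
    mismatched = []
    comparisons = 0
    stack = [(0, len(mine))]
    while stack:
        lo, hi = stack.pop()
        comparisons += 1
        if range_hash(mine[lo:hi]) == range_hash(theirs[lo:hi]):
            continue                      # whole range confirmed in sync
        if hi - lo == 1:
            mismatched.append(mine[lo][0])  # isolated a single LSP
            continue
        mid = (lo + hi) // 2
        stack.append((mid, hi))
        stack.append((lo, mid))
    return mismatched, comparisons


# Illustrative database: 1024 LSPs, identical except for one sequence number.
mine = [(f"node{i:04d}.00-00", 10) for i in range(1024)]
theirs = list(mine)
theirs[700] = (theirs[700][0], 11)

print(find_mismatches(mine, mine))    # synchronized: one comparison suffices
print(find_mismatches(mine, theirs))  # one difference: about 2*log2(N) comparisons
```

When the two sides do not hold the same set of LSP IDs, the ranges no longer line up, which is the adjacency-bringup concern raised in the reply below.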

[LES:] The solution you have proposed works much better when the LSPDBs on the 
neighbors are “almost the same” because the ranges of LSPs covered in each hash 
are more likely to be the same.
At adjacency bringup this is less likely to be the case – meaning that every 
time I receive an HSNP from you I am more likely to need to calculate the hash 
the way you did rather than simply check a cached hash value.
(BTW – the use of cached hash values is mentioned in the draft as desirable – I 
did not invent this goal. 😊)
One way of improving this is to limit the hash TLV to LSPs from a single node 
(no range required).
This improves xSNP scalability from per LSP to per node.

The need for periodic CSNPs arose from early attempts at flooding optimizations 
(mesh groups) where an error in the manual configuration could jeopardize the 
reliability of the Update Process. In deployments where standards-based 
flooding optimizations are used, the need for periodic CSNPs is lessened as the 
standards-based solution should be well tested. Periodic CSNPs become the 
"suspenders" in a "belt"-based deployment (or, if you prefer, the "belt" in a 
"suspenders"-based deployment). I am wondering if we should deemphasize the use 
of periodic CSNPs?  In any case, the size of a full CSNP set is a practical 
issue in scale deployments - especially where a node has a large number of 
neighbors. Sending the full CSNP set on adjacency UP is a necessary step and 
therefore I would like to see this use case get greater attention over the 
optional periodic CSNP case.

Since this now reduces to sending a single top-level HSNP, and I like having a 
belt and suspenders (figuratively), things are already much cheaper and I would 
favor retaining that.

4) You chose to define new PDUs - which is certainly a viable option. But I am
wondering if you considered simply defining a new TLV to be included in 
existing xSNPs. I can imagine cases - especially in PSNP usage - where a 
mixture of existing LSP entries and new Merkle Hash entries could usefully be 
sent in a PSNP to request/ack LSPs as we do today. The use of the hash TLV in 
PSNPs could add some efficiency to LSP acknowledgments.

We chose to go to new PDUs to not risk interoperability problems. We could 
easily see ourselves wanting to generate packets that only include HSNP 
information and no legacy CSNP/PSNP information.
[LES:] I am cautious about new PDUs because it translates into new PDUs/level 
and – somewhere down the road – new PDUs to support new scopes (RFC 7356). (The 
256 LSP limit per node is another limitation that we may yet have to deal with.)
Given we are already negotiating the use of the new TLV/neighbor – and that in 
IS-IS unsupported TLVs are always ignored – I don’t see that the new TLV 
approach is more risky.

5) The choice of ranges for the new TLVs depends upon the current state of the
LSPDB on the sending node. The definitions you have seem targeted at "periodic 
CSNPs" where it is reasonable to expect that both neighbors have (nearly) the 
same LSPDB contents. However, in the case of adjacency bringup, it is likely 
that there are significant differences in the current content of the LSPDBs on 
the neighbor - which will make it far more likely that the ranges of nodes 
chosen in each hash entry will differ between the neighbors - making the 
strategy less useful for this case.

I don’t see anything ‘less useful’ about this case. If there are discrepancies, 
then they are resolved in an efficient manner. Any subsets of the database that 
are in sync are very efficiently confirmed by higher layers.

6) You do not discuss the use of HSNPs on LANs. It would seem intuitive that
HSNPs could only be used when all neighbors on the LAN support it. But some 
discussion of LANs would be desirable.

Agreed.  Given the decreasing usage of actual LAN situations, I think that this 
is not a significant concern.
[LES:] Agreed – but for completeness it should be discussed.

   Les

T
