From: Tony Przygienda <[email protected]>
Sent: Friday, July 18, 2025 1:25 PM
To: Les Ginsberg (ginsberg) <[email protected]>
Cc: lsr <[email protected]>
Subject: Re: [Lsr] Fwd: Re: Review comments for 
draft-prz-lsr-hierarchical-snps-00: High Level Concerns



On Fri, Jul 18, 2025 at 8:16 PM Les Ginsberg (ginsberg) 
<[email protected]<mailto:[email protected]>> wrote:

[LES:] Let’s use a very simple example.

A and B are neighbors
For LSPs originated by Node C here is the current state of the LSPDB:

A has (C.00-00 (Seq 10), C.00-01 (Seq 8), C.00-02 (Seq 7)) Merkle hash: 0xABCD
B has (C.00-00 (Seq 10), C.00-01 (Seq 9), C.00-02 (Seq 6)) Merkle hash: 0xABCD
(unlikely that the hashes match - but possible)

When A and B exchange hash TLVs they will think they have the same set of LSPs 
originated by C even though they don’t.
They would clear any SRM bits currently set to send updated LSPs received from 
C on the interface connecting A-B.
We have just broken the reliability of the update process.
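A minimal sketch of this failure mode, using a deliberately weak digest (a plain XOR of sequence numbers - not the draft's Merkle construction, purely an illustration): the two different views of node C's fragments from the example above happen to collide, so peers comparing only the digest would wrongly conclude they are synchronized.

```python
# Toy illustration (NOT the draft's Merkle hash): aggregate an LSP set
# with a plain XOR over sequence numbers and compare the two peers'
# digests for node C's fragments from the example above.

def weak_aggregate(lsp_set):
    """XOR all sequence numbers together - a deliberately weak digest."""
    digest = 0
    for _lsp_id, seq in lsp_set:
        digest ^= seq
    return digest

# A's and B's views of node C's fragments (from the example above).
a_view = [("C.00-00", 10), ("C.00-01", 8), ("C.00-02", 7)]
b_view = [("C.00-00", 10), ("C.00-01", 9), ("C.00-02", 6)]

assert a_view != b_view                                  # the LSP sets really differ
assert weak_aggregate(a_view) == weak_aggregate(b_view)  # ...yet the digests match

# If A and B compared only these digests, each would clear its SRM bits
# and never flood the newer fragments - the update process silently stalls.
```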

The analogy of the use of fletcher checksum on PDU contents is not a good one. 
The checksum allows a receiver to determine whether any bit errors occurred in 
the transmission. If a bit error occurs and is undetected by the checksum, that 
is bad – but it just means that a few bits in the data are wrong – not that we 
are missing the entire LSP.

I appreciate there is no magic here – but I think we can easily agree that 
improving scalability at the expense of reliability is not a tradeoff we can 
accept.

well, we already have this problem today as I described - the more data the 
hash/checksum covers, the more likely it becomes, of course, that hashes 
collide. the only way to do better here is to distribute bigger or more 
hashes/checksums. and shifted XORs are actually some of the best "entropy 
generators", based on work done on MAC hashes for SPT AFAIR
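A shift-and-XOR combiner of the sort alluded to here can be sketched as follows (a toy 32-bit rotate-then-XOR fold, illustrative only - not the actual construction from the MAC-hash work): unlike a plain XOR, it is position-sensitive, so the two sequence-number sets from the example above no longer collide.

```python
from functools import reduce

MASK32 = 0xFFFFFFFF

def rotl32(x, r):
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32

def shifted_xor_digest(values):
    """Fold values with rotate-then-XOR; position-sensitive, unlike plain XOR."""
    h = 0
    for v in values:
        h = rotl32(h, 5) ^ v
    return h

# the two sequence-number sets from the example collide under plain XOR...
a_seqs, b_seqs = [10, 8, 7], [10, 9, 6]
assert reduce(lambda x, y: x ^ y, a_seqs) == reduce(lambda x, y: x ^ y, b_seqs)

# ...but not under the position-sensitive shifted-XOR fold
assert shifted_xor_digest(a_seqs) != shifted_xor_digest(b_seqs)
```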
[LES2:] We don’t have the same problem today.
An SNP entry (as you documented) has: (LSPID.Fragment + Seq# + CSUM + Lifetime)
If I have: A.00-00 (Seq #10) Chksum: 0xABC
You have: A.00-00 (Seq #11) Chksum: 0xABC


a checksum is just a funky hash. if something reboots, generates the same 
fragment with the same seq#, and produces the same checksum for different 
content (which is possible), the problem is architecturally the same: flooding 
will not request it. yes, it affects a single LSP rather than a bunch as with 
HSNP, but then again it's 16 bits; the draft for HSNP went to 32 bits built 
from 64 bits, so _really_ wide, to push such likelihood far down.

collision likelihood can be calculated assuming some distributions etc., but 
IME it will be such a small number that it is meaningless, just like for the 
2-byte CSUM on a single LSP.
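The back-of-the-envelope numbers behind this argument can be written down directly (assuming, optimistically, a uniformly distributed digest - Fletcher checksums are not uniform, so the real 16-bit odds are worse; the point is only the relative scale): the chance that a different data set maps to the same b-bit digest is about 2^-b, so going from 16 to 32 bits buys roughly five orders of magnitude.

```python
# Rough collision odds for an idealized, uniformly distributed b-bit digest:
# the probability that a *different* data set maps to the same digest value.

def collision_prob(bits):
    return 1.0 / (1 << bits)

print(f"16-bit CSUM : {collision_prob(16):.2e}")   # roughly 1.5e-05
print(f"32-bit hash : {collision_prob(32):.2e}")   # roughly 2.3e-10
print(f"64-bit hash : {collision_prob(64):.2e}")   # roughly 5.4e-20
```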

point being, with a CSUM/hash representing a much larger amount of bytes you 
will never find a _perfect_ solution that is collision free. the rest is 
probabilities
[LES3:] Tony – it’s a matter of consequences.
Currently, a checksum that is valid/identical for different data sets impacts 
only a small number of LSPs – typically one – because we still have the 
complete description of each LSP in the CSNP.
With the hash TLV, a hash that is valid but represents a different set of 
LSPs potentially affects reliable flooding of every LSP in the range. And what 
you have proposed could support a single range for the entire LSPDB.

Agreed, no hash is perfect. But given the consequences it argues for:


  *   A robust hash
  *   Limitation on the hash degree

The latter could be done by limiting the hash range to a single node, rather 
than allowing a range covering many nodes.
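The blast-radius difference between the two granularities can be put in rough numbers (all figures here are hypothetical, chosen only to show the scale): one undetected collision stalls at most the LSPs covered by the colliding range.

```python
# Hypothetical LSPDB sizes, purely to illustrate the worst-case impact of a
# single undetected hash collision under the two range granularities.
nodes = 1000               # assumed number of originating nodes
frags_per_node = 3         # assumed fragments per node
total_lsps = nodes * frags_per_node

whole_db_range = total_lsps      # one hash range over the entire LSPDB
per_node_range = frags_per_node  # hash range limited to one node's fragments

print(f"worst case, single range : {whole_db_range} LSPs stalled")
print(f"worst case, per-node     : {per_node_range} LSPs stalled")
```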
I am motivated to be cautious here.

   Les

--- tony

_______________________________________________
Lsr mailing list -- [email protected]
To unsubscribe send an email to [email protected]
