From: Tony Przygienda <[email protected]>
Sent: Friday, July 18, 2025 1:25 PM
To: Les Ginsberg (ginsberg) <[email protected]>
Cc: lsr <[email protected]>
Subject: Re: [Lsr] Fwd: Re: Review comments for draft-prz-lsr-hierarchical-snps-00: High Level Concerns
On Fri, Jul 18, 2025 at 8:16 PM Les Ginsberg (ginsberg) <[email protected]> wrote:

[LES:] Let's use a very simple example. A and B are neighbors. For LSPs originated by node C, here is the current state of the LSPDB:

A has (C.00-00 (Seq 10), C.00-01 (Seq 8), C.00-02 (Seq 7)), Merkle hash: 0xABCD
B has (C.00-00 (Seq 10), C.00-01 (Seq 9), C.00-02 (Seq 6)), Merkle hash: 0xABCD
(unlikely that the hashes match, but possible)

When A and B exchange hash TLVs, they will think they have the same set of LSPs originated by C even though they don't. They would clear any SRM bits currently set to send updated LSPs received from C on the interface connecting A and B. We have just broken the reliability of the update process.

The analogy to the use of the Fletcher checksum on PDU contents is not a good one. The checksum allows a receiver to determine whether any bit errors occurred in transmission. If a bit error occurs and is undetected by the checksum, that is bad, but it just means that a few bits in the data are wrong, not that we are missing the entire LSP.

I appreciate there is no magic here, but I think we can easily agree that improving scalability at the expense of reliability is not a tradeoff we can accept.

Well, we already have this problem today, as I described. The more data the hash/checksum covers, the more likely it becomes, of course, that hashes collide. The only way to do better here is to distribute bigger or more hashes/checksums. And shifted XORs are actually some of the best "entropy generators", based on work done on MAC hashes for SPT, AFAIR.

[LES2:] We don't have the same problem today. An SNP entry (as you documented) has: (LSP ID + Fragment + Seq# + Checksum + Lifetime).

If I have: A.00-00 (Seq #10), Chksum: 0xABC
You have: A.00-00 (Seq #11), Chksum: 0xABC

A checksum is just a funky hash.
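[Editor's note: Les's A/B example can be sketched in a few lines of Python. This is purely illustrative and is NOT the draft's actual Merkle/hash construction; the digest function, width, and field encoding here are assumptions. It shows an order-independent set hash built by XOR-folding per-entry digests, where a single stale sequence number changes the whole digest with overwhelming (but not certain) probability.]

```python
import hashlib

def entry_hash(lsp_id: str, seq: int, checksum: int) -> int:
    """64-bit digest of one LSP header summary (illustrative encoding)."""
    data = f"{lsp_id}|{seq}|{checksum:04x}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def set_hash(entries) -> int:
    """Order-independent set hash: XOR-fold the per-entry digests.

    XOR makes the result independent of the order entries appear in,
    but two different sets can still collide -- just very rarely for
    a wide digest.
    """
    h = 0
    for e in entries:
        h ^= entry_hash(*e)
    return h

# Les's example: A and B disagree on two of node C's fragments
# (checksum values here are made up for illustration).
a_view = [("C.00-00", 10, 0x1A2B), ("C.00-01", 8, 0x3C4D), ("C.00-02", 7, 0x5E6F)]
b_view = [("C.00-00", 10, 0x1A2B), ("C.00-01", 9, 0x3C4D), ("C.00-02", 6, 0x5E6F)]

print(hex(set_hash(a_view)))
print(hex(set_hash(b_view)))
```

With a 64-bit digest the two views almost certainly differ; Les's scenario is the residual case where they do not, and the dispute is over how much damage that residual case can do.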
If something reboots and generates the same fragment with the same seq# and the same checksum for different content (which is possible), the problem is architecturally the same: flooding will not request it. Yes, it affects a single LSP rather than a bunch as with HSNP, but then again, it's 16 bits; the HSNP draft went to 32 bits built from 64 bits, so _really_ wide, precisely to push such likelihood far down. Collision likelihood can be calculated assuming some distributions etc., but it will be such a small number that IME it is meaningless, just like for the 2-byte checksum on a single LSP. Point being, with a checksum/hash representing a much larger number of bytes, you will never find a _perfect_, collision-free solution. The rest is probabilities.

[LES3:] Tony, it's a matter of consequences. Currently, a checksum that is valid/identical for different data sets impacts only a small number of LSPs (typically one) because we still have the complete description of each LSP in the CSNP. With the hash TLV, a checksum that is valid but represents a different set of LSPs potentially affects reliable flooding of every LSP in the range. And what you have proposed could support a single range for the entire LSPDB.

Agreed, no hash is perfect. But given the consequences, it argues for:

* A robust hash
* A limitation on the hash degree

The latter could be done by limiting the hash range to per node, rather than allowing a range covering many nodes.

I am motivated to be cautious here.

Les

--- tony
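[Editor's note: Tony's claim that the collision likelihood "will be such a small number" can be made concrete with standard approximations, assuming the digest behaves like a uniform random function (an idealization; the draft's actual construction may behave differently). Two probabilities matter: the chance that two specific, differing LSP sets share one b-bit digest (2^-b, the case that silently breaks flooding), and the birthday bound on any collision occurring across many independently hashed ranges.]

```python
import math

def p_pair_collision(bits: int) -> float:
    """Chance that two specific, differing inputs share one b-bit digest,
    assuming the digest acts as a uniform random function."""
    return 2.0 ** -bits

def p_any_collision(bits: int, n: int) -> float:
    """Birthday bound: chance that any two of n digests collide,
    approximately 1 - exp(-n(n-1) / 2^(b+1))."""
    return 1.0 - math.exp(-n * (n - 1) / 2.0 ** (bits + 1))

for bits in (16, 32, 64):
    print(f"{bits:2d}-bit digest: per-pair {p_pair_collision(bits):.2e}, "
          f"any-pair over 1000 digests {p_any_collision(bits, 1000):.2e}")
```

The numbers illustrate both positions: a 16-bit checksum pair-collides with probability ~1.5e-5 per differing pair, which is rare per LSP but, as Les notes, tolerable mainly because the blast radius is one LSP; widening the digest to 32 or 64 bits pushes the per-pair probability down by orders of magnitude, which is Tony's argument, while the consequence of the residual collision grows with the size of the covered range.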
_______________________________________________ Lsr mailing list -- [email protected] To unsubscribe send an email to [email protected]
