yeah, I read you and it's an intriguing and helpful idea. Simplest form of that would be "roll every 3 times on HSNPs to a new random seed" which would be advertised in the HSNP header itself.
viable, heavy on computation and caching like I said, 24 bits hash could be a cheaper cpu solution at cost of smaller coverage due to packet size On Thu, Oct 9, 2025 at 3:28 PM <[email protected]> wrote: > Tony, > > > > Thanks for the reply and for elaborating. > > > > 1. > > By “per neighbor” I meant per adjacency. > > I had though about just using a different seed, but I could not convince > myself that this would help with collision. XOR being commutative and > associative, it looks like one could equally applies the seed at the end > (last xor) hence after the collision. That’s why I had proposed something a > bit heavier with a slightly different hash function. But I would not > pretend being capable of following you. > > > > 2. > > Thanks for doing the math (for me) as the actual number(probability) does > matter. > > Not being familiar with Rust, I’ll wait for the next revision of the draft. > > > > That being said, beyond xor, I’m not familiar with hash function and their > properties. > > My point was limited to saying that may be using spatial diversity (my > different adjacencies) could help with collision (if we can manage to have > different collisions depending on the adjacency.) > > > > --Bruno > > > > *From:* Tony Przygienda <[email protected]> > *Sent:* Wednesday, October 8, 2025 9:31 PM > *To:* DECRAENE Bruno INNOV/NET <[email protected]> > *Cc:* lsr <[email protected]> > *Subject:* Re: [Lsr] draft-prz-lsr-hierarchical-snps > > > > Bruno, thanks. > > > > Multiple things > > > > 1. I am not sure what "per neighbor" here means. We could take some random > number, advertise that in the hellos and use that as seed. all good except > hellos are unreliable. Also, things like "take neighbors address" are > difficult to do due to unnumbered links and so on. Ultimately such a seed, > unless it varies over time reliably, is just another constant in hashing > and I doubt it will improve any probabilities. But, yes, within a system we > _could_ vary some constant used to build the hashes regularly (kind of like > nonce) and advertise this constant in the HSNP itself. computationally > expensive for the neighbor to compare and even if the numbers change slowly > we need to cache per neighbor the whole thing (since each neighbor uses a > different seed) but yes, it would vastly lower the probability of a "hash > clash" of course. > > 2. To give the math quickly once again. I started with 64 bits (and we > could do it) but things were big in terms of packet sizes and I realized > with math it's overkill AFAIS. Today fletcher (AFAIR ISIS uses a variant of > that) is 16 bits so assuming uniform distribution and independent variables > we could say that a HSNP checksum over 4E9 (2^32) fragments statistically > has the same "strength" as fletcher over a single fragment. Generally in > 1st level HSNP we'll cover about 80 ranges each over max. 60 fragments or > so (if I remember the numbers in the draft correctly) so probability is > roughly 24E2 / 4E9 of that, 2nd level each range can pack a full packet of > L1 ranges so we can say 80 * 80 * 60 fragments which is 38E4 / 4E9. so in > plain numbers we are talking in very coarse terms of 0.0000001 and 0.00001 > chance of things clashing. Once we move to layer 3 we are in realm of 30E6 > fragments database, not something I consider likely but for the sake of > argument that's 3E7/4E9 and then we're in 0.001 magnitude, i.e. there is a > chance of about 0.1% that two databases containing 30 million fragments end > up producing somewhere the same hash for two fragments. Of course it > matters a _lot_ where this same hash is produced. If it's in the same L1 > range then XOR'ing both of them will basically remove both fragments. If > not in the same range it's basically meaningless since range above it will > not see the clash anymore. All gets mathematically hairy to analyze and I > leave it at that. So if with all this people still think that all > necessitates solution 1 with a new random number every couple iterations, > that's doable, we can also go to 24 bit HSNP hashes easily. > > > > For the not so faint of heart, here's the fast double hashing of a > fragment that seems to work very well producing good hamming distances over > realistic scenarios and should make it into the next version of draft. I > skip lots of definitions and just show the core Rust snippet. Anyone with > some good foundation in statistics/hashing theory is more than welcome to > criticize. We are not bound by self inverse hashing BTW for building hash > of a fragment. The self-inversion (i.e. XOR) is only of interest later to > allow for quick adjustment of caches over very similar sets. > > > > Distributions of hamming produce weighted averages of about 14 bits > difference for very low entropy checksum (i.e. checksums vary by 1 or 2 > bits only for the tested fragments) and about 17 bits of the 32 bits with > checksum being a uniform random number generated for each fragment used. So > it looks like we're having enough entropy playing here. > > > > impl Into<HSNPHash> for (&SharedFragmentID, &FragmentContent) { > fn into(self) -> HSNPHash { > // 16 bit primes to fill in the 64 bits > let SEED1: u64 = 58417; > let SEED2: u64 = 47527; > const BYTEINBITS: usize = size_of::<u8>() * 8; > let (fragmentid, fragmentcontent) = self; > > let nodeid_iterator = &fragmentid.node.node_id().0; > > let foldednodeid1 = nodeid_iterator > .iter() > .fold(SEED1, |p, n| p.rotate_left(BYTEINBITS as _) ^ *n as > u64); > let foldednodeid2 = nodeid_iterator > .iter() > .rev() > .fold(SEED2, |p, n| p.rotate_left(BYTEINBITS as _) ^ *n as > u64); > // elements to rotate in with their byte size > let rotate_in = [ > ( > fragmentcontent.checksum.0 as _, > size_of_val(&fragmentcontent.checksum.0), > ), > (fragmentid.pnode.0 as u64, size_of_val(&fragmentid.pnode.0)), > ( > fragmentid.fragmentnr.0 as _, > size_of_val(&fragmentid.fragmentnr.0), > ), > ( > fragmentcontent.seqnr.0 as _, > size_of_val(&fragmentcontent.seqnr.0), > ), > ]; > > let extendedhash1 = rotate_in > .into_iter() > .fold(foldednodeid1, |p, (v, s)| p.rotate_left((s * > BYTEINBITS) as _) ^ v); > let extendedhash2 = rotate_in > .into_iter() > .rev() > .fold(foldednodeid2, |p, (v, s)| p.rotate_left((s * > BYTEINBITS) as _) ^ v); > > (extendedhash1 ^ extendedhash2).into() > } > } > > > > > > > > On Wed, Oct 8, 2025 at 4:02 PM <[email protected]> wrote: > > Hi all, > > [Thinking out loud, so I'm pretty sure I'll regret sending this email ...] > > With regards to hash collision, I'm wondering whether we could benefit > from using a different hash (method/function) per neighbor, and/or over > time. (As a priori it's enough for me to detect my LSDB inconsistency with > a single neighbor) > This requires more computation but assuming that a significant > contribution to collision is the final step (from 64 bits to 32 bits), > rotating a number of bits in one 32-bit int on a per neighbor basis seems > relatively low cost. > > Regards, > --Bruno > > ____________________________________________________________________________________________________________ > Ce message et ses pieces jointes peuvent contenir des informations > confidentielles ou privilegiees et ne doivent donc > pas etre diffuses, exploites ou copies sans autorisation. Si vous avez > recu ce message par erreur, veuillez le signaler > a l'expediteur et le detruire ainsi que les pieces jointes. Les messages > electroniques etant susceptibles d'alteration, > Orange decline toute responsabilite si ce message a ete altere, deforme ou > falsifie. Merci. > > This message and its attachments may contain confidential or privileged > information that may be protected by law; > they should not be distributed, used or copied without authorisation. > If you have received this email in error, please notify the sender and > delete this message and its attachments. > As emails may be altered, Orange is not liable for messages that have been > modified, changed or falsified. > Thank you. > > _______________________________________________ > Lsr mailing list -- [email protected] > To unsubscribe send an email to [email protected] > > ____________________________________________________________________________________________________________ > Ce message et ses pieces jointes peuvent contenir des informations > confidentielles ou privilegiees et ne doivent donc > pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu > ce message par erreur, veuillez le signaler > a l'expediteur et le detruire ainsi que les pieces jointes. Les messages > electroniques etant susceptibles d'alteration, > Orange decline toute responsabilite si ce message a ete altere, deforme ou > falsifie. Merci. > > This message and its attachments may contain confidential or privileged > information that may be protected by law; > they should not be distributed, used or copied without authorisation. > If you have received this email in error, please notify the sender and delete > this message and its attachments. > As emails may be altered, Orange is not liable for messages that have been > modified, changed or falsified. > Thank you. > >
_______________________________________________ Lsr mailing list -- [email protected] To unsubscribe send an email to [email protected]
