[Lsr] Re: draft-prz-lsr-hierarchical-snps

Tony Przygienda Sat, 18 Oct 2025 02:46:13 -0700

yeah, I read you and it's an intriguing and helpful idea.

Simplest form of that would be "roll every 3 times on HSNPs to a new random
seed" which would be advertised in the HSNP header itself.


viable, heavy on computation and caching

like I said, 24 bits hash could be a cheaper cpu solution at cost of
smaller coverage due to packet size


On Thu, Oct 9, 2025 at 3:28 PM <[email protected]> wrote:

> Tony,
>
>
>
> Thanks for the reply and for elaborating.
>
>
>
>    1.
>
> By “per neighbor” I meant per adjacency.
>
> I had though about just using a different seed, but I could not convince
> myself that this would help with collision. XOR being commutative and
> associative, it looks like one could equally applies the seed at the end
> (last xor) hence after the collision. That’s why I had proposed something a
> bit heavier with a slightly different hash function. But I would not
> pretend being capable of following you.
>
>
>
> 2.
>
> Thanks for doing the math (for me) as the actual number(probability) does
> matter.
>
> Not being familiar with Rust, I’ll wait for the next revision of the draft.
>
>
>
> That being said, beyond xor, I’m not familiar with hash function and their
> properties.
>
> My point was limited to saying that may be using spatial diversity (my
> different adjacencies) could help with collision (if we can manage to have
> different collisions depending on the adjacency.)
>
>
>
> --Bruno
>
>
>
> *From:* Tony Przygienda <[email protected]>
> *Sent:* Wednesday, October 8, 2025 9:31 PM
> *To:* DECRAENE Bruno INNOV/NET <[email protected]>
> *Cc:* lsr <[email protected]>
> *Subject:* Re: [Lsr] draft-prz-lsr-hierarchical-snps
>
>
>
> Bruno, thanks.
>
>
>
> Multiple things
>
>
>
> 1. I am not sure what "per neighbor" here means. We could take some random
> number, advertise that in the hellos and use that as seed. all good except
> hellos are unreliable. Also, things like "take neighbors address" are
> difficult to do due to unnumbered links and so on. Ultimately such a seed,
> unless it varies over time reliably, is just another constant in hashing
> and I doubt it will improve any probabilities. But, yes, within a system we
> _could_ vary some constant used to build the hashes regularly (kind of like
> nonce) and advertise this constant in the HSNP itself. computationally
> expensive for the neighbor to compare and even if the numbers change slowly
> we need to cache per neighbor the whole thing (since each neighbor uses a
> different seed) but yes, it would vastly lower the probability of a "hash
> clash" of course.
>
> 2. To give the math quickly once again. I started with 64 bits (and we
> could do it) but things were big in terms of packet sizes and I realized
> with math it's overkill AFAIS. Today fletcher (AFAIR ISIS uses a variant of
> that) is 16 bits so assuming uniform distribution and independent variables
> we could say that a HSNP checksum over 4E9 (2^32) fragments statistically
> has the same "strength" as fletcher over a single fragment. Generally in
> 1st level HSNP we'll cover about 80 ranges each over max. 60 fragments or
> so (if I remember the numbers in the draft correctly) so probability is
> roughly 24E2 / 4E9 of that, 2nd level each range can pack a full packet of
> L1 ranges so we can say 80 * 80 * 60 fragments which is 38E4 / 4E9. so in
> plain numbers we are talking in very coarse terms of 0.0000001 and 0.00001
> chance of things clashing. Once we move to layer 3 we are in realm of 30E6
> fragments database, not something I consider likely but for the sake of
> argument that's 3E7/4E9 and then we're in 0.001 magnitude, i.e. there is a
> chance of about 0.1% that two databases containing 30 million fragments end
> up producing somewhere the same hash for two fragments. Of course it
> matters a _lot_ where this same hash is produced. If it's in the same L1
> range then XOR'ing both of them will basically remove both fragments. If
> not in the same range it's basically meaningless since range above it will
> not see the clash anymore. All gets mathematically hairy to analyze and I
> leave it at that. So if with all this people still think that all
> necessitates solution 1 with a new random number every couple iterations,
> that's doable, we can also go to 24 bit HSNP hashes easily.
>
>
>
> For the not so faint of heart, here's the fast double hashing of a
> fragment that seems to work very well producing good hamming distances over
> realistic scenarios and should make it into the next version of draft. I
> skip lots of definitions and just show the core Rust snippet. Anyone with
> some good foundation in statistics/hashing theory is more than welcome to
> criticize. We are not bound by self inverse hashing BTW for building hash
> of a fragment. The self-inversion (i.e. XOR) is only of interest later to
> allow for quick adjustment of caches over very similar sets.
>
>
>
> Distributions of hamming produce weighted averages of about 14 bits
> difference for very low entropy checksum (i.e. checksums vary by 1 or 2
> bits only for the tested fragments) and about 17 bits of the 32 bits with
> checksum being a uniform random number generated for each fragment used. So
> it looks like we're having enough entropy playing here.
>
>
>
> impl Into<HSNPHash> for (&SharedFragmentID, &FragmentContent) {
>     fn into(self) -> HSNPHash {
>         // 16 bit primes to fill in the 64 bits
>         let SEED1: u64 = 58417;
>         let SEED2: u64 = 47527;
>         const BYTEINBITS: usize = size_of::<u8>() * 8;
>         let (fragmentid, fragmentcontent) = self;
>
>         let nodeid_iterator = &fragmentid.node.node_id().0;
>
>         let foldednodeid1 = nodeid_iterator
>             .iter()
>             .fold(SEED1, |p, n| p.rotate_left(BYTEINBITS as _) ^ *n as
> u64);
>         let foldednodeid2 = nodeid_iterator
>             .iter()
>             .rev()
>             .fold(SEED2, |p, n| p.rotate_left(BYTEINBITS as _) ^ *n as
> u64);
>         // elements to rotate in with their byte size
>         let rotate_in = [
>             (
>                 fragmentcontent.checksum.0 as _,
>                 size_of_val(&fragmentcontent.checksum.0),
>             ),
>             (fragmentid.pnode.0 as u64, size_of_val(&fragmentid.pnode.0)),
>             (
>                 fragmentid.fragmentnr.0 as _,
>                 size_of_val(&fragmentid.fragmentnr.0),
>             ),
>             (
>                 fragmentcontent.seqnr.0 as _,
>                 size_of_val(&fragmentcontent.seqnr.0),
>             ),
>         ];
>
>         let extendedhash1 = rotate_in
>             .into_iter()
>             .fold(foldednodeid1, |p, (v, s)| p.rotate_left((s *
> BYTEINBITS) as _) ^ v);
>         let extendedhash2 = rotate_in
>             .into_iter()
>             .rev()
>             .fold(foldednodeid2, |p, (v, s)| p.rotate_left((s *
> BYTEINBITS) as _) ^ v);
>
>         (extendedhash1 ^ extendedhash2).into()
>     }
> }
>
>
>
>
>
>
>
> On Wed, Oct 8, 2025 at 4:02 PM <[email protected]> wrote:
>
> Hi all,
>
> [Thinking out loud, so I'm pretty sure I'll regret sending this email ...]
>
> With regards to hash collision, I'm wondering whether we could benefit
> from using a different hash (method/function) per neighbor, and/or over
> time. (As a priori it's enough for me to detect my LSDB inconsistency with
> a single neighbor)
> This requires more computation but assuming that a significant
> contribution to collision is the final step (from 64 bits to 32 bits),
> rotating a number of bits in one 32-bit int on a per neighbor basis seems
> relatively low cost.
>
> Regards,
> --Bruno
>
> ____________________________________________________________________________________________________________
> Ce message et ses pieces jointes peuvent contenir des informations
> confidentielles ou privilegiees et ne doivent donc
> pas etre diffuses, exploites ou copies sans autorisation. Si vous avez
> recu ce message par erreur, veuillez le signaler
> a l'expediteur et le detruire ainsi que les pieces jointes. Les messages
> electroniques etant susceptibles d'alteration,
> Orange decline toute responsabilite si ce message a ete altere, deforme ou
> falsifie. Merci.
>
> This message and its attachments may contain confidential or privileged
> information that may be protected by law;
> they should not be distributed, used or copied without authorisation.
> If you have received this email in error, please notify the sender and
> delete this message and its attachments.
> As emails may be altered, Orange is not liable for messages that have been
> modified, changed or falsified.
> Thank you.
>
> _______________________________________________
> Lsr mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
>
> ____________________________________________________________________________________________________________
> Ce message et ses pieces jointes peuvent contenir des informations 
> confidentielles ou privilegiees et ne doivent donc
> pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu 
> ce message par erreur, veuillez le signaler
> a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
> electroniques etant susceptibles d'alteration,
> Orange decline toute responsabilite si ce message a ete altere, deforme ou 
> falsifie. Merci.
>
> This message and its attachments may contain confidential or privileged 
> information that may be protected by law;
> they should not be distributed, used or copied without authorisation.
> If you have received this email in error, please notify the sender and delete 
> this message and its attachments.
> As emails may be altered, Orange is not liable for messages that have been 
> modified, changed or falsified.
> Thank you.
>
>

_______________________________________________
Lsr mailing list -- [email protected]
To unsubscribe send an email to [email protected]

[Lsr] Re: draft-prz-lsr-hierarchical-snps

Reply via email to