Re: [wsjt-devel] wsprd hash collisions?

Black Michael via wsjt-devel Sat, 14 Aug 2021 06:50:46 -0700

So is the hash table cleared automagically or does the user need to do it?
It would seem one might want to clear it during "system" startup perhaps?
Mike W9MDB


 

    On Saturday, August 14, 2021, 08:05:30 AM CDT, Bill Somerville via 
wsjt-devel <[email protected]> wrote:  
 
  Mike, 
  there is no CRC in the WSPR protocol, details are here: 
  https://www.physics.princeton.edu/pulsar/K1JT/WSPR_2.0_User.pdf 
  Hash codes are used in WSPR Type 2 and Type 3 messages, where the call or 
locator to be sent does not fit into the 28-bit or 15-bit source encoding, 
respectively. Hash codes are needed to carry call information between a pair 
of transmissions, allowing the second to be decoded using a hash code lookup. 
  Hash values would be useless if they were not saved; the lookup relies on the 
relatively low probability of a hash collision if the hash table is maintained 
in an MRU (most recently used) ordering. Also, wsprd is used stand-alone by 
several systems to decode each received period; without a file of hash codes it 
could never do a successful hash lookup! 
  Hash collision likelihood increases if the hash table is allowed to get too 
big. This is because a message with a hash code may be received without the 
first message with the matching call ever having been received, and a larger 
table makes it more likely that the hash code matches some unrelated entry. It 
makes sense to clear the hash table file occasionally to avoid this happening 
too often. 
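
  As a rough illustration of the MRU idea and of bounding or clearing the 
table, here is a minimal Python sketch. It is not how wsprd stores its hash 
file: the class name MruCallTable, the capacity parameter and the hash15() 
helper are all hypothetical, and hash15() uses CRC-32 purely as a stand-in for 
whatever 15-bit hash wsprd actually computes.

```python
from collections import OrderedDict
import zlib

HASH_BITS = 15  # WSPR hash codes are 15 bits wide (32,768 values)


def hash15(callsign: str) -> int:
    """Stand-in 15-bit hash for illustration; NOT the function wsprd uses."""
    return zlib.crc32(callsign.encode("ascii")) & ((1 << HASH_BITS) - 1)


class MruCallTable:
    """Hypothetical hash-code -> callsign table kept in MRU order."""

    def __init__(self, capacity: int = 200):
        self.capacity = capacity
        self._entries = OrderedDict()  # oldest first, newest last

    def store(self, hash_code: int, callsign: str) -> None:
        # A newer call with the same hash code replaces the older one.
        self._entries[hash_code] = callsign
        self._entries.move_to_end(hash_code)
        # Evict least recently used entries once the table is too big.
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)

    def lookup(self, hash_code: int):
        callsign = self._entries.get(hash_code)
        if callsign is not None:
            # A successful lookup counts as a use.
            self._entries.move_to_end(hash_code)
        return callsign

    def clear(self) -> None:
        self._entries.clear()

    def __len__(self) -> int:
        return len(self._entries)
```

  A decoder built this way would call store(hash15(call), call) for every 
message that carries a full call, and lookup() for every message that carries 
only a hash code; clear() corresponds to deleting the hash table file. 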
  73
 Bill
 G4WJS. 
  On 14/08/2021 13:40, Black Michael via wsjt-devel wrote:
  
   Doesn't WSPR also use the CRC in messages?  So it would be a combination of 
collision + valid CRC.  The 50/50 point for 32768 values is 214. Why does WSPR 
remember the hash value? 
  We do see bogus matches in FT8 modes and such -- not real often, but every 
once in a while a callsign hash will match a random decode... same 15-bit hash 
being used for that too. 
  Given the much lower WSPR counts I would expect "valid" collisions to be 
pretty rare. 
  Mike W9MDB 
      On Friday, August 13, 2021, 05:41:12 PM CDT, Phil Karn via wsjt-devel 
<[email protected]> wrote:  
  
   The hash function used in wspr is 15 bits wide, i.e., there can be
  32,768 values. This may seem like a lot, but the "birthday paradox" says
  that the probability of a collision grows faster than you might expect
  as the set size grows. It comes from the fact that you only need ~23
  people to have a 50% probability that two of them have the same birthday.
  
  A very rough approximation is that the probability of a collision is 1/2
  when the set size is equal to the square root of the number of possible
  hash values. For 15 bits, that's about 180. My hashtable.txt for 40m
  currently has 353 entries.
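
  The square-root rule can be checked against the exact birthday calculation.
  The sketch below assumes hash values are uniformly distributed and
  independent, which a real callsign population only approximates; the
  function names are illustrative.

```python
def collision_probability(n_items: int, n_values: int) -> float:
    """Probability that at least two of n_items share a hash value."""
    p_no_collision = 1.0
    for i in range(n_items):
        # The (i+1)-th item must avoid the i values already taken.
        p_no_collision *= 1.0 - i / n_values
    return 1.0 - p_no_collision


def birthday_threshold(n_values: int, target: float = 0.5) -> int:
    """Smallest set size whose collision probability reaches target."""
    p_no_collision = 1.0
    k = 0
    while 1.0 - p_no_collision < target:
        p_no_collision *= 1.0 - k / n_values
        k += 1
    return k


if __name__ == "__main__":
    print(birthday_threshold(365))           # classic birthday case
    print(birthday_threshold(1 << 15))       # 15-bit hash
    print(collision_probability(353, 1 << 15))
```

  Under those assumptions the exact 50% point for 32,768 values is 214
  entries, slightly above the square-root estimate, and a table of 353
  entries has roughly an 85% chance of containing at least one pair of calls
  with the same hash.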
  
  Has anyone seen a collision in practice? If one occurs, the most recent
  duplicate entry is most likely the correct one. Requiring a match in the
  first 4 characters of the grid square would also seem to greatly reduce
  the problem.
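
  A sketch of that grid check, in Python. It assumes (hypothetically) that
  each saved entry keeps the 4-character grid from the message that
  introduced the call, that the entries sharing one hash code are kept most
  recent first, and that the hashed message carries a 6-character locator.
  The function name is made up for illustration.

```python
def resolve_hashed_call(candidates, locator6):
    """Pick the most recent call whose saved grid matches the locator.

    candidates: list of (callsign, grid4) pairs that share one hash code,
                ordered most recent first (an assumption of this sketch).
    locator6:   6-character locator from the message carrying the hash.
    """
    prefix = locator6[:4].upper()
    for callsign, grid4 in candidates:
        if grid4.upper() == prefix:
            return callsign
    return None  # no entry agrees on the grid: better to report no match
```

  With two colliding entries such as ("W9XYZ", "EN52") and ("K1ABC", "FN42"),
  a hashed message with locator FN42ab resolves to K1ABC even though it is
  not the most recent entry, and a locator matching neither grid yields no
  call at all.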
  
  Phil
  
     
 

 
 _______________________________________________
wsjt-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/wsjt-devel

