jmalkin commented on issue #647:
URL: 
https://github.com/apache/datasketches-java/issues/647#issuecomment-2779331807

   I used the code samples from the comparison page in #298 to test the various 
algorithms, thanks to @richardstartin. Rather than drawing from a distribution 
and plotting, I went with a direct comparison of randomness. We expect a 
uniformly random distribution so I just used sequential integers for the sample 
and tracked how often I saw each one across many trials. Then I computed the 
entropy of the resulting counts.
   ```
   k = 5, n = 2048, trials = 10M
   Entropy:
           Reference: 11.0
           Algorithm R: 10.999969901903189
           Algorithm L: 10.999894443713615
           Algorithm X: 10.999311172690062
           Algorithm Z: 10.995795765496663
   ```
   ```
   k = 5, n = 1024, trials = 10M
   Entropy:
           Reference: 10.0
           Algorithm R: 9.999984679575581
           Algorithm L: 9.999836811505581
           Algorithm X: 9.998684771374391
           Algorithm Z: 9.991644691418209
   ```
   ```
   k = 10, n = 128, trials = 10M:
   Entropy:
           Reference: 7.0
           Algorithm R: 6.999999117283841
           Algorithm L: 6.999457302895078
           Algorithm X: 6.982288311480539
   ```
   ```
   k = 5, n = 32, trials = 10M:
   Entropy:
           Reference: 5.0
           Algorithm R: 4.999999735882753
           Algorithm L: 4.995728053697453
           Algorithm X: 4.964960870537897
   ```
   Algorithm Z had issues with smaller sizes where some items were never 
selected. When I remembered to screen out zero counts so my entropy wasn't NaN, 
it did at least run But entropy was worse. I could re-run them I suppose but 
... eh.
   
   So ultimately the other algorithms are faster but they're not quite as 
uniformly random. That discrepancy does seem to diminish with size but doesn't 
entirely go away.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to