jmalkin commented on issue #647:
URL:
https://github.com/apache/datasketches-java/issues/647#issuecomment-2779331807
I used the code samples from the comparison page in #298 to test the various
algorithms, thanks to @richardstartin. Rather than drawing from a distribution
and plotting, I went with a direct comparison of randomness. We expect a
uniformly random distribution so I just used sequential integers for the sample
and tracked how often I saw each one across many trials. Then I computed the
entropy of the resulting counts.
```
k = 5, n = 2048, trials = 10M
Entropy:
Reference: 11.0
Algorithm R: 10.999969901903189
Algorithm L: 10.999894443713615
Algorithm X: 10.999311172690062
Algorithm Z: 10.995795765496663
```
```
k = 5, n = 1024, trials = 10M
Entropy:
Reference: 10.0
Algorithm R: 9.999984679575581
Algorithm L: 9.999836811505581
Algorithm X: 9.998684771374391
Algorithm Z: 9.991644691418209
```
```
k = 10, n = 128, trials = 10M:
Entropy:
Reference: 7.0
Algorithm R: 6.999999117283841
Algorithm L: 6.999457302895078
Algorithm X: 6.982288311480539
```
```
k = 5, n = 32, trials = 10M:
Entropy:
Reference: 5.0
Algorithm R: 4.999999735882753
Algorithm L: 4.995728053697453
Algorithm X: 4.964960870537897
```
Algorithm Z had issues with smaller sizes where some items were never
selected. When I remembered to screen out zero counts so my entropy wasn't NaN,
it did at least run But entropy was worse. I could re-run them I suppose but
... eh.
So ultimately the other algorithms are faster but they're not quite as
uniformly random. That discrepancy does seem to diminish with size but doesn't
entirely go away.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]