Hello Dean,
It may be true that the bias is of the same magnitude as FP multiply, but it is not of the same nature. With FP multiply, the more-likely-to-be-chosen values are more-or-less evenly distributed across the range, whereas modulo concentrates them all at one end, making it more likely to bias test results.
Yes, that is true.
It's worth paying attention to how other languages/libraries implement this, and basically no one chooses the modulo method, which ought to raise alarm bells. Of the biased methods, it has the worst kind of bias and the worst performance.
Hmmm. That is not exactly how I interpreted the figures in the blog.
If a biased method is OK, then the biased integer multiply method seems to be the clear winner. This requires the high part of a 64x64-bit product, which is trivial if 128-bit integers are available, but would need a little more work otherwise. There's some code in common/d2s that might be suitable.
And yes, modulo is expensive. If we allow 128 bits integers operations, I would not choose this RNPG in the first place, I'd take PCG with a 128 bits state.
That does not change the discussion about bias, though.
Most other implementations tend to use an unbiased method though, and I think it's worth doing the same. It might be a bit slower, or even faster depending on implementation and platform, but in the context of the DB as a whole, I don't think a few extra cycles matters either way.
Ok ok ok, I surrender!
The method recommended at the very end of that blog seems to be pretty good all round, but any of the other commonly used unbiased methods would probably be OK too.
That does not address my other issues with the proposed methods, in particular the fact that the generated sequence is less deterministic, but I think I have a simple way around that. I'm hesitating to skip to the the bitmask method, and give up performance uniformity. I'll try to come up with something over the week-end, and also address Tom's comments in passing.
-- Fabien.