On Thu, 1 Jul 2021 at 22:18, Fabien COELHO <coe...@cri.ensmp.fr> wrote: > > > However, this patch seems to be replacing that with a simple > > modulo operation, which is perhaps the worst possible way to do it. > > The modulo operation is biased for large ranges close to the limit, sure. > Also, the bias is somehow of the same magnitude as the FP multiplication > approach used previously, so the "worst" has not changed much, it is > really the same as before. >
It may be true that the bias is of the same magnitude as FP multiply, but it is not of the same nature. With FP multiply, the more-likely-to-be-chosen values are more-or-less evenly distributed across the range, whereas modulo concentrates them all at one end, making it more likely to bias test results. It's worth paying attention to how other languages/libraries implement this, and basically no one chooses the modulo method, which ought to raise alarm bells. Of the biased methods, it has the worst kind of bias and the worst performance. If a biased method is OK, then the biased integer multiply method seems to be the clear winner. This requires the high part of a 64x64-bit product, which is trivial if 128-bit integers are available, but would need a little more work otherwise. There's some code in common/d2s that might be suitable. Most other implementations tend to use an unbiased method though, and I think it's worth doing the same. It might be a bit slower, or even faster depending on implementation and platform, but in the context of the DB as a whole, I don't think a few extra cycles matters either way. The method recommended at the very end of that blog seems to be pretty good all round, but any of the other commonly used unbiased methods would probably be OK too. Regards, Dean