On Thu, Aug 17, 2017 at 12:02 AM, Solar Designer <so...@openwall.com> wrote:
> On Wed, Aug 16, 2017 at 11:41:55PM +0200, Solar Designer wrote: > > On Wed, Aug 16, 2017 at 10:06:02PM +0200, Nikita Popov wrote: > > > I'd suggest to split the 32-bit and 64-bit code codepaths entirely, as > the > > > interleaved #ifs are somewhat hard to follow. Something like > > > https://gist.github.com/nikic/64e7ec58ebb6121d350fb80927a65082 (not > > > thoroughly tested). > > > > This looks good to me - especially how you reduced the nesting of if's > > by special-casing the "Powers of two are not biased" return. With this > > change, you can as well drop the "Special case where no modulus is > > required", as it'd happen to be handled the same by your new return. > > OTOH, that optimization might require an extra comment on its own. > > > > Here's what this might look like (totally untested): > > > > https://gist.github.com/solardiz/5e3d313bbee2c1ce6e200e433b750bef > > One difference I didn't notice at first: the currently committed code > does only one php_mt_rand() call per loop iteration when it's skipping > "numbers over the limit", even in the 64-bit case (thus, it replaces 32 > bits of the value at a time). Your 64-bit version (and my revision of > it) does two calls per iteration (replacing the whole number). > > Arguably, the currently committed code is smarter and more efficient in > that respect, whereas yours is cleaner. Regardless, it's an extra > change that will affect the generated random numbers in some cases where > we could avoid making that change (it's not fixing any bug). > > I think it's preferable not to unnecessarily change what numbers are > generated. > > Alexander > Good point. However, I think that this optimization is not actually correct. For example, let's take umax = 0x1_0000_0001, which is the smallest value for which this codepath is taken. In this case limit would be 0xffff_ffff_ffff_fffe, so we only resample if the value is exactly 0xffff_ffff_ffff_ffff. So if the resampling codepath is taken in this case, and we only generate one new 32-bit value, the top word of the result will be fixed to 0xffff_ffff. (A very small bias in this case, but there's probably more significant cases.) Nikita