Hi

On 11/5/22 16:34, Go Kudo wrote:
> I am skeptical only about getFloat(). The use cases are limited and seem
> somewhat excessive. Do you have examples of how this is supported in other
> languages?

Yes, unfortunately getFloat() became fairly complex, but that is because generating random floats correctly is genuinely complicated, due to how floats work.

The getFloat() method as proposed implements the γ-section algorithm as published in: Drawing Random Floating-Point Numbers from an Interval. Frédéric Goualard, ACM Trans. Model. Comput. Simul., 32:3, 2022. https://doi.org/10.1145/3503512

This publication is just 7 months old. It explains how the implementations in the programming languages and libraries it surveys are broken in one way or another, and it proposes the γ-section algorithm as a correct alternative.

As floats are not uniformly dense (the gap between adjacent floats doubles with every power of two) and cannot represent every real number, it is very easy to introduce a bias or to generate values outside the requested interval.
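To make the density point concrete, here is a small PHP sketch that measures the gap between a float and the next larger one by incrementing its raw bit pattern. The helper gapAbove() is a hypothetical name used only for illustration, and the sketch assumes a 64-bit PHP build with IEEE 754 doubles:

```php
<?php
// Gap between $x and the next larger double, for finite $x >= 0.
// Assumes IEEE 754 doubles and a 64-bit PHP build: for non-negative
// doubles, adding 1 to the raw 64-bit pattern yields the next double.
function gapAbove(float $x): float
{
    $bits = unpack('q', pack('d', $x))[1];
    $next = unpack('d', pack('q', $bits + 1))[1];
    return $next - $x;
}

var_dump(gapAbove(1.0));       // 2^-52, about 2.2e-16
var_dump(gapAbove(1000.0));    // 2^-43, about 1.1e-13
var_dump(gapAbove(2.0 ** 53)); // 2.0: odd integers above 2^53 are not representable
var_dump((float) (2 ** 53 + 1) === 2.0 ** 53); // bool(true): 2^53 + 1 rounds to 2^53
```

The gap grows from 2^-52 at 1.0 to 2^-43 at 1000.0 and to 2.0 at 2^53.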

An example taken from the publication:

php > $r = new Random\Randomizer();

We generate a random float in [0, 1) (allowing 0, but not 1) by dividing a random int between 0 and 2^53 - 1 by 2^53. This is effectively what ->nextFloat() does. This creates a uniformly distributed float with as many different evenly spaced values as possible, because a double (the underlying representation) has 53 bits of precision.

An equivalent of the nextFloat() method is often the only thing available in other languages, e.g. Math.random() in JavaScript [1].

php > $f = $r->getInt(0, (2**53 - 1)) / (2**53);
php > var_dump($f);
float(0.6942225382038698)
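The construction above can be wrapped in a small helper. unitFloat() is a hypothetical name for illustration, not part of PHP's API; the sketch relies only on Random\Randomizer::getInt() as used in the example:

```php
<?php
// Hypothetical helper: the "random 53-bit integer divided by 2^53" construction.
function unitFloat(Random\Randomizer $r): float
{
    // Uniform integer in [0, 2^53 - 1]. Both it and 2^53 convert to a
    // double exactly, and dividing by a power of two is exact, so the
    // result is one of 2^53 evenly spaced values k / 2^53 in [0, 1).
    return $r->getInt(0, 2 ** 53 - 1) / 2 ** 53;
}

$largest = (2 ** 53 - 1) / 2 ** 53;    // the largest value unitFloat() can return
var_dump($largest < 1.0);              // bool(true)
var_dump(1.0 - $largest === 2 ** -53); // bool(true): neighbouring results are 2^-53 apart
```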

Now we want to turn this into a random float in [3.5, 4.5) (not allowing 4.5), because that's what we need. The obvious formula min + (max - min) * f is also the one given on MDN for JavaScript's Math.random() [1]:

php > $min = 3.5;
php > $max = 4.5;
php > var_dump($min + ($max - $min) * $f);
float(4.19422253820387)

The simple formula appears to do the correct thing, and it would be correct if floats could represent all real values. But what happens if the random integer is 2^53 - 1 (i.e. the maximum integer we allowed to generate)?

php > $f = (2**53 - 1) / (2**53);
php > var_dump($f);
float(0.9999999999999999)
php > var_dump($min + ($max - $min) * $f);
float(4.5)

In this case the result was rounded up to 4.5, because the exact result (4.5 - 2^-53) is not representable as a double. Now a value outside the requested interval [3.5, 4.5) was generated!
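The failure is not a one-off. The following sketch checks the 16 largest integers the generator can produce and counts how many of them are mapped exactly onto the excluded bound 4.5. naiveScale() is a hypothetical helper that implements the formula above:

```php
<?php
// The naive formula from the example: maps a unit float into [min, max).
function naiveScale(float $min, float $max, float $f): float
{
    return $min + ($max - $min) * $f;
}

$hits = 0;
for ($j = 1; $j <= 16; $j++) {
    $f = (2 ** 53 - $j) / 2 ** 53; // exact: 1 - j * 2^-53
    if (naiveScale(3.5, 4.5, $f) >= 4.5) {
        $hits++;                   // landed on the excluded upper bound
    }
}
var_dump($hits); // int(4)
```

The exact results are 4.5 - j * 2^-53, while the doubles just below 4.5 are 2^-50 = 8 * 2^-53 apart. So j = 1, 2, 3 round up to 4.5, and the tie at j = 4 is rounded to even, which is also 4.5.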

Likewise, if you generate a random float between 0 and 1000 with this method, some values will appear more often than others, due to rounding and to the density of floats halving with each power of two.
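A sketch of that bias for [0, 1000): it feeds 1000 consecutive integers near 0.75 * 2^53 through the same formula and counts how often two neighbouring integers produce the same double. The start value is an arbitrary choice for illustration:

```php
<?php
// Consecutive integers k near 0.75 * 2^53, mapped with 0 + 1000 * (k / 2^53).
$start = 3 * 2 ** 51; // 0.75 * 2^53, so results land near 750
$prev = null;
$repeats = 0;
for ($k = $start; $k < $start + 1000; $k++) {
    $x = 0.0 + 1000.0 * ($k / 2 ** 53);
    if ($x === $prev) {
        $repeats++; // this double is produced by two different integers
    }
    $prev = $x;
}
var_dump($repeats); // greater than 0
```

Near 750 adjacent doubles are 2^-43 apart, but consecutive inputs are only 1000 / 2^53 = 0.9765625 * 2^-43 apart. A bit over 2% of the doubles in that range are therefore produced by two integers and are twice as likely as their neighbours. Below 512 the opposite happens: the gap between doubles halves, and some of them can never be produced.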

The γ-section algorithm by Prof. Goualard eliminates all these issues, and that's what getFloat() uses. The getFloat() method supports all 4 possible boundary combinations (each end either open or closed) to ensure that users have a safe solution for every use case and don't need to build an unsafe one in userland.
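For intuition only, here is a heavily simplified sketch of the central idea behind the γ-section: draw uniformly from an evenly spaced grid whose step is the largest gap between adjacent doubles in the interval, so every grid point is exactly representable and no rounding ever happens. This is neither Goualard's full algorithm nor PHP's implementation. It is restricted to 0 <= $min < $max with a closed-open interval, while the real algorithm also handles negative and mixed-sign intervals and all four boundary combinations, and computes the grid size more carefully. gridFloat() and prevDouble() are hypothetical names, and a 64-bit build with IEEE 754 doubles is assumed:

```php
<?php
declare(strict_types=1);

// Largest double below $x; assumes 0 < $x < INF, IEEE 754 doubles, 64-bit PHP.
function prevDouble(float $x): float
{
    $bits = unpack('q', pack('d', $x))[1];
    return unpack('d', pack('q', $bits - 1))[1];
}

// Hypothetical helper, not a PHP API: uniform over the grid
// $max - k * $g (k = 1..$n), restricted to 0 <= $min < $max < INF.
function gridFloat(Random\Randomizer $r, float $min, float $max): float
{
    if (!($min >= 0.0 && $min < $max && $max < INF)) {
        throw new ValueError('this sketch requires 0 <= $min < $max < INF');
    }
    // For non-negative doubles the gaps grow with magnitude, so the gap
    // just below $max is the largest one in [$min, $max). It is a power
    // of two, and every multiple of it in [0, $max] is a double.
    $g = $max - prevDouble($max);
    // Number of grid points; the division may round, so step back while
    // the lowest grid point would fall below $min (that check is exact).
    $n = (int) floor(($max - $min) / $g);
    while ($max - $n * $g < $min) {
        $n--;
    }
    $k = $r->getInt(1, $n); // uniform in [1, $n]
    return $max - $k * $g;  // exact, so always in [$min, $max)
}

$r = new Random\Randomizer();
var_dump(gridFloat($r, 3.5, 4.5)); // some float in [3.5, 4.5), never 4.5
```

For [3.5, 4.5) the step is 2^-50 and the grid has 2^50 points. Every one of them is exactly representable, so the upper bound can never be produced by rounding.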

Best regards
Tim Düsterhus

[1] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/random

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php
