Hi
On 11/5/22 16:34, Go Kudo wrote:
> I am skeptical only about getFloat(). The use cases are limited and seem
> somewhat excessive. Do you have examples of how this is supported in other
> languages?
Yes, unfortunately getFloat() became pretty complex, but that is because
generating random floats correctly is genuinely complicated, due to how
floating-point numbers work.
The getFloat() method as proposed implements the γ-section algorithm as
published in:
Frédéric Goualard. Drawing Random Floating-Point Numbers from an Interval.
ACM Trans. Model. Comput. Simul., 32(3), 2022.
https://doi.org/10.1145/3503512
This publication is just 7 months old. It explains how the
implementations in the other programming languages it examines are broken
in one way or another, and proposes the γ-section algorithm as a correct
alternative.
As floats are not uniformly dense and cannot represent every real value,
it is very easy to introduce a bias or to generate invalid values.
An example taken from the publication:
php > $r = new Random\Randomizer();
We generate a random float in [0, 1) (allowing 0, but not 1) by
dividing a random int between 0 and 2^53 - 1 by 2^53. This is effectively
what ->nextFloat() does. It yields a uniformly distributed float with as
many different values as possible, because a double (the underlying
representation) has 53 bits of precision.
A nextFloat()-style method is often the only thing that is available in
other languages, e.g. Math.random() in JavaScript [1].
php > $f = $r->getInt(0, (2**53 - 1)) / (2**53);
php > var_dump($f);
float(0.6942225382038698)
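To see why exactly 53 bits are used, here is a small check in the same spirit as the shell session (plain PHP, no Randomizer needed): with a 53-bit integer the largest possible input still maps strictly below 1.0, whereas with one extra bit the largest input rounds up to exactly 1.0 and the [0, 1) guarantee is lost.

```php
<?php
// 53 bits: the largest input, (2^53 - 1) / 2^53, is still below 1.0.
var_dump((2 ** 53 - 1) / 2 ** 53 < 1.0); // bool(true)

// 54 bits: (2^54 - 1) / 2^54 is not representable as a double and
// rounds up to exactly 1.0, which [0, 1) must never contain.
var_dump((2 ** 54 - 1) / 2 ** 54);       // float(1)
```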
Now we want to turn this into a random float in [3.5, 4.5) (not
allowing 4.5), because that's what we need. We use the scaling formula
that MDN gives for JavaScript's Math.random():
php > $min = 3.5;
php > $max = 4.5;
php > var_dump($min + ($max - $min) * $f);
float(4.19422253820387)
The simple formula appears to do the correct thing, and it would be
correct if floats could represent all real values. But what happens if
the random integer is 2^53 - 1 (i.e. the largest integer we allowed to
generate)?
php > $f = (2**53 - 1) / (2**53);
php > var_dump($f);
float(0.9999999999999999)
php > var_dump($min + ($max - $min) * $f);
float(4.5)
In this case the result was rounded up to 4.5, because the exact result
(4.5 - 2^-53) is not representable as a double. Now an invalid value was
generated!
Likewise, if you generate a random float between 0 and 1000 with this
method, some values will appear more often than others due to rounding
and the changing density of floats across powers of two.
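A small sketch to make the [0, 1000) bias concrete. It walks the 128 largest inputs of the 53-bit construction above, applies the naive formula, and counts how often two consecutive inputs land on the same double; such a double is produced by two inputs instead of one and is therefore twice as likely. Near 1000, adjacent doubles are 2^-43 apart, while consecutive inputs are only 1000 * 2^-53 apart (slightly less than one gap), so occasional collisions are unavoidable. countCollisions() is a hypothetical helper written for illustration.

```php
<?php
// Count how often the naive formula $min + ($max - $min) * $f maps two
// consecutive inputs onto the same double. Inputs are the $n largest
// values of the 53-bit construction $f = $k / 2**53.
// countCollisions() is a hypothetical helper, not part of any API.
function countCollisions(float $min, float $max, int $n): int
{
    $collisions = 0;
    $previous = null;
    for ($k = 2 ** 53 - $n; $k < 2 ** 53; $k++) {
        $f = $k / 2 ** 53;              // exact, since $k < 2**53
        $x = $min + ($max - $min) * $f; // rounds to the nearest double
        if ($x === $previous) {
            $collisions++;              // this double is hit by two inputs
        }
        $previous = $x;
    }
    return $collisions;
}

var_dump(countCollisions(0.0, 1.0, 128));    // int(0): [0, 1) itself is fine
var_dump(countCollisions(0.0, 1000.0, 128)); // a small positive count
```

Because the naive mapping is monotonic, any repeated output necessarily shows up between consecutive inputs, so comparing neighbours is enough.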
The γ-section algorithm by Prof. Goualard eliminates all these issues,
and that's what getFloat() uses. getFloat() supports all 4 possible
boundary combinations (each end of the interval either open or closed),
so that users have a safe solution for every use case and don't need to
build an unsafe one in userland.
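A heavily simplified sketch of the idea of evenly spaced outputs, hard-coded to [3.5, 4.5] from the example above: instead of scaling a [0, 1) float, draw an integer index and place it on a grid whose step is the largest gap between adjacent doubles in the interval. Every grid point is then exactly representable and equally likely (the price is that some doubles in [3.5, 4) are never returned). The names gridPoint() and randomInGrid() and the constants are hypothetical; this is not the γ-section algorithm itself, which also has to handle arbitrary bounds where the grid does not line up with both ends, and not the getFloat() implementation.

```php
<?php
// Simplified illustration only, hard-coded to the interval [3.5, 4.5];
// this is NOT the full γ-section algorithm nor PHP's getFloat() code.
//
// The widest gap between adjacent doubles in [3.5, 4.5] is the one in
// [4, 8): 2**2 * 2**-52 = 2**-50. Both bounds are multiples of that step.
const GRID_MIN = 3.5;
const GRID_STEP = 2 ** -50;  // largest float spacing inside the interval
const GRID_POINTS = 2 ** 50; // (4.5 - 3.5) / GRID_STEP

// Hypothetical helper: map index $k in [0, GRID_POINTS] to a float.
// 3.5 + $k * 2**-50 is exactly representable for all such $k,
// so this addition never rounds (and thus never overshoots 4.5).
function gridPoint(int $k): float
{
    return GRID_MIN + $k * GRID_STEP;
}

// Hypothetical helper: each boundary combination is just a different
// index range; every grid point in that range is equally likely.
function randomInGrid(bool $includeMin, bool $includeMax): float
{
    $lo = $includeMin ? 0 : 1;
    $hi = $includeMax ? GRID_POINTS : GRID_POINTS - 1;
    return gridPoint(random_int($lo, $hi));
}

var_dump(gridPoint(GRID_POINTS - 1) < 4.5); // bool(true)
var_dump(randomInGrid(true, false));        // some float in [3.5, 4.5)
```

With the two booleans, the four boundary combinations reduce to choosing whether index 0 and index GRID_POINTS are allowed, which is why none of them needs special rounding logic in this toy case.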
Best regards
Tim Düsterhus
[1]
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/random
--
PHP Internals - PHP Runtime Development Mailing List