2022年2月15日(火) 1:46 Tim Düsterhus <t...@bastelstu.be>: > Hi > > On 2/14/22 16:44, Tim Düsterhus wrote: > > Unfortunately your PR doesn't compile for me, so I can't test: > > > > make: *** No rule to make target 'php-src/ext/standard/lcg.c', needed by > > 'ext/standard/lcg.lo'. Stop. > > I've managed to compile it by cleaning the whole directory and rerunning > of the build steps. Not sure what I missed the first time. > > I've now been able to play around with it and have some additional > discussion points: > > 1) Consider the following script: > > > <?php > > use Random\NumberGenerator\XorShift128Plus; > use Random\Randomizer; > > $g1 = new XorShift128Plus(2); > $g2 = clone $g1; > > $r1 = new Randomizer($g1); > $r2 = new Randomizer($g2); > > var_dump(\bin2hex($r1->getBytes(8))); > var_dump(\bin2hex($r2->getBytes(4)) . \bin2hex($r2->getBytes(4))); > > As a user: Would you expect those two 'var_dump' calls to result in the > same output? > > Personally I would. For me that implies: > > 1. generate() should return raw bytes instead of a number (as I > suggested before). > 2. The 'Randomizer' object should buffer unused bytes internally and > only call generate() if the internal buffer is drained. > > 2) Why xorshift instead of xoshiro / xoroshiro? > > https://vigna.di.unimi.it/xorshift/ says that: > > > Information about my previous xorshift-based generators can be found > here, but they have been entirely superseded by the new ones, which are > faster and better. > > That would imply to me that xorshift should not be used in new > developments. > > 3) Consider the following script: > > <?php > > use Random\NumberGenerator\XorShift128Plus; > > $g1 = new XorShift128Plus(2); > > var_dump($g1); > > exit; > > Should the user be able to see the internal state of the Generator in > the var_dump() output? > > 4) Both xorshift as well as xoshiro / xoroshiro's reference > implementations include a 'jump()' function that allows one to easily > retrieve generators with distinct sequences, without needing to generate > seeds manually which might or might nor introduce a bias. > > Is this something that we should provide as well? > > 5) As a follow-up to (4): Should the 'generate()' method be called > 'next()' or 'step()' instead? Perhaps it should even be '__invoke()'? > > Best regards > Tim Düsterhus >
If there are no objections, I will change the NumberGenerator that Randomizer uses by default to Secure. > Regarding "unintuitive": I disagree. I find it unintuitive that there are some RNG sequences that I can't access when providing a seed. This is also the case for RNG implementations in many other languages. For example, Java also uses long (64-bit) as the seed value of the argument for Math. https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/Random.html#%3Cinit%3E(long) As mentioned above, V8 also uses a 64-bit value as the seed value, and generating with XorShift128+. https://github.com/v8/v8/blob/main/src/base/utils/random-number-generator.h On the other hand, some languages have access to the complete internal state. Python, for example, accepts bytes or bytearrays. https://docs.python.org/3/library/random.html#random.seed However, making strings available in PHP may lead to incorrect usage. I think we can safely do this by making the seed argument accept both int and string, and only using it as the internal state if string is specified and it's 128-bits long. > I've managed to compile it by cleaning the whole directory and rerunning of the build steps. Not sure what I missed the first time. This is probably due to a major change in config.m4. ./buidconf and ./configure need to be rerun properly. > 1. Would you expect those two 'var_dump' calls to result in the same output? Added __debugInfo() magic method supports. https://github.com/php/php-src/pull/8094/commits/78efd2bd1e0ac5db48c272b364a615a5611e8caa > generate() should return raw bytes instead of a number (as I suggested before). I don't think this is a very good idea. The RNG is a random number generator and should probably not be generating strings. Of course, I am aware that strings represent binary sequences in PHP. However, this is not user-friendly. The generation of a binary string is a barrier when trying to implement some kind of operation using numeric computation. If you want to deal with the problem of generated size, it would be more appropriate to define a method such as getGenerateSize() in the interface. Even in this case, generation widths greater than PHP_INT_SIZE cannot be supported, but generation widths greater than 64-bit are not very useful in the first place. > The 'Randomizer' object should buffer unused bytes internally and only call generate() if the internal buffer is drained. Likewise, I think this is not a good idea. Buffering reintroduces the problem of complex state management, which has been made so easy. The user will always have to worry about the buffering size of the Randomizer. > Why xorshift instead of xoshiro / xoroshiro? The XorShift128Plus algorithm is still in use in major browsers and is dead in a good way. Also, in our local testing, SplitMix64 + XorShift128Plus performed well in terms of performance and random number quality, so I don't think it is necessary to choose a different algorithm. If this RFC passes, it will be easier to add algorithms in the future. If a new algorithm is needed, it can be implemented immediately. Regards, Go Kudo