On Thu, Feb 18, 2016 at 11:45 PM, Zeev Suraski <z...@zend.com> wrote:

> > With rand functions, I don't think we need to touch them. For some
> > applications, low-key randomness is just fine - if you need to shuffle
> > an array of 20 elements or randomize a unit test to ensure you're not
> > testing the same value all the time, low-quality randomness is completely
> > fine. For other applications, there are superior solutions and everybody
> > who needs them already uses them, but again I see no value in removing
> > those functions. It would only cause more breakage and make adoption of
> > new versions (already horrible) even slower.
>
> I think the obvious option here is to make rand() and srand() aliases to
> mt_rand() and mt_srand(), unless I'm missing something very basic
> here..? I see zero reason to deprecate them and break so much code when
> we can simply 'upgrade' them at zero cost to both us and users.
>

The usual argument against aliasing rand() to mt_rand() is that it will
change the sequence that is generated for a specific srand() seed, thus
breaking code that relies on specific sequences. However, as removing the
functions in the future would break the code anyway, I think I agree with
you that just aliasing them is a better option.
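To make the seed-sequence concern concrete, here is a minimal Python sketch (an illustration, not PHP's code). It assumes a libc rand() that follows the sample LCG printed in the C standard, which real libcs are free to replace, and uses Python's MT19937-based `random.Random` as a stand-in for mt_rand(); neither reproduces PHP's exact output, and the function names are hypothetical. Within one algorithm the same seed always gives the same sequence, but swapping the algorithm behind the name changes the sequence, which is what breaks code that hard-codes the values produced after a given srand() call.

```python
import random


def lcg_rand_sequence(seed, count):
    """Outputs of the sample rand() from the C standard (assumed libc behaviour)."""
    state = seed
    values = []
    for _ in range(count):
        # next = next * 1103515245 + 12345, kept to 32 bits
        state = (state * 1103515245 + 12345) % 2**32
        values.append((state // 65536) % 32768)
    return values


def mt_rand_sequence(seed, count):
    """Outputs of an MT19937 generator (Python's random.Random) in [0, 32767]."""
    rng = random.Random(seed)
    return [rng.randrange(32768) for _ in range(count)]


if __name__ == "__main__":
    # Reproducible within one algorithm...
    assert lcg_rand_sequence(42, 5) == lcg_rand_sequence(42, 5)
    assert mt_rand_sequence(42, 5) == mt_rand_sequence(42, 5)
    # ...but the same seed under a different algorithm gives a different sequence.
    print(lcg_rand_sequence(42, 5))
    print(mt_rand_sequence(42, 5))
```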

We may need to discuss our non-cryptographic PRNG functionality anyway;
there are quite a number of issues:

 * rand(), the first function anyone will try, uses a potentially horrible
libc RNG
 * It was recently noticed that the mt_rand() implementation contains a
typo and our output differs from the original well-researched algorithm. As
yet it is unclear what that typo does to the quality of the output.
 * mt_getrandmax() is 2^31-1 even on 64-bit machines, and numbers are scaled
using floating-point multiplication. That means that if you ask mt_rand() to
generate 64-bit random numbers by specifying the range, only a tiny
fraction of numbers can actually be hit. I also strongly suspect that the
floating-point scaling is inherently non-uniform even for smaller ranges.
 * Functions like array_rand() or shuffle() use rand() and not mt_rand(),
so if you're on Windows (where RAND_MAX is only 32767) and your array has
more than some 30k elements, the output will likely be severely biased.
 * The array_rand() implementation is O(N) even if you only choose a single
key (likely by far the most common case). If you use array_rand() on a
1M-element array, we'll generate 0.5M random numbers on average.
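The scaling problems in the list above can be checked exhaustively with a small Python sketch. It assumes scaling of the form `lo + (int)((hi - lo + 1.0) * (n / (max + 1.0)))`; `scale_to_range` and `hits_per_value` are hypothetical helpers, not PHP functions. The 64-bit case uses mt_getrandmax() = 2^31-1, and the Windows case uses a generator maximum of 32767 (RAND_MAX in the Microsoft C runtime).

```python
from collections import Counter


def scale_to_range(n, lo, hi, gen_max):
    """Map a raw output n in [0, gen_max] onto [lo, hi] by floating-point
    multiplication (hypothetical helper modelled on the scaling described above)."""
    return lo + int((hi - lo + 1.0) * (n / (gen_max + 1.0)))


def hits_per_value(range_size, gen_max):
    """Count how many raw outputs land on each value of [0, range_size - 1]."""
    counts = Counter(scale_to_range(n, 0, range_size - 1, gen_max)
                     for n in range(gen_max + 1))
    return [counts.get(v, 0) for v in range(range_size)]


if __name__ == "__main__":
    MT_MAX = 2**31 - 1
    # 64-bit-sized range: every result is a multiple of 2**31.
    print(scale_to_range(1, 0, 2**62 - 1, MT_MAX))  # 2147483648

    WIN_RAND_MAX = 32767
    small = hits_per_value(20000, WIN_RAND_MAX)
    print(min(small), max(small))  # 1 2: some values are twice as likely
    large = hits_per_value(40000, WIN_RAND_MAX)
    print(large.count(0))  # 7232 values can never be produced
```

With fewer raw outputs than target values, gaps are unavoidable; with slightly more, the pigeonhole principle forces some values to be hit once and others twice, so the bias is structural rather than a matter of luck.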
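For the array_rand() point, here is a Python sketch comparing a linear selection-sampling pass (one draw per visited element, accepting the current key with probability 1/remaining) against picking a random position directly. It assumes the O(N) cost comes from a loop of this kind; the function names are hypothetical, and the direct pick assumes a dense key list, ignoring the holes a real hash table can contain after deletions.

```python
import random


def pick_key_linear(keys, rng):
    """Selection sampling for a single key: walk the keys, accepting the current
    one with probability 1 / remaining. Returns (key, random_draws_used)."""
    remaining = len(keys)
    for draws, key in enumerate(keys, start=1):
        # On the last key remaining == 1, so this always accepts.
        if rng.random() * remaining < 1.0:
            return key, draws
        remaining -= 1
    raise ValueError("cannot pick from an empty array")


def pick_key_direct(keys, rng):
    """Pick a uniformly random position: one draw regardless of len(keys)."""
    if not keys:
        raise ValueError("cannot pick from an empty array")
    return keys[rng.randrange(len(keys))], 1


if __name__ == "__main__":
    rng = random.Random(2016)
    keys = list(range(1000))
    trials = 2000
    avg = sum(pick_key_linear(keys, rng)[1] for _ in range(trials)) / trials
    print(f"linear scan: {avg:.1f} draws on average (about (N + 1) / 2)")
    print(f"direct pick: {pick_key_direct(keys, rng)[1]} draw")
```

Both functions return each key with probability 1/N, but the linear pass stops at a uniformly random position, so it needs (N + 1) / 2 draws on average, which matches the 0.5M figure for a 1M-element array.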

Even though changing our PRNG implementations will break seed sequences, I
think the time has come to clean up this mess for 7.1. (We might also want
to consider aliasing rand and mt_rand to an entirely new algorithm rather
than MT19937. Nowadays there are PRNGs that have both better statistical
properties and better performance than MT.)
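The email does not name a replacement algorithm. As one illustration of a newer design, here is a Python sketch of PCG32 (a permuted congruential generator), written to follow the pcg-basic reference code as I understand it. It keeps 128 bits of state (a 64-bit state plus a 64-bit odd increment) instead of MT19937's 624 32-bit words, and the `bounded` method uses rejection sampling instead of floating-point scaling. The class and method names are assumptions for illustration, and the sketch makes no claim about speed in Python.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF
PCG_MULT = 6364136223846793005


class Pcg32:
    """Sketch of PCG32: a 64-bit LCG with an XSH-RR output permutation."""

    def __init__(self, seed, stream=54):
        self.state = 0
        self.inc = ((stream << 1) | 1) & MASK64  # the increment must be odd
        self.next_u32()
        self.state = (self.state + seed) & MASK64
        self.next_u32()

    def next_u32(self):
        old = self.state
        # Advance the underlying 64-bit LCG.
        self.state = (old * PCG_MULT + self.inc) & MASK64
        # Output permutation: xorshift the high bits, then rotate by the top 5 bits.
        xorshifted = (((old >> 18) ^ old) >> 27) & MASK32
        rot = old >> 59
        return ((xorshifted >> rot) | (xorshifted << ((-rot) & 31))) & MASK32

    def bounded(self, bound):
        """Uniform integer in [0, bound) via rejection, with no float scaling."""
        if not 0 < bound <= 2**32:
            raise ValueError("bound must be in 1..2**32")
        threshold = (2**32 - bound) % bound  # equals 2**32 % bound
        while True:
            r = self.next_u32()
            if r >= threshold:
                return r % bound
```

For example, `Pcg32(42).bounded(6)` draws a die roll. Because outputs below `2**32 % bound` are rejected, the accepted 32-bit values split evenly into `bound` buckets, which avoids the non-uniformity of the floating-point scaling discussed above.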

On a different note, I don't think that philosophical discussions on the
topic of how much we ought to be deprecating will be very productive --
this is one of those topics people tend to be very stubborn about ;) Some
people value stability above everything else, and for others the number one
evil in PHP is our reluctance to get rid of old ---crap--- cruft. It would
be nice if we could let voting decide that question, and keep this thread
focused on specific issues and suggestions, i.e. on one hand suggestions for
things that we may want to deprecate, together with reasoning for why we
should do it. And on the other hand alternatives to deprecation (your
suggestion for rand), reasons why something shouldn't be deprecated (e.g.
functionality not otherwise available, see hebrev; or migration would be
problematic because XYZ; or project ABC uses this heavily because ...)

Thanks,
Nikita
