Hi, On Fri Dec 20, 2024 at 9:44 PM CET, Roberto E. Vargas Caballero wrote: > Did you read my other mail about considering if we actually need these > functions? > I begin to consider that standard random() is good enough for our use case, > and > we don't need additional functions. What do you think?
I did read your other mail. Maybe the PCG (rng32()) itself isn't essential, but I think rng32_seed() and rng32_bounded() are. It is all too often to seed rand()/random() badly (eg. using time()). Seeding it correctly is often complicated. Hence having a separate procedure for correctly initializing. And none of these APIs provide _bounded(m) versions (to generate in the range [0, m)), which, too, is non-trivial to implement properly (ie, without bias). I don't feel so strongly about having a custom PRNG, but here are my thoughts: Unacceptable RNGs: - rand() and *rand48() are LCGs, and hence unnacceptable for generating random numbers in a range (low-entropy on the lower bits, which makes for very much non-uniform distributions on a range, which makes permutations usually poor quality). - arc4random is (generally) not available. - /dev/random and /dev/urandom can be unavailable (eg, in a chroot), and system calls used to read aren't standardized (Linux and OpenBSD provide their own, but to my knowledge most Unices do not). The remaining candidates are thus a custom PRNG (eg PCG here), or random(). Arguments for custom PRNG/against random(): - Has better statistical properties, which make it better for generating numbers in a range (and better than random()). - We do not rely on implementation details for correct ("sufficiently random") behavior. Arguments against custom PRNG: - The state of the PCG is 64 bits, which means that we can produce at most 2^63 permutations of any given list. Say we want to shuffle a deck of 52 cards. 52! ~= 10^226, so this RNG would only generate a very tiny fraction of the permutations. random() has 256 bits of state, so it should be able to generate all permutations of lists of length up to 57. Which is arguably too small to make a real difference (I don't think "all permutations _can_ be generated for lists up to 57 elements long, as opposed to 21 for PCG" is a convincing argument). We could go crazy and implement Mersenne Twister and its 19968 bits of state, but again, I do not believe this to be a very good argument. - It's more code (9 SLOC to be precise). In the end, I believe consistent behavior is somewhat referrable over avoiding 9 SLOC in libutil/random. I have a slight preference for a custom PRNG. I'm open to using random() too. Cheers, Elie Le Vaillant