Re: random() (was Re: New GUC to sample log queries)

Fabien COELHO Fri, 28 Dec 2018 00:19:37 -0800


Hello Tom,

Another idea, which would be a lot less prone to breakage by
add-on code, is to change drandom() and setseed() to themselves
use pg_erand48() with a private seed.

The pg_erand48 code looks like crumbs from the 70's optimized for 16-bit
architectures (which it probably is not, but not going to 64 or 128 bits
directly looks like a missed opportunity). Its internal state is 48 bits,
as its name implies, and its period is probably around 2**48, which is
2**12 better than the previous case: not an extraordinary achievement.
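
For reference, a minimal sketch of the 48-bit linear congruential step that the drand48 family is specified to use, written with 64-bit arithmetic instead of three 16-bit words. The multiplier and increment are the POSIX default constants; the function names lcg48_next and lcg48_lrand are hypothetical and are not the names used in the PostgreSQL sources.

```c
#include <stdint.h>

/* POSIX drand48-family recurrence: X' = (a * X + c) mod 2^48 */
#define LCG48_A    UINT64_C(0x5DEECE66D)
#define LCG48_C    UINT64_C(0xB)
#define LCG48_MASK ((UINT64_C(1) << 48) - 1)

/* Advance a 48-bit state kept in the low bits of a single uint64_t. */
static uint64_t
lcg48_next(uint64_t state)
{
	return (LCG48_A * state + LCG48_C) & LCG48_MASK;
}

/* lrand48-style output: the top 31 bits of the updated state. */
static uint32_t
lcg48_lrand(uint64_t *state)
{
	*state = lcg48_next(*state);
	return (uint32_t) (*state >> 17);
}
```

The whole state is 48 bits, so the sequence cannot be longer than 2**48 steps whatever the seed is.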


I can't get terribly excited about rewriting that code.  You're arguing
from a "one size should fit all" perspective,

Not exactly. I'm not claiming that distinguishing parts that require good random from others is a bad choice. I'm arguing that:

(1) from a software engineering perspective, the PRNG implementation should hide the underlying generator name and state size.

(2) from a numerical perspective, poor seeding practices should be avoided when possible.

(3) from a cryptographic perspective, an LCG is a poor choice of fast PRNG, which a quick look at wikipedia (mother of all knowledge) confirms.

(4) the fact that pg's historical PRNG choices are well documented is not a good justification for not improving them.

Better alternatives exist that do not cost much (e.g. xorshift variants, WELL...), some of which are optimized for 64-bit architectures.

which is exactly not the design approach we're using.

We've already converted security-sensitive PRNG uses to use pg_strong_random (or if we haven't, that's a localized bug in any such places we missed). What remains are places where we are not so concerned about cryptographic strength but rather speed.

I do agree with the speed concern. I'm arguing that better quality at speed can be achieved with better seeding practices and by not using an LCG.
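
As an illustration of what a better seeding practice could look like, here is a minimal sketch that expands one 64-bit seed into a 128-bit generator state with splitmix64, a common choice for initializing xorshift-style generators. The helper name prng_seed128 is hypothetical, and the assumption is that the 64-bit seed itself comes from an unpredictable source (pg_strong_random, for instance); nothing here is taken from the PostgreSQL sources.

```c
#include <stdint.h>

/* One splitmix64 step: advances *x and returns a well-mixed 64-bit value. */
static uint64_t
splitmix64(uint64_t *x)
{
	uint64_t z = (*x += UINT64_C(0x9E3779B97F4A7C15));

	z = (z ^ (z >> 30)) * UINT64_C(0xBF58476D1CE4E5B9);
	z = (z ^ (z >> 27)) * UINT64_C(0x94D049BB133111EB);
	return z ^ (z >> 31);
}

/* Hypothetical helper: fill a 128-bit state from a single 64-bit seed. */
static void
prng_seed128(uint64_t state[2], uint64_t seed)
{
	state[0] = splitmix64(&seed);
	state[1] = splitmix64(&seed);
}
```

Because each output is a bijective mix of a counter that changes at every call, the two state words cannot both be zero, which is the one state an xorshift-style generator must avoid.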


About costs, not counting array accesses:

 - lrand48 (48 bits state as 3 uint16)        is 29 ops
   (10 =, 8 *, 7 +, 4 >>)
 - xorshift+ (128 bits state as 2 uint64)     is 13 ops
   ( 5 =, 0 *, 1 +, 3 >>, 4 ^)
 - xoroshiro128+ (idem)                       is 17 ops
   ( 6 =, 0 *, 1 +, 5 >>, 3 ^, 2 |, less if rot in hardware)
 - WELL512 (512 bits state as 16 uint32)      is 38 ops
   (11 =, 0 *, 3 +, 7 >>, 10 ^, 4 &)
   probably much better, but probably slower than the current version
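
To make the count for the rotate-based variant concrete, here is a minimal sketch of one xoroshiro128+ step in C, using the rotation and shift constants (24, 16, 37) of the 2018 revision of the public reference code. The function names are hypothetical, and the conversion to a double in [0, 1) is only an assumption about how a drandom-like caller might consume the output.

```c
#include <stdint.h>

/* Rotate left; k must be in 1..63. Usually one instruction in hardware. */
static inline uint64_t
rotl64(uint64_t x, int k)
{
	return (x << k) | (x >> (64 - k));
}

/* One xoroshiro128+ step on a 128-bit state that must not be all zero. */
static uint64_t
xoroshiro128plus_next(uint64_t s[2])
{
	const uint64_t s0 = s[0];
	uint64_t s1 = s[1];
	const uint64_t result = s0 + s1;	/* the single addition */

	s1 ^= s0;
	s[0] = rotl64(s0, 24) ^ s1 ^ (s1 << 16);
	s[1] = rotl64(s1, 37);
	return result;
}

/* Map the top 53 bits to a double in [0, 1). */
static double
u64_to_unit_double(uint64_t x)
{
	return (double) (x >> 11) * 0x1.0p-53;
}
```

There is no multiplication in the step; the two rotations account for four of the five shifts and both ORs when no rotate instruction is available.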

I'd be of the (debatable) opinion that we could use xoroshiro128+, already used by various languages, even if it fails some specialized tests.

It does behoove us to ensure that the seed values are unpredictable and that a user-controllable seed isn't used for internal operations,


Sure.

but I don't feel a need for replacing the algorithm.

Hmmm. Does it mean that you would veto any change, even if the speed concern is addressed (i.e. faster/not slower with better quality)?

You might argue that the SQL function drandom should be held to a higher
standard, but we document exactly this same tradeoff for it.  Users who
want cryptographic strength are directed to pgcrypto, leaving the audience
for drandom being people who want speed.


My point is that speed is not necessarily incompatible with better quality.
Better quality should not replace strong random when needed.

--
Fabien.
