Martin Kalcher <martin.kalc...@aboutsource.net> writes: > New patch: array_shuffle() and array_sample() use pg_global_prng_state now.
I took a closer look at the patch today. I find this behavior a bit surprising: +SELECT array_dims(array_sample('[-1:2][2:3]={{1,2},{3,NULL},{5,6},{7,8}}'::int[], 3)); + array_dims +------------- + [-1:1][2:3] +(1 row) I can buy preserving the lower bound in array_shuffle(), but array_sample() is not preserving the first-dimension indexes of the array, so ISTM it ought to reset the first lower bound to 1. Some other comments: + Returns <parameter>n</parameter> randomly chosen elements from <parameter>array</parameter> in selection order. What's "selection order"? And this probably shouldn't just rely on the example to describe what happens with multi-D arrays. Writing "elements" seems somewhere between confusing and wrong. * Personally I think I'd pass the TypeCacheEntry pointer to array_shuffle_n, and let it pull out what it needs. Less duplicate code that way. * I find array_shuffle_n drastically undercommented, and what comments it has are pretty misleading, eg + /* Swap all elements in item (i) with elements in item (j). */ j is *not* the index of the second item to be swapped. You could make it so, and that might be more readable: j = (int) pg_prng_uint64_range(&pg_global_prng_state, i, nitem - 1); jelms = elms + j * nelm; jnuls = nuls + j * nelm; But if you want the code to stay as it is, this comment needs work. * I think good style for SQL-callable C functions is to make the arguments clear at the top: +array_sample(PG_FUNCTION_ARGS) +{ + ArrayType *array = PG_GETARG_ARRAYTYPE_P(0); + int n = PG_GETARG_INT32(1); + ArrayType *result; + ... other declarations as needed ... We can't quite make normal C declaration style work, but that's a poor excuse for not making the argument list visible as directly as possible. * I wouldn't bother with the PG_FREE_IF_COPY calls. Those are generally only used in btree comparison functions, in which there's a need to not leak memory. * I wonder if we really need these to be proparallel = 'r'. If we let a worker process execute them, they will take numbers from the worker's pg_global_prng_state seeding not the parent process's seeding, but why is that a problem? In the prior version where you were tying them to the parent's drandom() sequence, proparallel = 'r' made sense; but now I think it's unnecessary. regards, tom lane