On Wed, Nov 6, 2024 at 12:20 AM Dean Rasheed <dean.a.rash...@gmail.com> wrote: > > On Tue, 5 Nov 2024 at 15:23, Aleksander Alekseev > <aleksan...@timescale.com> wrote: > > > > > > =# SELECT array_random(1, 10, random(0, 3)) FROM generate_series( ... ) > > > > {5} > > > > {1, 3, 8} > > > > {7, 6} > > > > ... > > > > > > Yeah, that looks like a neater API. > > > > > > Something that bothers me somewhat is that it's completely trivial for > > > the user to write such a function for themselves, so is it really > > > useful enough to include in core? > > > > I think it would be useful. Many users don't bother writing C > > extensions for tasks like this. So at least our implementation is > > going to be faster. > > >
it's fine to add such a function, IMHO. we already have array_fill, array_fill fill the same value repeatedly, here we can fill an array with random value. So this is better than array_fill. The most trivial query I can think of is select array_agg((array[random(1, 10), random(1,10), random(1, 10)])); but this can only be up to two dimensional. Users wishing to write such functions with several dimensions can be tricky. > If we are going to add such a function to core, then I think we should > make it consistent and at least as flexible as the other array > functions, and support multi-dimensional arrays with optional > non-default lower-bounds, like array_fill(). I.e., something like: > > random_array(min int, max int, dims int[] [, lbounds int[]]) -> int[] > > Returns an array filled with random values in the range min <= x <= max, > having dimensions of the lengths specified by dims. The optional lbounds > argument supplies lower-bound values for each dimension (which default > to all 1). > please try the attached. instead of > random_array(min int, max int, dims int[] [, lbounds int[]]) -> int[] i choose array_random(min int, max int, dims int[] [, lbounds int[]]) -> int[] I placed it in src/backend/utils/adt/arrayfuncs.c, naming it as array_random to align with the surrounding function name convention. it can be changed. Another issue is we can not use function signature as array_random(anyelement, anyelement, int[] [, int[]]) because currently, we cannot resolve type conflict for cases like array_random(1, 2::bigint). In this case, the first argument should be promoted to bigint. For example: create or replace function polyf(x anyelement, y anyelement) returns anyelement as $$ select x + 1 $$ language sql; select polyf(1, 2::bigint); ERROR: function polyf(integer, bigint) does not exist select polyf(1::bigint, 2); ERROR: function polyf(bigint, integer) does not exist so i invented 3 functions, similar to what we did with random(). now it looks like: \df array_random List of functions Schema | Name | Result data type | Argument data types | Type ------------+--------------+------------------+-------------------------------------------------------------------------------------+------ pg_catalog | array_random | bigint[] | min bigint, max bigint, dims integer[], lbounds integer[] DEFAULT NULL::integer[] | func pg_catalog | array_random | integer[] | min integer, max integer, dims integer[], lbounds integer[] DEFAULT NULL::integer[] | func pg_catalog | array_random | numeric[] | min numeric, max numeric, dims integer[], lbounds integer[] DEFAULT NULL::integer[] | func doc is not there yet.
From 5c1e6a3e5f4764935c60d05e116692fdea422b5a Mon Sep 17 00:00:00 2001 From: jian he <jian.universal...@gmail.com> Date: Mon, 23 Jun 2025 13:38:33 +0800 Subject: [PATCH v1 1/1] array_random we can not use function signature as array_random(anyelement, anyelement, int[] [, int[]]) because currently, we cannot resolve the conflict for array_random(1, 2::bigint). In this case, the first argument should be promoted to bigint. For example: create or replace function polyf(x anyelement, y anyelement) returns anyelement as $$ select x + 1 $$ language sql; select polyf(1, 2::bigint); ERROR: function polyf(integer, bigint) does not exist select polyf(1::bigint, 2); ERROR: function polyf(bigint, integer) does not exist So, we define three separate functions for array_random, similar to the approach used for the random() function. now it looks like: \df array_random List of functions Schema | Name | Result data type | Argument data types | Type ------------+--------------+------------------+-------------------------------------------------------------------------------------+------ pg_catalog | array_random | bigint[] | min bigint, max bigint, dims integer[], lbounds integer[] DEFAULT NULL::integer[] | func pg_catalog | array_random | integer[] | min integer, max integer, dims integer[], lbounds integer[] DEFAULT NULL::integer[] | func pg_catalog | array_random | numeric[] | min numeric, max numeric, dims integer[], lbounds integer[] DEFAULT NULL::integer[] | func (3 rows) doc is missing. --- src/backend/catalog/system_functions.sql | 21 ++ src/backend/utils/adt/arrayfuncs.c | 255 +++++++++++++++++++++++ src/include/catalog/pg_proc.dat | 12 ++ src/test/regress/expected/arrays.out | 85 ++++++++ src/test/regress/sql/arrays.sql | 26 +++ 5 files changed, 399 insertions(+) diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql index 566f308e443..b3819a0cbdd 100644 --- a/src/backend/catalog/system_functions.sql +++ b/src/backend/catalog/system_functions.sql @@ -73,6 +73,27 @@ CREATE OR REPLACE FUNCTION VOLATILE PARALLEL RESTRICTED STRICT COST 1 AS 'drandom_normal'; +CREATE OR REPLACE FUNCTION + array_random(min integer, max integer, dims integer[], lbounds integer[] DEFAULT NULL) + RETURNS integer[] + LANGUAGE internal + VOLATILE PARALLEL RESTRICTED COST 1 +AS 'int4array_random'; + +CREATE OR REPLACE FUNCTION + array_random (min bigint, max bigint, dims integer[], lbounds integer[] DEFAULT NULL) + RETURNS bigint[] + LANGUAGE internal + VOLATILE PARALLEL RESTRICTED COST 1 +AS 'int8array_random'; + +CREATE OR REPLACE FUNCTION + array_random (min numeric, max numeric, dims integer[], lbounds integer[] DEFAULT NULL) + RETURNS numeric[] + LANGUAGE internal + VOLATILE PARALLEL RESTRICTED COST 1 +AS 'numeric_array_random'; + CREATE OR REPLACE FUNCTION log(numeric) RETURNS numeric LANGUAGE sql diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c index c8f53c6fbe7..1924c96388f 100644 --- a/src/backend/utils/adt/arrayfuncs.c +++ b/src/backend/utils/adt/arrayfuncs.c @@ -166,6 +166,7 @@ static int width_bucket_array_variable(Datum operand, Oid collation, TypeCacheEntry *typentry); +static Datum array_random_internal(FunctionCallInfo fcinfo, Oid elmtype); /* * array_in : @@ -6064,6 +6065,260 @@ array_fill(PG_FUNCTION_ARGS) PG_RETURN_ARRAYTYPE_P(result); } +Datum +int4array_random(PG_FUNCTION_ARGS) +{ + return array_random_internal(fcinfo, INT4OID); +} + +Datum +int8array_random(PG_FUNCTION_ARGS) +{ + return array_random_internal(fcinfo, INT8OID); +} + +Datum +numeric_array_random(PG_FUNCTION_ARGS) +{ + return array_random_internal(fcinfo, NUMERICOID); +} + +/* + * array_random_internal + * Create and populate array random value with default lower bounds. + * + * main internal function for + * array_random(min int4, max int4, dims int[] [, lbounds int[]]) -> int[] + * array_random(min int8, max int8, dims int[] [, lbounds int[]]) -> int8[] + * array_random(min numeric, max numeric, dims int[] [, lbounds int[]]) -> numeric[] + */ +static Datum +array_random_internal(FunctionCallInfo fcinfo, Oid elmtype) +{ + ArrayType *result; + ArrayType *dims; + ArrayType *lbs; + int *dimv; + int *lbsv; + int ndims; + int nitems; + int deflbs[MAXDIM]; + int16 elmlen; + bool elmbyval; + char elmalign; + Datum *values; + bool *nulls; + Oid random_fn_oid = InvalidOid; + + values = (Datum *) palloc(2 * sizeof(Datum)); + nulls = (bool *) palloc(2 * sizeof(bool)); + if (!PG_ARGISNULL(0)) + { + values[0] = PG_GETARG_DATUM(0); + nulls[0] = false; + } + else + { + values[0] = 0; + nulls[0] = true; + } + + if (!PG_ARGISNULL(1)) + { + values[1] = PG_GETARG_DATUM(1); + nulls[1] = false; + } + else + { + values[1] = 0; + nulls[1] = true; + } + + if (PG_ARGISNULL(2)) + ereport(ERROR, + errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), + errmsg("dimension array cannot be null")); + + dims = PG_GETARG_ARRAYTYPE_P(2); + if (ARR_NDIM(dims) > 1) + ereport(ERROR, + errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR), + errmsg("wrong number of array subscripts"), + errdetail("Dimension array must be one dimensional.")); + + if (array_contains_nulls(dims)) + ereport(ERROR, + errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), + errmsg("dimension values cannot be null")); + + dimv = (int *) ARR_DATA_PTR(dims); + ndims = (ARR_NDIM(dims) > 0) ? ARR_DIMS(dims)[0] : 0; + + if (ndims < 0) /* we do allow zero-dimension arrays */ + ereport(ERROR, + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("invalid number of dimensions: %d", ndims)); + if (ndims > MAXDIM) + ereport(ERROR, + errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), + errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)", + ndims, MAXDIM)); + + if (!PG_ARGISNULL(3)) + { + lbs = PG_GETARG_ARRAYTYPE_P(3); + + if (ARR_NDIM(lbs) > 1) + ereport(ERROR, + errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR), + errmsg("wrong number of array subscripts"), + errdetail("Low bound array must be one dimensional.")); + + if (array_contains_nulls(lbs)) + ereport(ERROR, + errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), + errmsg("Low bound array can not contain null values")); + + if (ndims != ((ARR_NDIM(lbs) > 0) ? ARR_DIMS(lbs)[0] : 0)) + ereport(ERROR, + errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR), + errmsg("wrong number of array subscripts"), + errdetail("Low bound array has different size than dimensions array.")); + + lbsv = (int *) ARR_DATA_PTR(lbs); + } + else + { + int i; + + for (i = 0; i < MAXDIM; i++) + deflbs[i] = 1; + + lbsv = deflbs; + } + + /* random function for generating each array element */ + switch (elmtype) + { + case INT4OID: + random_fn_oid = F_RANDOM_INT4_INT4; + break; + case INT8OID: + random_fn_oid = F_RANDOM_INT8_INT8; + break; + case NUMERICOID: + random_fn_oid = F_RANDOM_NUMERIC_NUMERIC; + break; + default: + /* unlikely to happen */ + elog(ERROR, "unsupported type %u for array_random function", elmtype); + break; + } + + if (get_fn_expr_argtype(fcinfo->flinfo, 0) != elmtype) + elog(ERROR, "expected input data type as %u", elmtype); + + /* This checks for overflow of the array dimensions */ + nitems = ArrayGetNItems(ndims, dimv); + ArrayCheckBounds(ndims, dimv, lbsv); + + /* fast track for empty array */ + if (nitems <= 0) + { + result = construct_empty_array(elmtype); + PG_RETURN_ARRAYTYPE_P(result); + } + + get_typlenbyvalalign(elmtype, &elmlen, &elmbyval, &elmalign); + + /* + * For each array element call random(minval, maxval). minval is a type of + * elmtype. We also check array element size. + */ + if (!nulls[0] && !nulls[1]) + { + int i; + int nbytes = 0; + int totbytes = 0; + Datum *out_datums; + bool *out_nulls; + FmgrInfo *random_val_flinfo; + FunctionCallInfo random_val_fcinfo; + + random_val_flinfo = (FmgrInfo *) palloc0(sizeof(FmgrInfo)); + fmgr_info(random_fn_oid, random_val_flinfo); + + random_val_fcinfo = (FunctionCallInfo) palloc0(SizeForFunctionCallInfo(2)); + InitFunctionCallInfoData(*random_val_fcinfo, random_val_flinfo, 2, + InvalidOid, NULL, NULL); + + random_val_fcinfo->args[0].value = values[0]; + random_val_fcinfo->args[0].isnull = false; + random_val_fcinfo->args[1].value = values[1]; + random_val_fcinfo->args[1].isnull = false; + + out_datums = palloc(sizeof(Datum) * nitems); + out_nulls = palloc0(sizeof(bool) * nitems); + + for (i = 0; i < nitems; i++) + { + out_datums[i] = FunctionCallInvoke(random_val_fcinfo); + out_nulls[i] = false; + + /* make sure data is not toasted */ + if (elmlen == -1) + out_datums[i] = PointerGetDatum(PG_DETOAST_DATUM(out_datums[i])); + + /*is this correct? */ + if (i == 0) + { + nbytes = att_addlength_datum(0, elmlen, out_datums[i]); + nbytes = att_align_nominal(nbytes, elmalign); + Assert(nbytes > 0); + + totbytes = nbytes * nitems; + + /* check for overflow of multiplication or total request */ + if (totbytes / nbytes != nitems || + !AllocSizeIsValid(totbytes)) + ereport(ERROR, + errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), + errmsg("array size exceeds the maximum allowed (%d)", + (int) MaxAllocSize)); + } + } + + result = construct_md_array(out_datums, + out_nulls, + ndims, + dimv, + lbsv, + elmtype, + elmlen, + elmbyval, + elmalign); + } + else + { + /* + * if array_random specified lower bound or upper bound is null, then + * return null we can use create_array_envelope do the job + */ + int nbytes; + int dataoffset; + + dataoffset = ARR_OVERHEAD_WITHNULLS(ndims, nitems); + nbytes = dataoffset; + + result = create_array_envelope(ndims, dimv, lbsv, nbytes, + elmtype, dataoffset); + + /* create_array_envelope already zeroed the bitmap, so we're done */ + } + + PG_RETURN_ARRAYTYPE_P(result); +} + static ArrayType * create_array_envelope(int ndims, int *dimv, int *lbsv, int nbytes, Oid elmtype, int dataoffset) diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index d3d28a263fa..eb2dcdb3db5 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -1712,6 +1712,18 @@ proname => 'array_fill', proisstrict => 'f', prorettype => 'anyarray', proargtypes => 'anyelement _int4 _int4', prosrc => 'array_fill_with_lower_bounds' }, +{ oid => '6347', descr => 'array constructor with random integer element value', + proname => 'array_random', provolatile => 'v', proisstrict => 'f', + prorettype => '_int4', proargtypes => 'int4 int4 _int4 _int4', + proargnames => '{min,max,dims,lbounds}', prosrc => 'int4array_random' }, +{ oid => '6348', descr => 'array constructor with random bigint element value', + proname => 'array_random', provolatile => 'v', proisstrict => 'f', + prorettype => '_int8', proargtypes => 'int8 int8 _int4 _int4', + proargnames => '{min,max,dims,lbounds}', prosrc => 'int8array_random' }, +{ oid => '6349', descr => 'array constructor with random numeric element value', + proname => 'array_random', provolatile => 'v', proisstrict => 'f', + prorettype => '_numeric', proargtypes => 'numeric numeric _int4 _int4', + proargnames => '{min,max,dims,lbounds}', prosrc => 'numeric_array_random' }, { oid => '2331', descr => 'expand array to set of rows', proname => 'unnest', prorows => '100', prosupport => 'array_unnest_support', proretset => 't', prorettype => 'anyelement', proargtypes => 'anyarray', diff --git a/src/test/regress/expected/arrays.out b/src/test/regress/expected/arrays.out index b815473f414..172e04fe1fe 100644 --- a/src/test/regress/expected/arrays.out +++ b/src/test/regress/expected/arrays.out @@ -3002,3 +3002,88 @@ SELECT array_sort(a [-1:0][20:20]) FROM (VALUES ('[-2:0][20:21]={{1,2},{10,20},{ {{1},{10}} (1 row) +--array_random error case +SELECT array_random(1, 10, '{134217728}'); +ERROR: array size exceeds the maximum allowed (134217727) +SELECT array_random(2, 1, '{2}'::int[]); +ERROR: lower bound must be less than or equal to upper bound +SELECT array_random(2, 1, '{-1}'::int[]); +ERROR: array size exceeds the maximum allowed (134217727) +SELECT array_random(NULL, 1, NULL); +ERROR: dimension array cannot be null +SELECT array_random(1, 2, '{NULL}'::int[]); +ERROR: dimension values cannot be null +SELECT array_random(1, 2, '{{1},{1}}'::int[]); +ERROR: wrong number of array subscripts +DETAIL: Dimension array must be one dimensional. +SELECT array_random('-infinity', 2.1, '{2,1}'::int[]); +ERROR: lower bound cannot be infinity +SELECT array_random('NaN', 2.1, '{2,1}'::int[]); +ERROR: lower bound cannot be NaN +SELECT array_random(1::numeric, 'infinity', '{2,1}'::int[]); +ERROR: upper bound cannot be infinity +SELECT array_random(1::numeric, 'NaN', '{2,1}'::int[]); +ERROR: upper bound cannot be NaN +SELECT array_random(1, 10, '{2}'::int[],'{NULL}'); +ERROR: Low bound array can not contain null values +SELECT array_random(1, 10, '{2}'::int[],'{1,2}'); +ERROR: wrong number of array subscripts +DETAIL: Low bound array has different size than dimensions array. +SELECT array_random(1, 10, '{2}'::int[],'{{1},{2}}'); +ERROR: wrong number of array subscripts +DETAIL: Low bound array must be one dimensional. +--array_random ok case, roughly copied from random.sql +SELECT setseed(0.5); + setseed +--------- + +(1 row) + +SELECT array_random(1, 10::int8, '{2}'::int[]); + array_random +-------------- + {3,3} +(1 row) + +SELECT array_random(1, 6, array[2,5], array[2,4]); + array_random +-------------------------------------- + [2:3][4:8]={{6,2,2,5,4},{4,5,6,4,6}} +(1 row) + +SELECT array_random(-2147483648, 2147483647, array[5,2], '{5,2}'); + array_random +--------------------------------------------------------------------------------------------------------------------------------------- + [5:9][2:3]={{358987948,1720494942},{1522989794,-1952561092},{-562516644,-1082313839},{1917063748,1470913398},{-1663900214,383076569}} +(1 row) + +SELECT array_random(-9223372036854775808, 9223372036854775807, '{5,2}', '{5,2}'); + array_random +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + [5:9][2:3]={{8946456079150312946,-5017870099771078560},{2119550555182341436,4426024215363564415},{108422942355035228,8010643357635886140},{7966634679525141473,-7256609373765379790},{1904816578321642612,2378596988253719677}} +(1 row) + +\gdesc + Column | Type +--------------+---------- + array_random | bigint[] +(1 row) + +SELECT array_random(-1e30, -1e30, '{3,2}'); + array_random +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + {{-1000000000000000000000000000000,-1000000000000000000000000000000},{-1000000000000000000000000000000,-1000000000000000000000000000000},{-1000000000000000000000000000000,-1000000000000000000000000000000}} +(1 row) + +SELECT array_random(-0.45, 0.45, '{2,2,3}', '{5,2,3}'); + array_random +-------------------------------------------------------------------------------------------------- + [5:6][2:3][3:5]={{{-0.31,0.17,-0.16},{-0.25,-0.33,-0.02}},{{0.26,-0.31,-0.16},{0.08,0.23,0.13}}} +(1 row) + +SELECT array_random(0, 1 - 1e-30, '{2, 5}'); + array_random +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + {{0.068990271742366814671825526682,0.865328382850933408408243093209,0.535285120767345535159068774920,0.550766856332365486462263124933,0.327167727807358961934200155904},{0.985749287984797224624803826994,0.475761851397322538460984536852,0.793479002749253814546091362589,0.164860958495312801172978594373,0.519707802923826482493035607188}} +(1 row) + diff --git a/src/test/regress/sql/arrays.sql b/src/test/regress/sql/arrays.sql index 47d62c1d38d..8bcc5424db8 100644 --- a/src/test/regress/sql/arrays.sql +++ b/src/test/regress/sql/arrays.sql @@ -892,3 +892,29 @@ SELECT array_sort(a) FROM (VALUES ('[-1:0]={7,1}'::int[])) v(a); SELECT array_sort(a) FROM (VALUES ('[-2:0][20:21]={{1,2},{10,20},{1,-4}}'::int[])) v(a); SELECT array_sort(a [-1:0]) FROM (VALUES ('[-2:0][20:21]={{1,2},{10,20},{1,-4}}'::int[])) v(a); SELECT array_sort(a [-1:0][20:20]) FROM (VALUES ('[-2:0][20:21]={{1,2},{10,20},{1,-4}}'::int[])) v(a); + +--array_random error case +SELECT array_random(1, 10, '{134217728}'); +SELECT array_random(2, 1, '{2}'::int[]); +SELECT array_random(2, 1, '{-1}'::int[]); +SELECT array_random(NULL, 1, NULL); +SELECT array_random(1, 2, '{NULL}'::int[]); +SELECT array_random(1, 2, '{{1},{1}}'::int[]); +SELECT array_random('-infinity', 2.1, '{2,1}'::int[]); +SELECT array_random('NaN', 2.1, '{2,1}'::int[]); +SELECT array_random(1::numeric, 'infinity', '{2,1}'::int[]); +SELECT array_random(1::numeric, 'NaN', '{2,1}'::int[]); +SELECT array_random(1, 10, '{2}'::int[],'{NULL}'); +SELECT array_random(1, 10, '{2}'::int[],'{1,2}'); +SELECT array_random(1, 10, '{2}'::int[],'{{1},{2}}'); + +--array_random ok case, roughly copied from random.sql +SELECT setseed(0.5); +SELECT array_random(1, 10::int8, '{2}'::int[]); +SELECT array_random(1, 6, array[2,5], array[2,4]); +SELECT array_random(-2147483648, 2147483647, array[5,2], '{5,2}'); +SELECT array_random(-9223372036854775808, 9223372036854775807, '{5,2}', '{5,2}'); +\gdesc +SELECT array_random(-1e30, -1e30, '{3,2}'); +SELECT array_random(-0.45, 0.45, '{2,2,3}', '{5,2,3}'); +SELECT array_random(0, 1 - 1e-30, '{2, 5}'); -- 2.34.1