Re: [PATCH] Introduce array_shuffle() and array_sample()

2023-04-19 Thread Salek Talangi
Hi all, reading this blog post https://www.depesz.com/2023/04/18/waiting-for-postgresql-16-add-array_sample-and-array_shuffle-functions/ I became aware of the new feature and had a look at it and the commit https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=888f2ea0a81ff171087bdd1

Re: [PATCH] Introduce array_shuffle() and array_sample()

2023-04-07 Thread Daniel Gustafsson
> On 7 Apr 2023, at 17:47, Tom Lane wrote: > > Daniel Gustafsson writes: >> Ah, ok, now I see what you mean, thanks! I'll try to fix up the patch like >> this tomorrow. > > Since we're running out of time, I took the liberty of fixing and > pushing this. Great, thanks! -- Daniel Gustafsson

Re: [PATCH] Introduce array_shuffle() and array_sample()

2023-04-07 Thread Tom Lane
Daniel Gustafsson writes: > Ah, ok, now I see what you mean, thanks! I'll try to fix up the patch like > this tomorrow. Since we're running out of time, I took the liberty of fixing and pushing this. regards, tom lane

Re: [PATCH] Introduce array_shuffle() and array_sample()

2023-04-03 Thread Daniel Gustafsson
> On 3 Apr 2023, at 23:46, Tom Lane wrote: > > Daniel Gustafsson writes: >> On 29 Sep 2022, at 21:33, Tom Lane wrote: >>> I find this behavior a bit surprising: >>> >>> +SELECT >>> array_dims(array_sample('[-1:2][2:3]={{1,2},{3,NULL},{5,6},{7,8}}'::int[], >>> 3)); >>> + array_dims >>> +---

Re: [PATCH] Introduce array_shuffle() and array_sample()

2023-04-03 Thread Tom Lane
Daniel Gustafsson writes: > On 29 Sep 2022, at 21:33, Tom Lane wrote: >> I find this behavior a bit surprising: >> >> +SELECT >> array_dims(array_sample('[-1:2][2:3]={{1,2},{3,NULL},{5,6},{7,8}}'::int[], >> 3)); >> + array_dims >> +- >> + [-1:1][2:3] >> +(1 row) >> >> I can buy

Re: [PATCH] Introduce array_shuffle() and array_sample()

2023-04-03 Thread Daniel Gustafsson
> On 29 Sep 2022, at 21:33, Tom Lane wrote: > > Martin Kalcher writes: >> New patch: array_shuffle() and array_sample() use pg_global_prng_state now. > > I took a closer look at the patch today. Since this seems pretty close to going in, and seems like quite useful functions, I took a look to

Re: [PATCH] Introduce array_shuffle() and array_sample()

2023-04-03 Thread Gregory Stark (as CFM)
Given that there's been no updates since September 22 I'm going to make this patch Returned with Feedback. The patch can be resurrected when there's more work done -- don't forget to move it to the new CF when you do that. -- Gregory Stark As Commitfest Manager

Re: [PATCH] Introduce array_shuffle() and array_sample()

2023-03-20 Thread Gregory Stark (as CFM)
On Thu, 29 Sept 2022 at 15:34, Tom Lane wrote: > > Martin Kalcher writes: > > New patch: array_shuffle() and array_sample() use pg_global_prng_state now. > > I took a closer look at the patch today. I find this behavior a bit > surprising: > It looks like this patch received useful feedback and

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-09-29 Thread Tom Lane
Martin Kalcher writes: > New patch: array_shuffle() and array_sample() use pg_global_prng_state now. I took a closer look at the patch today. I find this behavior a bit surprising: +SELECT array_dims(array_sample('[-1:2][2:3]={{1,2},{3,NULL},{5,6},{7,8}}'::int[], 3)); + array_dims +-

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-09-29 Thread Martin Kalcher
On 28.09.22 at 16:18, Tom Lane wrote: It is seeded at process start, yes. If you don't feel a need for user control over the sequence used by these functions, then using pg_global_prng_state is fine. (Basically the point to be made here is that we need to keep a tight rein on what can be affec

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-09-28 Thread Tom Lane
Fabien COELHO writes: >> Thanks for your thoughts, Tom. I have a couple of questions. Should we >> introduce a new seed function for the new PRNG state, used by >> array_shuffle() >> and array_sample()? What would be a good name? Or should those functions use >> pg_global_prng_state? Is it saf

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-09-28 Thread Fabien COELHO
With our current PRNG infrastructure it doesn't cost much to have a separate PRNG for every purpose. I don't object to having array_shuffle() and array_sample() share one PRNG, but I don't think it should go much further than that. Thanks for your thoughts, Tom. I have a couple of questions.

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-09-28 Thread Martin Kalcher
On 26.09.22 at 22:16, Tom Lane wrote: With our current PRNG infrastructure it doesn't cost much to have a separate PRNG for every purpose. I don't object to having array_shuffle() and array_sample() share one PRNG, but I don't think it should go much further than that. Thanks for your thoug

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-09-26 Thread Tom Lane
Martin Kalcher writes: > [ v4-0001-Introduce-array_shuffle-and-array_sample.patch ] I think this idea of exporting drandom()'s PRNG for all and sundry to use is completely misguided. If we go down that path we'll be right back in the swamp that we were in when we used random(3), namely that (a)

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-09-22 Thread Martin Kalcher
On 22.09.22 at 17:23, Andres Freund wrote: Hi, On 2022-08-04 07:46:10 +0200, Martin Kalcher wrote: Patch update without merge conflicts. Due to the merge of the meson based build, this patch needs to be adjusted. See https://cirrus-ci.com/build/6580671765282816 Looks like it'd just be adding

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-09-22 Thread Andres Freund
Hi, On 2022-08-04 07:46:10 +0200, Martin Kalcher wrote: > Patch update without merge conflicts. Due to the merge of the meson based build, this patch needs to be adjusted. See https://cirrus-ci.com/build/6580671765282816 Looks like it'd just be adding user_prng.c to src/backend/utils/adt/meson.bu

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-08-03 Thread Martin Kalcher
Patch update without merge conflicts. Martin From 0ecffcf3ed2eb59d045941b69bb86a34b93f3391 Mon Sep 17 00:00:00 2001 From: Martin Kalcher Date: Sun, 17 Jul 2022 18:06:04 +0200 Subject: [PATCH v3] Introduce array_shuffle() and array_sample() * array_shuffle() shuffles the elements of an array. * a

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-25 Thread Martin Kalcher
On 24.07.22 at 21:42, Fabien COELHO wrote: Dunno. I'm still wondering what it should do. I'm pretty sure that the documentation should be clear about a shared seed, if any. I do not think that departing from the standard is a good thing, either. Are you sure it violates the standard? I could not

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-24 Thread Fabien COELHO
Hello, Thank you for your feedback. I attached a patch, that addresses most of your points. I'll look into it. It would help if the patch could include a version number at the end. Should the exchange be skipped when i == k? The additional branch is actually slower (on my machine, test
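The question about skipping the exchange when i == k refers to the swap step of a Fisher-Yates shuffle. The following is a minimal sketch of that loop, not the code from the patch: it works on a plain int buffer, and the names sketch_next, sketch_below and sketch_shuffle are hypothetical. A splitmix64 generator stands in for PostgreSQL's pg_prng purely so that the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* splitmix64: stand-in generator for illustration only (not pg_prng) */
static uint64_t sketch_next(uint64_t *state)
{
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);

    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* uniform value in [0, bound); rejection sampling avoids modulo bias */
static uint64_t sketch_below(uint64_t *state, uint64_t bound)
{
    uint64_t threshold;
    uint64_t r;

    assert(bound > 0);
    threshold = (0 - bound) % bound;    /* 2^64 mod bound */
    do {
        r = sketch_next(state);
    } while (r < threshold);
    return r % bound;
}

/* Fisher-Yates: walk from the end, swap each slot with a random earlier one */
static void sketch_shuffle(int *a, size_t n, uint64_t *state)
{
    for (size_t i = n; i > 1; i--) {
        size_t k = (size_t) sketch_below(state, i);    /* 0 <= k < i */
        int tmp = a[i - 1];

        /* no "k == i - 1" test: swapping a slot with itself is a no-op */
        a[i - 1] = a[k];
        a[k] = tmp;
    }
}
```

The swap uses a temporary variable, so the self-swap case is harmless and no extra branch is needed for correctness. Whether the branch helps or hurts performance is a measurement question, as the message notes.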

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-24 Thread Martin Kalcher
3d8388c Mon Sep 17 00:00:00 2001 From: Martin Kalcher Date: Sun, 17 Jul 2022 18:06:04 +0200 Subject: [PATCH] Introduce array_shuffle() and array_sample() * array_shuffle() shuffles the elements of an array. * array_sample() chooses max n elements from an array by random. The new functions shar

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-24 Thread Fabien COELHO
I came to the same conclusions and went with Option 1 (see patch), mainly because most code in utils/adt is organized by type, and this way it is clear that this is a thin wrapper around pg_prng. Small patch update. I realized the new functions should live in array_userfuncs.c (rather than ar

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-23 Thread Martin Kalcher
should live in array_userfuncs.c (rather than arrayfuncs.c), fixed some file headers and added some comments to the code. From 138777531961c31250074d1c0d87417e31d63656 Mon Sep 17 00:00:00 2001 From: Martin Kalcher Date: Sun, 17 Jul 2022 18:06:04 +0200 Subject: [PATCH] Introduce array_shuffle() and

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-22 Thread Joe Conway
On 7/19/22 10:20, Tom Lane wrote: Everything else either explicitly rejects more-than-one-D arrays or does something that is compatible with thinking of them as arrays-of-arrays. I think I am responsible for at least some of those, and I agree that thinking of MD arrays as arrays-of-arrays is

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-22 Thread Dean Rasheed
On Fri, 22 Jul 2022 at 10:31, Martin Kalcher wrote: > > i came to the same conclusions and went with Option 1 (see patch). > Mainly because most code in utils/adt is organized by type and this way > it is clear, that this is a thin wrapper around pg_prng. > > What do you think? Looks fairly neat,

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-22 Thread Martin Kalcher
ode in utils/adt is organized by type and this way it is clear that this is a thin wrapper around pg_prng. What do you think? From ceda50f1f7f7e0c123de9b2ce2cc7b5d2b2b7db6 Mon Sep 17 00:00:00 2001 From: Martin Kalcher Date: Sun, 17 Jul 2022 18:06:04 +0200 Subject: [PATCH] Introduce array_shuf

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-22 Thread Dean Rasheed
On Thu, 21 Jul 2022 at 16:43, Martin Kalcher wrote: > > On 21.07.22 at 14:25, Dean Rasheed wrote: > > > > I'm inclined to say that we want a new pg_global_prng_user_state that > > is updated by setseed(), and used by random(), array_shuffle(), > > array_sample(), and any other user-facing random

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-21 Thread Martin Kalcher
17 00:00:00 2001 From: Martin Kalcher Date: Sun, 17 Jul 2022 18:06:04 +0200 Subject: [PATCH] Introduce array_shuffle() and array_sample() * array_shuffle() shuffles the elements of an array. * array_sample() chooses n elements from an array by random. Shuffling of arrays can already be accomplish

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-21 Thread Martin Kalcher
On 21.07.22 at 14:25, Dean Rasheed wrote: I'm inclined to say that we want a new pg_global_prng_user_state that is updated by setseed(), and used by random(), array_shuffle(), array_sample(), and any other user-facing random functions we add later. I like the idea. How would you organize the

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-21 Thread Dean Rasheed
On Thu, 21 Jul 2022 at 12:15, Martin Kalcher wrote: > > I agree that we should use pg_prng_uint64_range(). However, in order to > achieve interoperability with setseed() we would have to use > drandom_seed (rather than pg_global_prng_state) as rng state, which is > declared statically in float.c a

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-21 Thread Martin Kalcher
On 21.07.22 at 10:41, Dean Rasheed wrote: A couple of quick comments on the current patch: Thank you for your feedback! It's important to mark these new functions as VOLATILE, not IMMUTABLE, otherwise they won't work as expected in queries. See https://www.postgresql.org/docs/current/xfunc-

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-21 Thread Dean Rasheed
On Tue, 19 Jul 2022 at 21:21, Martin Kalcher wrote: > > Here is a patch with dimension aware array_shuffle() and array_sample(). > +1 for this feature, and this way of handling multi-dimensional arrays. > If you think array_flatten() is desirable, i can take a look at it. That's not something I

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-19 Thread Martin Kalcher
[7,8]]], 1); --- {1,2,3,4,5,6,7,8} select array_flatten(array[[[1,2],[3,4]],[[5,6],[7,8]]], 2); --- {{1,2,3,4},{5,6,7,8}} Martin From 2aa6d72ff0f4d8835ee2f09f8cdf16b7e8005e56 Mon Sep 17 00:00:00 2001 From: Martin Kalcher Date: Sun, 17 Jul 2022 18:06:04 +0200 Subject: [PATCH] Introduce arr

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-19 Thread Tom Lane
Robert Haas writes: > On Tue, Jul 19, 2022 at 9:53 AM Andrew Dunstan wrote: >> Why not have an optional second parameter for array_shuffle that >> indicates whether or not to flatten? e.g. array_shuffle(my_array, >> flatten => true) > IMHO, if we think that's something many people are going to w

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-19 Thread Robert Haas
On Tue, Jul 19, 2022 at 9:53 AM Andrew Dunstan wrote: > > Having thought about it, I would go with (2). It gives the user the > > ability to decide whether or not array-of-arrays behavior is desired. > > If he wants the behavior of (1) he can flatten the array before > > applying array_shuffle(). U

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-19 Thread Andrew Dunstan
On 2022-07-19 Tu 07:15, Martin Kalcher wrote: > On 18.07.22 at 23:48, Martin Kalcher wrote: >> >> If we go with (1) array_shuffle() and array_sample() should shuffle >> each element individually and always return a one-dimensional array. >> >>    select array_shuffle('{{1,2},{3,4},{5,6}}'); >>

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-19 Thread Robert Haas
On Mon, Jul 18, 2022 at 6:43 PM Tom Lane wrote: > Um ... why is "the order in which the elements were chosen" a concept > we want to expose? ISTM sample() is a black box in which notionally > the decisions could all be made at once. I agree with that. But I also think it's fine for the elements

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-19 Thread Martin Kalcher
On 18.07.22 at 23:48, Martin Kalcher wrote: If we go with (1) array_shuffle() and array_sample() should shuffle each element individually and always return a one-dimensional array.   select array_shuffle('{{1,2},{3,4},{5,6}}');   ---    {1,4,3,5,6,2}   select array_sample('{{1,2
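The two multi-dimensional behaviors debated in this thread can be illustrated with a single routine, under some loudly stated assumptions: the array is a row-major int buffer, the names below are hypothetical, and a splitmix64 generator stands in for pg_prng. This is a sketch of the idea, not the patch's implementation. Shuffling blocks of length 1 corresponds to option (1), where every scalar moves independently. Shuffling blocks whose length equals the size of one first-dimension slice corresponds to option (2), where the array is treated as an array of arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* splitmix64: stand-in generator for illustration only (not pg_prng) */
static uint64_t sketch_next(uint64_t *state)
{
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);

    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* uniform value in [0, bound); rejection sampling avoids modulo bias */
static uint64_t sketch_below(uint64_t *state, uint64_t bound)
{
    uint64_t threshold;
    uint64_t r;

    assert(bound > 0);
    threshold = (0 - bound) % bound;    /* 2^64 mod bound */
    do {
        r = sketch_next(state);
    } while (r < threshold);
    return r % bound;
}

/*
 * Fisher-Yates over nblocks consecutive blocks of block_len ints each.
 * Elements inside a block keep their relative order.
 */
static void sketch_shuffle_blocks(int *a, size_t nblocks, size_t block_len,
                                  uint64_t *state)
{
    for (size_t i = nblocks; i > 1; i--) {
        size_t k = (size_t) sketch_below(state, i);    /* 0 <= k < i */
        int *x = a + (i - 1) * block_len;
        int *y = a + k * block_len;

        for (size_t j = 0; j < block_len; j++) {
            int tmp = x[j];

            x[j] = y[j];
            y[j] = tmp;
        }
    }
}
```

For the 3x2 array {{1,2},{3,4},{5,6}} stored as six ints, sketch_shuffle_blocks(a, 6, 1, &state) mixes all six values as in option (1), while sketch_shuffle_blocks(a, 3, 2, &state) only reorders the three pairs and keeps each pair intact.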

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-19 Thread Aleksander Alekseev
Hi Martin, I didn't look at the code yet but I very much like the idea. Many thanks for working on this! It's a pity your patch was too late for the July commitfest. In September, please keep an eye on cfbot [1] to make sure your patch applies properly. > As Tom's investigation showed, there is

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-19 Thread Martin Kalcher
On 19.07.22 at 00:52, Martin Kalcher wrote: On the contrary! I am pretty sure there are people out there wanting sampling-without-shuffling. I will think about that. I gave it some thought. Even though there might be use cases where a stable order is desired, I would consider them edge cas

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-18 Thread Thomas Munro
On Tue, Jul 19, 2022 at 8:15 AM Martin Kalcher wrote: > On 18.07.22 at 21:29, Tom Lane wrote: > > The preferred thing to do is to add it to our "commitfest" queue, > > which will ensure that it gets looked at eventually. The currently > > open cycle is 2022-09 [1] (see the "New Patch" button the

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-18 Thread Martin Kalcher
On 19.07.22 at 00:18, Tom Lane wrote: Independently of the dimensionality question --- I'd imagined that array_sample would select a random subset of the array elements but keep their order intact. If you want the behavior shown above, you can do array_shuffle(array_sample(...)). But if we ra
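The order-preserving variant of sampling described here can be sketched with selection sampling: scan the input once and keep each element with probability (still needed) / (still remaining). This is only an illustration of that alternative, not what the patch does. The names are hypothetical, the input is a plain int buffer, a splitmix64 generator stands in for pg_prng, and clamping k to the array length is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* splitmix64: stand-in generator for illustration only (not pg_prng) */
static uint64_t sketch_next(uint64_t *state)
{
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);

    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* uniform value in [0, bound); rejection sampling avoids modulo bias */
static uint64_t sketch_below(uint64_t *state, uint64_t bound)
{
    uint64_t threshold;
    uint64_t r;

    assert(bound > 0);
    threshold = (0 - bound) % bound;    /* 2^64 mod bound */
    do {
        r = sketch_next(state);
    } while (r < threshold);
    return r % bound;
}

/*
 * Copy k randomly chosen elements of in[0..n-1] to out, keeping their
 * original relative order.  Returns the number of elements written.
 */
static size_t sketch_sample_ordered(const int *in, size_t n, size_t k,
                                    int *out, uint64_t *state)
{
    size_t chosen = 0;

    if (k > n)
        k = n;                  /* assumption: clamp instead of erroring */

    for (size_t i = 0; i < n && chosen < k; i++) {
        /* keep in[i] with probability (k - chosen) / (n - i) */
        if (sketch_below(state, n - i) < k - chosen)
            out[chosen++] = in[i];
    }
    return chosen;
}
```

With this shape, a caller who wants the sample in random order can shuffle the output afterwards, which mirrors the array_shuffle(array_sample(...)) composition mentioned in the message.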

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-18 Thread Tom Lane
"David G. Johnston" writes: > On Mon, Jul 18, 2022 at 3:18 PM Tom Lane wrote: >> Independently of the dimensionality question --- I'd imagined that >> array_sample would select a random subset of the array elements >> but keep their order intact. If you want the behavior shown >> above, you can

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-18 Thread David G. Johnston
On Mon, Jul 18, 2022 at 3:18 PM Tom Lane wrote: > > Independently of the dimensionality question --- I'd imagined that > array_sample would select a random subset of the array elements > but keep their order intact. If you want the behavior shown > above, you can do array_shuffle(array_sample(..

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-18 Thread Tom Lane
Martin Kalcher writes: > If we go with (1) array_shuffle() and array_sample() should shuffle each > element individually and always return a one-dimensional array. >select array_shuffle('{{1,2},{3,4},{5,6}}'); >--- > {1,4,3,5,6,2} >select array_sample('{{1,2},{3,4},{5,6}

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-18 Thread Martin Kalcher
On 18.07.22 at 23:03, Tom Lane wrote: I wrote: Martin had originally proposed (2), which I rejected on the grounds that we don't treat multi-dimensional arrays as arrays-of-arrays for any other purpose. Actually, after poking at it for awhile, that's an overstatement. It's true that the type

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-18 Thread Tom Lane
I wrote: > Martin had originally proposed (2), which I rejected on the grounds > that we don't treat multi-dimensional arrays as arrays-of-arrays for > any other purpose. Actually, after poking at it for awhile, that's an overstatement. It's true that the type system doesn't think N-D arrays are a

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-18 Thread Tom Lane
Robert Haas writes: > On Mon, Jul 18, 2022 at 3:03 PM Martin Kalcher > wrote: >> array_shuffle(anyarray) -> anyarray >> array_sample(anyarray, integer) -> anyarray > I think it's questionable whether the behavior of array_shuffle() is > correct for a multi-dimensional array. The implemented beha

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-18 Thread Robert Haas
On Mon, Jul 18, 2022 at 3:03 PM Martin Kalcher wrote: > Thanks for all your feedback and help. I got a patch that i consider > ready for review. It introduces two new functions: > >array_shuffle(anyarray) -> anyarray >array_sample(anyarray, integer) -> anyarray > > array_shuffle() shuffles

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-18 Thread Martin Kalcher
On 18.07.22 at 21:29, Tom Lane wrote: The preferred thing to do is to add it to our "commitfest" queue, which will ensure that it gets looked at eventually. The currently open cycle is 2022-09 [1] (see the "New Patch" button there). Thanks Tom, did that. I am not sure if "SQL Commands" is the

Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-18 Thread Tom Lane
Martin Kalcher writes: > Is someone interested in looking at it? What are the next steps? The preferred thing to do is to add it to our "commitfest" queue, which will ensure that it gets looked at eventually. The currently open cycle is 2022-09 [1] (see the "New Patch" button there). I believe

[PATCH] Introduce array_shuffle() and array_sample()

2022-07-18 Thread Martin Kalcher
ements from an array. Is someone interested in looking at it? What are the next steps? Martin From 5498bb2d9f1fab4cad56cd0d3a6eeafa24a26c8e Mon Sep 17 00:00:00 2001 From: Martin Kalcher Date: Sun, 17 Jul 2022 18:06:04 +0200 Subject: [PATCH] Introduce array_shuffle() and array_sample() * array_s