Re: Parallel grouping sets

2020-07-12 Thread Daniel Gustafsson
> On 25 Mar 2020, at 15:35, Pengzhou Tang wrote: > Thanks a lot, the patch has a memory leak in lookup_hash_entries: it uses > a list_concat there > and causes a 64-byte leak for every tuple. I have fixed that. > > Also, resolved conflicts and rebased the code. While there hasn't been a revie

Re: Parallel grouping sets

2020-03-23 Thread Tomas Vondra
On Fri, Mar 20, 2020 at 07:57:02PM +0800, Pengzhou Tang wrote: Hi Tomas, I rebased the code and resolved the comments you attached; some unresolved comments are explained in 0002-fixes.patch, please take a look. I also made hash spilling work for parallel grouping sets; the plan looks like:

Re: Parallel grouping sets

2020-03-19 Thread Pengzhou Tang
Thank you for reviewing this patch. On Thu, Mar 19, 2020 at 10:09 AM Tomas Vondra wrote: > Hi, > > unfortunately this got a bit broken by the disk-based hash aggregation, > committed today, and so it needs a rebase. I've started looking at the > patch before that, and I have it rebased on e00912e11

Re: Parallel grouping sets

2020-02-24 Thread Richard Guo
To summarize the current state of parallel grouping sets, we now have two available implementations for it. 1) Each worker performs an aggregation step, producing a partial result for each group of which that process is aware. Then the partial results are gathered to the leader, which then perform

Re: Parallel grouping sets

2020-02-09 Thread Pengzhou Tang
Thanks for reviewing those patches. > Ha, I believe you meant to say a "normal aggregate", because what's > performed above gather is no longer "grouping sets", right? > > The group key idea is clever in that it helps "discriminate" tuples by > their grouping set id. I haven't completely thought this

Re: Parallel grouping sets

2020-02-03 Thread Jesse Zhang
On Mon, Feb 3, 2020 at 12:07 AM Richard Guo wrote: > > Hi Jesse, > > Thanks for reviewing these two patches. I enjoyed it! > > On Sat, Jan 25, 2020 at 6:52 AM Jesse Zhang wrote: >> >> >> I glanced over both patches. Just the opposite, I have a hunch that v3 >> is always better than v5. Here's my

Re: Parallel grouping sets

2020-02-03 Thread Richard Guo
Hi Amit, Thanks for reviewing these two patches. On Sat, Jan 25, 2020 at 6:31 PM Amit Kapila wrote: > > This is what I also understood after reading this thread. So, my > question is why not just review v3 and commit something on those lines > even though it would take a bit more time. It is

Re: Parallel grouping sets

2020-02-03 Thread Richard Guo
Hi Jesse, Thanks for reviewing these two patches. On Sat, Jan 25, 2020 at 6:52 AM Jesse Zhang wrote: > > I glanced over both patches. Just the opposite, I have a hunch that v3 > is always better than v5. Here's my 6-minute understanding of both. > > v5 (the one with a simple partial aggregate)

Re: Parallel grouping sets

2020-01-25 Thread Amit Kapila
On Sat, Jan 25, 2020 at 4:22 AM Jesse Zhang wrote: > > On Thu, Jan 23, 2020 at 2:47 AM Amit Kapila wrote: > > > > On Sun, Jan 19, 2020 at 2:23 PM Richard Guo wrote: > > > > > > I realized that there are two patches in this thread that are > > > implemented according to different methods, which c

Re: Parallel grouping sets

2020-01-24 Thread Jesse Zhang
On Thu, Jan 23, 2020 at 2:47 AM Amit Kapila wrote: > > On Sun, Jan 19, 2020 at 2:23 PM Richard Guo wrote: > > > > I realized that there are two patches in this thread that are > > implemented according to different methods, which causes confusion. > > > > Both ideas seem to be different. Is

Re: Parallel grouping sets

2020-01-23 Thread Amit Kapila
On Sun, Jan 19, 2020 at 2:23 PM Richard Guo wrote: > > I realized that there are two patches in this thread that are > implemented according to different methods, which causes confusion. > Both ideas seem to be different. Is the second approach [1] inferior for any case as compared to the fi

Re: Parallel grouping sets

2020-01-19 Thread Richard Guo
I realized that there are two patches in this thread that are implemented according to different methods, which causes confusion. So I decided to update this thread with only one patch, i.e. the patch for 'Implementation 1' as described in the first email, and then move the other patch to a separate

Re: Parallel grouping sets

2020-01-07 Thread Richard Guo
On Sun, Dec 1, 2019 at 10:03 AM Michael Paquier wrote: > On Thu, Nov 28, 2019 at 07:07:22PM +0800, Pengzhou Tang wrote: > > Richard pointed out that he got incorrect results with the patch I > > attached; there were bugs somewhere. > > I have fixed them now and attached the newest version; please refer

Re: Parallel grouping sets

2019-11-30 Thread Michael Paquier
On Thu, Nov 28, 2019 at 07:07:22PM +0800, Pengzhou Tang wrote: > Richard pointed out that he got incorrect results with the patch I > attached; there were bugs somewhere. > I have fixed them now and attached the newest version; please refer to [1] for > the fix. Mr Robot is reporting that the latest pat

Re: Parallel grouping sets

2019-11-28 Thread Pengzhou Tang
Hi Hackers, Richard pointed out that he got incorrect results with the patch I attached; there were bugs somewhere. I have fixed them now and attached the newest version; please refer to [1] for the fix. Thanks, Pengzhou References: [1] https://github.com/greenplum-db/postgres/tree/parallel_groupingse

Re: Parallel grouping sets

2019-09-30 Thread Pengzhou Tang
Hi Richard & Tomas: I followed the idea of the second approach: add a gset_id to the targetlist of the first stage of grouping sets and use it to combine the aggregates in the final stage. The gset_id is still kept because GROUPING() cannot uniquely identify a grouping set; grouping sets may co
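
The idea in the message above (tag first-stage partial results with a grouping set id, then combine them by that id in the final stage) can be illustrated with a toy sketch. This is not code from any of the patches: the table contents, the two-worker split, and the names WORKER_ROWS, GROUPING_SETS, partial_aggregate and finalize are all assumptions made for illustration. The grouping sets mirror the ((c1,c2), (c1)) query with avg(c3) quoted elsewhere in the thread.

```python
from collections import defaultdict

# Hypothetical rows of (c1, c2, c3), already split across two workers.
WORKER_ROWS = [
    [(1, 1, 10.0), (1, 2, 20.0)],
    [(1, 1, 30.0), (2, 1, 40.0)],
]

# Grouping sets as tuples of column positions: (c1, c2) and (c1).
# The list index plays the role of a grouping set id.
GROUPING_SETS = [(0, 1), (0,)]


def partial_aggregate(rows):
    """Worker step: keep a [sum, count] state per (set id, group key)."""
    states = defaultdict(lambda: [0.0, 0])
    for row in rows:
        for gset_id, cols in enumerate(GROUPING_SETS):
            key = (gset_id, tuple(row[c] for c in cols))
            states[key][0] += row[2]
            states[key][1] += 1
    return states


def finalize(partials):
    """Leader step: combine partial states by key, then compute avg(c3)."""
    combined = defaultdict(lambda: [0.0, 0])
    for states in partials:
        for key, (total, count) in states.items():
            combined[key][0] += total
            combined[key][1] += count
    return {key: total / count for key, (total, count) in combined.items()}


result = finalize(partial_aggregate(rows) for rows in WORKER_ROWS)
```

Because the set id is part of the key, a partial state for group (c1=1) in set (c1) is never merged with a state for set (c1,c2), which is the role the message assigns to gset_id. The real patches work inside the planner and executor, so this sketch only conveys the data flow.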

Re: Parallel grouping sets

2019-07-31 Thread Richard Guo
On Tue, Jul 30, 2019 at 11:05 PM Tomas Vondra wrote: > On Tue, Jul 30, 2019 at 03:50:32PM +0800, Richard Guo wrote: > >On Wed, Jun 12, 2019 at 10:58 AM Richard Guo wrote: > > > >> Hi all, > >> > >> Paul and I have been hacking recently to implement parallel grouping > >> sets, and here we have t

Re: Parallel grouping sets

2019-07-30 Thread Tomas Vondra
On Tue, Jul 30, 2019 at 03:50:32PM +0800, Richard Guo wrote: On Wed, Jun 12, 2019 at 10:58 AM Richard Guo wrote: Hi all, Paul and I have been hacking recently to implement parallel grouping sets, and here we have two implementations. Implementation 1 Attached is the patch a

Re: Parallel grouping sets

2019-07-30 Thread Richard Guo
On Wed, Jun 12, 2019 at 10:58 AM Richard Guo wrote: > Hi all, > > Paul and I have been hacking recently to implement parallel grouping > sets, and here we have two implementations. > > Implementation 1 > > > Attached is the patch and also there is a github branch [1] for this > w

Re: Parallel grouping sets

2019-06-13 Thread Tomas Vondra
On Fri, Jun 14, 2019 at 12:02:52PM +1200, David Rowley wrote: On Fri, 14 Jun 2019 at 11:45, Tomas Vondra wrote: On Wed, Jun 12, 2019 at 10:58:44AM +0800, Richard Guo wrote: ># explain (costs off, verbose) select c1, c2, avg(c3) from t2 group by >grouping sets((c1,c2), (c1)); >

Re: Parallel grouping sets

2019-06-13 Thread David Rowley
On Fri, 14 Jun 2019 at 11:45, Tomas Vondra wrote: > > On Wed, Jun 12, 2019 at 10:58:44AM +0800, Richard Guo wrote: > ># explain (costs off, verbose) select c1, c2, avg(c3) from t2 group by > >grouping sets((c1,c2), (c1)); > > QUERY PLAN > >

Re: Parallel grouping sets

2019-06-13 Thread Tomas Vondra
On Wed, Jun 12, 2019 at 10:58:44AM +0800, Richard Guo wrote: Hi all, Paul and I have been hacking recently to implement parallel grouping sets, and here we have two implementations. Implementation 1 Attached is the patch and also there is a github branch [1] for this work. Pa

Re: Parallel grouping sets

2019-06-13 Thread Richard Guo
On Thu, Jun 13, 2019 at 12:29 PM David Rowley wrote: > On Wed, 12 Jun 2019 at 14:59, Richard Guo wrote: > > Implementation 1 > > > Parallel aggregation has already been supported in PostgreSQL and it is > > implemented by aggregating in two stages. First, each worker performs an > > aggregation

Re: Parallel grouping sets

2019-06-12 Thread David Rowley
On Wed, 12 Jun 2019 at 14:59, Richard Guo wrote: > Implementation 1 > Parallel aggregation has already been supported in PostgreSQL and it is > implemented by aggregating in two stages. First, each worker performs an > aggregation step, producing a partial result for each group of which > that pr