Re: Parallel CREATE INDEX for GIN indexes

2025-05-02 Thread Tomas Vondra
On 4/30/25 14:39, Tomas Vondra wrote: > > On 4/18/25 03:03, Vinod Sridharan wrote: >> ... >> > > The patch seems fine to me - I repeated the tests with mailing list > archives, with MemoryContextStats() in _gin_parallel_merge, and it > reliably minimizes the memory usage. So that's fine. > > I w

Re: Parallel CREATE INDEX for GIN indexes

2025-04-30 Thread Tomas Vondra
On 4/18/25 03:03, Vinod Sridharan wrote: > Hello, > As part of testing this change I believe I found a scenario where the > parallel build seems to trigger OOMs for larger indexes. Specifically, > the calls for ginEntryInsert seem to leak memory into > TopTransactionContext and OOM/crash the out

Re: Parallel CREATE INDEX for GIN indexes

2025-04-21 Thread Tomas Vondra
On 4/18/25 03:03, Vinod Sridharan wrote: > Hello, > As part of testing this change I believe I found a scenario where the > parallel build seems to trigger OOMs for larger indexes. Specifically, > the calls for ginEntryInsert seem to leak memory into > TopTransactionContext and OOM/crash the out

Re: Parallel CREATE INDEX for GIN indexes

2025-04-17 Thread Vinod Sridharan
Hello, As part of testing this change I believe I found a scenario where the parallel build seems to trigger OOMs for larger indexes. Specifically, the calls for ginEntryInsert seem to leak memory into TopTransactionContext and OOM/crash the outer process. For serial build, the calls for ginEntryIn

Re: Parallel CREATE INDEX for GIN indexes

2025-04-09 Thread Kirill Reshke
On Thu, 3 Apr 2025 at 01:19, Tomas Vondra wrote: > > On 4/2/25 18:43, Andres Freund wrote: > > Hi, > > > > On 2025-03-04 20:50:43 +0100, Tomas Vondra wrote: > >> I pushed the two smaller parts today. > >> > >> Here's the remaining two parts, to keep cfbot happy. I don't expect to > >> get these in

Re: Parallel CREATE INDEX for GIN indexes

2025-04-02 Thread Tomas Vondra
On 4/2/25 18:43, Andres Freund wrote: > Hi, > > On 2025-03-04 20:50:43 +0100, Tomas Vondra wrote: >> I pushed the two smaller parts today. >> >> Here's the remaining two parts, to keep cfbot happy. I don't expect to >> get these into PG18, though. > > If that's the case, could we either close the

Re: Parallel CREATE INDEX for GIN indexes

2025-04-02 Thread Andres Freund
Hi, On 2025-03-04 20:50:43 +0100, Tomas Vondra wrote: > I pushed the two smaller parts today. > > Here's the remaining two parts, to keep cfbot happy. I don't expect to > get these into PG18, though. If that's the case, could we either close the CF entry, or move it to the next fest? Greetings,

Re: Parallel CREATE INDEX for GIN indexes

2025-03-15 Thread Matthias van de Meent
On Tue, 4 Mar 2025 at 20:50, Tomas Vondra wrote: > > I pushed the two smaller parts today. > > Here's the remaining two parts, to keep cfbot happy. I don't expect to > get these into PG18, though. As promised on- and off-list, here's the 0001 patch, polished, split, and further adapted for perfor

Re: Parallel CREATE INDEX for GIN indexes

2025-03-15 Thread Alexander Korotkov
Hi Matthias, On Fri, Mar 7, 2025 at 4:08 AM Matthias van de Meent wrote: > > On Tue, 4 Mar 2025 at 20:50, Tomas Vondra wrote: > > > > I pushed the two smaller parts today. > > > > Here's the remaining two parts, to keep cfbot happy. I don't expect to > > get these into PG18, though. > > As promi

Re: Parallel CREATE INDEX for GIN indexes

2025-03-09 Thread Peter Geoghegan
On Sun, Mar 9, 2025 at 6:23 PM Tom Lane wrote: > Ah. Most likely somebody dismissed it years ago. Given that > precedent, I'm content to dismiss this one too. It is dead code, unless somebody decides to #define DISABLE_LEADER_PARTICIPATION to debug a problem. -- Peter Geoghegan

Re: Parallel CREATE INDEX for GIN indexes

2025-03-09 Thread Tom Lane
Tomas Vondra writes: > On 3/9/25 17:38, Tom Lane wrote: >> Coverity is a little unhappy about this business in >> _gin_begin_parallel: > I don't mind doing it differently, but this code is just a copy from > _bt_begin_parallel. So how come coverity does not complain about that? > Or is that white

Re: Parallel CREATE INDEX for GIN indexes

2025-03-09 Thread Tomas Vondra
On 3/9/25 17:38, Tom Lane wrote: > Tomas Vondra writes: >> I pushed the two smaller parts today. > > Coverity is a little unhappy about this business in > _gin_begin_parallel: > > boolleaderparticipates = true; > ... > #ifdef DISABLE_LEADER_PARTICIPATION > leader

Re: Parallel CREATE INDEX for GIN indexes

2025-03-09 Thread Tom Lane
Tomas Vondra writes: > I pushed the two smaller parts today. Coverity is a little unhappy about this business in _gin_begin_parallel: boolleaderparticipates = true; ... #ifdef DISABLE_LEADER_PARTICIPATION leaderparticipates = false; #endif ...

Re: Parallel CREATE INDEX for GIN indexes

2025-03-04 Thread Tomas Vondra
I pushed the two smaller parts today. Here's the remaining two parts, to keep cfbot happy. I don't expect to get these into PG18, though. regards -- Tomas Vondra From bea52f76255830af45b7122b0fa5786997182cf5 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Tue, 25 Feb 2025 16:12:37 +0100 Sub

Re: Parallel CREATE INDEX for GIN indexes

2025-03-03 Thread Tomas Vondra
I've pushed the first part of the series (0001 + the cleanup and progress patch). That leaves the two smaller improvement parts (compression + memory limit enforcement) - I intend to push those sometime this week, if possible. Here's a rebased version of the whole patch series, including the two W

Re: Parallel CREATE INDEX for GIN indexes

2025-02-26 Thread Tomas Vondra
While working on the progress reporting, I've been looking into the performance results, particularly why the parallelism doesn't help much for some indexes - e.g. the index on the headers JSONB column. CREATE INDEX headers_jsonb_idx ON messages USING gin (msg_headers); In this case the paral

Re: Parallel CREATE INDEX for GIN indexes

2025-02-17 Thread Matthias van de Meent
On Sun, 16 Feb 2025 at 04:47, Tomas Vondra wrote: > > Hi, > > Attached is a cleaned up version of the patch series, squashed into > fewer patches as discussed. I also went through all the comments, and > removed/updated some obsolete ones. I also updated the commit messages, > it'd be nice if some

Re: Parallel CREATE INDEX for GIN indexes

2025-02-15 Thread Tomas Vondra
On 2/12/25 15:59, Matthias van de Meent wrote: > On Tue, 7 Jan 2025 at 12:59, Tomas Vondra wrote: >> >> ... >> >> I haven't done anything about this, but I'm not sure adding the number >> of GIN tuples to pg_stat_progress_create_index would be very useful. We >> don't know the total number of entr

Re: Parallel CREATE INDEX for GIN indexes

2025-02-12 Thread Tomas Vondra
On 2/12/25 15:59, Matthias van de Meent wrote: > On Tue, 7 Jan 2025 at 12:59, Tomas Vondra wrote: >> >> On 1/6/25 20:13, Matthias van de Meent wrote: >>> ... Thanks. Attached is a rebased patch series fixing those issues, and one issue I found in an AssertCheckGinBuffer, which was c

Re: Parallel CREATE INDEX for GIN indexes

2025-02-12 Thread Matthias van de Meent
On Tue, 7 Jan 2025 at 12:59, Tomas Vondra wrote: > > On 1/6/25 20:13, Matthias van de Meent wrote: >> ... >>> >>> Thanks. Attached is a rebased patch series fixing those issues, and one >>> issue I found in an AssertCheckGinBuffer, which was calling the other >>> assert (AssertCheckItemPointers) e

Re: Parallel CREATE INDEX for GIN indexes

2025-01-06 Thread Matthias van de Meent
On Sat, 4 Jan 2025 at 17:58, Tomas Vondra wrote: > > On 11/24/24 19:04, Kirill Reshke wrote: > > On Tue, 8 Oct 2024 at 17:06, Tomas Vondra wrote: > >> > >> On 10/8/24 04:03, Michael Paquier wrote: > >>> > >>> _gin_parallel_build_main() is introduced in 0001. Please make sure to > >>> pass down a

Re: Parallel CREATE INDEX for GIN indexes

2024-11-24 Thread Kirill Reshke
On Tue, 8 Oct 2024 at 17:06, Tomas Vondra wrote: > > On 10/8/24 04:03, Michael Paquier wrote: > > > > _gin_parallel_build_main() is introduced in 0001. Please make sure to > > pass down a query ID. > > Thanks for the ping. Here's an updated patch doing that, and also fixing > a couple whitespace

Re: Parallel CREATE INDEX for GIN indexes

2024-10-07 Thread Michael Paquier
On Fri, Jul 12, 2024 at 05:34:25PM +0200, Tomas Vondra wrote: > I got to do the detailed benchmarking on the latest version of the patch > series, so here's the results. My goal was to better understand the > impact of each patch individually - especially the two parts introduced > by Matthias, but

Re: Parallel CREATE INDEX for GIN indexes

2024-08-28 Thread Andy Fan
Tomas Vondra writes: Hi Tomas, > Yeah. I think we have agreement on 0001-0007. Yes, the design of 0001-0007 looks good to me and because of the existing compexitity, I want to foucs on this part for now. I am doing code review from yesterday, and now my work is done. Just some small questions:

Re: Parallel CREATE INDEX for GIN indexes

2024-08-27 Thread Matthias van de Meent
On Wed, 28 Aug 2024 at 02:38, Andy Fan wrote: > > Matthias van de Meent writes: > > tuplesort_performsort usually only needs to flush the last elements of > > ... In certain rare > > cases it may take a longer time as it may have to merge sorted runs, > > but those cases are quite rare in my expe

Re: Parallel CREATE INDEX for GIN indexes

2024-08-27 Thread Andy Fan
Matthias van de Meent writes: > tuplesort_performsort usually only needs to flush the last elements of > ... In certain rare > cases it may take a longer time as it may have to merge sorted runs, > but those cases are quite rare in my experience. OK, I am expecting such cases are not rare, Supp

Re: Parallel CREATE INDEX for GIN indexes

2024-08-27 Thread Matthias van de Meent
On Tue, 27 Aug 2024 at 12:15, Andy Fan wrote: > > Tomas Vondra writes: > > And let's talk about the improvement by Matthias, namely: > > > > * 0008 Use a single GIN tuplesort > > * 0009 Reduce the size of GinTuple by 12 bytes > > > > I haven't really seen any impact on duration - it seems more or

Re: Parallel CREATE INDEX for GIN indexes

2024-08-27 Thread Tomas Vondra
On 8/27/24 12:14, Andy Fan wrote: > Tomas Vondra writes: > >> Hi, >> >> I got to do the detailed benchmarking on the latest version of the patch >> series, so here's the results. My goal was to better understand the >> impact of each patch individually - especially the two parts introduced >>

Re: Parallel CREATE INDEX for GIN indexes

2024-08-27 Thread Andy Fan
Tomas Vondra writes: > Hi, > > I got to do the detailed benchmarking on the latest version of the patch > series, so here's the results. My goal was to better understand the > impact of each patch individually - especially the two parts introduced > by Matthias, but not only - so I ran the test o

Re: Parallel CREATE INDEX for GIN indexes

2024-08-02 Thread Matthias van de Meent
On Tue, 9 Jul 2024 at 03:18, Andy Fan wrote: >> and later we called 'tuplesort_performsort(state->bs_sortstate);'. Even >> we have some CTID merges activity in '(1)', the tuples are still >> ordered, so the sort (in both tuplesort_putgintuple and >> 'tuplesort_performsort) are not necessary,

Re: Parallel CREATE INDEX for GIN indexes

2024-07-08 Thread Andy Fan
Andy Fan writes: I just realize all my replies is replied to sender only recently, probably because I upgraded the email cient and the short-cut changed sliently, resent the lastest one only >>> Suppose RBTree's output is: >>> >>> batch-1 at RBTree: >>> 1 [tid1, tid8, tid100] >>> 2 [tid1

Re: Parallel CREATE INDEX for GIN indexes

2024-07-08 Thread Matthias van de Meent
On Mon, 8 Jul 2024, 13:38 Tomas Vondra, wrote: > > On 7/8/24 11:45, Matthias van de Meent wrote: > > As to the GIN TS code itself; yes it's more complicated, mainly > > because it uses several optimizations to reduce unnecessary > > allocations and (de)serializations of GinTuples, and I'm aware of

Re: Parallel CREATE INDEX for GIN indexes

2024-07-08 Thread Tomas Vondra
On 7/8/24 11:45, Matthias van de Meent wrote: > On Sun, 7 Jul 2024, 18:26 Tomas Vondra, wrote: >> >> On 7/5/24 21:45, Matthias van de Meent wrote: >>> On Wed, 3 Jul 2024 at 20:36, Matthias van de Meent >>> wrote: On Mon, 24 Jun 2024 at 02:58, Tomas Vondra wrote: > So I've s

Re: Parallel CREATE INDEX for GIN indexes

2024-07-08 Thread Matthias van de Meent
On Sun, 7 Jul 2024, 18:26 Tomas Vondra, wrote: > > On 7/5/24 21:45, Matthias van de Meent wrote: >> On Wed, 3 Jul 2024 at 20:36, Matthias van de Meent >> wrote: >>> >>> On Mon, 24 Jun 2024 at 02:58, Tomas Vondra >>> wrote: So I've switched this to use the regular data-type comparisons, with

Re: Parallel CREATE INDEX for GIN indexes

2024-07-08 Thread Matthias van de Meent
On Sun, 7 Jul 2024, 18:11 Tomas Vondra, wrote: > > On 7/3/24 20:36, Matthias van de Meent wrote: >> On Mon, 24 Jun 2024 at 02:58, Tomas Vondra >> wrote: >>> So I've switched this to use the regular data-type comparisons, with >>> SortSupport etc. There's a bit more cleanup remaining and testing >

Re: Parallel CREATE INDEX for GIN indexes

2024-07-07 Thread Tomas Vondra
On 7/5/24 21:45, Matthias van de Meent wrote: > On Wed, 3 Jul 2024 at 20:36, Matthias van de Meent > wrote: >> >> On Mon, 24 Jun 2024 at 02:58, Tomas Vondra >> wrote: >>> So I've switched this to use the regular data-type comparisons, with >>> SortSupport etc. There's a bit more cleanup remaining

Re: Parallel CREATE INDEX for GIN indexes

2024-07-07 Thread Tomas Vondra
On 7/3/24 20:36, Matthias van de Meent wrote: > On Mon, 24 Jun 2024 at 02:58, Tomas Vondra > wrote: >> >> Here's a bit more cleaned up version, clarifying a lot of comments, >> removing a bunch of obsolete comments, or comments speculating about >> possible solutions, that sort of thing. I've a

Re: Parallel CREATE INDEX for GIN indexes

2024-07-03 Thread Matthias van de Meent
On Mon, 24 Jun 2024 at 02:58, Tomas Vondra wrote: > > Here's a bit more cleaned up version, clarifying a lot of comments, > removing a bunch of obsolete comments, or comments speculating about > possible solutions, that sort of thing. I've also removed couple more > XXX comments, etc. > > The main

Re: Parallel CREATE INDEX for GIN indexes

2024-07-02 Thread Tomas Vondra
On 7/2/24 02:07, Andy Fan wrote: > Tomas Vondra writes: > > Hi Tomas, > > I am in a incompleted review process but I probably doesn't have much > time on this because of my internal tasks. So I just shared what I > did and the non-good-result patch. > > What I tried to do is: > 1) remove all th

Re: Parallel CREATE INDEX for GIN indexes

2024-07-01 Thread Andy Fan
Tomas Vondra writes: Hi Tomas, I am in a incompleted review process but I probably doesn't have much time on this because of my internal tasks. So I just shared what I did and the non-good-result patch. What I tried to do is: 1) remove all the "sort" effort for the state->bs_sort_state tuples s

Re: Parallel CREATE INDEX for GIN indexes

2024-05-28 Thread Andy Fan
Hi Tomas, I have completed my first round of review, generally it looks good to me, more testing need to be done in the next days. Here are some tiny comments from my side, just FYI. 1. Comments about GinBuildState.bs_leader looks not good for me. /* * bs_leader is only p

Re: Parallel CREATE INDEX for GIN indexes

2024-05-13 Thread Tomas Vondra
On 5/13/24 10:19, Andy Fan wrote: > > Tomas Vondra writes: > >> ... >> >> I don't understand the question. The blocks are distributed to workers >> by the parallel table scan, and it certainly does not do that block by >> block. But even it it did, that's not a problem for this code. > > OK, I

Re: Parallel CREATE INDEX for GIN indexes

2024-05-13 Thread Andy Fan
Tomas Vondra writes: >>> 7) v20240502-0007-Detect-wrap-around-in-parallel-callback.patch >>> >>> There's one more efficiency problem - the parallel scans are required to >>> be synchronized, i.e. the scan may start half-way through the table, and >>> then wrap around. Which however means the TI

Re: Parallel CREATE INDEX for GIN indexes

2024-05-10 Thread Tomas Vondra
On 5/10/24 07:53, Andy Fan wrote: > > Tomas Vondra writes: > >>> I guess both of you are talking about worker process, if here are >>> something in my mind: >>> >>> *btbuild* also let the WORKER dump the tuples into Sharedsort struct >>> and let the LEADER merge them directly. I think this aim o

Re: Parallel CREATE INDEX for GIN indexes

2024-05-10 Thread Andy Fan
Tomas Vondra writes: >> I guess both of you are talking about worker process, if here are >> something in my mind: >> >> *btbuild* also let the WORKER dump the tuples into Sharedsort struct >> and let the LEADER merge them directly. I think this aim of this design >> is it is potential to save

Re: Parallel CREATE INDEX for GIN indexes

2024-05-09 Thread Tomas Vondra
On 5/9/24 17:51, Matthias van de Meent wrote: > On Thu, 9 May 2024 at 15:13, Tomas Vondra > wrote: >> Let me explain the relevant part of the patch, and how I understand the >> improvement suggested by Matthias. The patch does the work in three phases: >> >> >> 1) Worker gets data from table,

Re: Parallel CREATE INDEX for GIN indexes

2024-05-09 Thread Matthias van de Meent
On Thu, 9 May 2024 at 15:13, Tomas Vondra wrote: > Let me explain the relevant part of the patch, and how I understand the > improvement suggested by Matthias. The patch does the work in three phases: > > > 1) Worker gets data from table, split that into index items and add > those into a "private

Re: Parallel CREATE INDEX for GIN indexes

2024-05-09 Thread Tomas Vondra
On 5/2/24 20:22, Tomas Vondra wrote: >> >>> For some of the opclasses it can regress (like the jsonb_path_ops). I >>> don't think that's a major issue. Or more precisely, I'm not surprised >>> by it. It'd be nice to be able to disable the parallel builds in these >>> cases somehow, but I haven't t

Re: Parallel CREATE INDEX for GIN indexes

2024-05-09 Thread Tomas Vondra
On 5/9/24 11:44, Andy Fan wrote: > > Hello Tomas, > 2) v20240502-0002-Use-mergesort-in-the-leader-process.patch The approach implemented by 0001 works, but there's a little bit of issue - if there are many distinct keys (e.g. for trigrams that can happen very easily),

Re: Parallel CREATE INDEX for GIN indexes

2024-05-09 Thread Tomas Vondra
On 5/9/24 12:14, Andy Fan wrote: > > Tomas Vondra writes: > >> 3) v20240502-0003-Remove-the-explicit-pg_qsort-in-workers.patch >> >> In 0002 the workers still do an explicit qsort() on the TID list before >> writing the data into the shared tuplesort. But we can do better - the >> workers can do

Re: Parallel CREATE INDEX for GIN indexes

2024-05-09 Thread Andy Fan
Tomas Vondra writes: > 3) v20240502-0003-Remove-the-explicit-pg_qsort-in-workers.patch > > In 0002 the workers still do an explicit qsort() on the TID list before > writing the data into the shared tuplesort. But we can do better - the > workers can do a merge sort too. To help with this, we ad

Re: Parallel CREATE INDEX for GIN indexes

2024-05-09 Thread Andy Fan
Hello Tomas, >>> 2) v20240502-0002-Use-mergesort-in-the-leader-process.patch >>> >>> The approach implemented by 0001 works, but there's a little bit of >>> issue - if there are many distinct keys (e.g. for trigrams that can >>> happen very easily), the workers will hit the memory limit with onl

Re: Parallel CREATE INDEX for GIN indexes

2024-05-02 Thread Tomas Vondra
On 5/2/24 19:12, Matthias van de Meent wrote: > On Thu, 2 May 2024 at 17:19, Tomas Vondra > wrote: >> >> Hi, >> >> In PG17 we shall have parallel CREATE INDEX for BRIN indexes, and back >> when working on that I was thinking how difficult would it be to do >> something similar to do that for o

Re: Parallel CREATE INDEX for GIN indexes

2024-05-02 Thread Matthias van de Meent
On Thu, 2 May 2024 at 17:19, Tomas Vondra wrote: > > Hi, > > In PG17 we shall have parallel CREATE INDEX for BRIN indexes, and back > when working on that I was thinking how difficult would it be to do > something similar to do that for other index types, like GIN. I even had > that on my list of