Re: Parallel CREATE INDEX for GIN indexes

2025-04-17 Thread Vinod Sridharan
Hello,
As part of testing this change I believe I found a scenario where the
parallel build seems to trigger OOMs for larger indexes. Specifically,
the calls for ginEntryInsert seem to leak memory into
TopTransactionContext and OOM/crash the outer process.
For serial build, the calls for ginEntryInsert tend to happen in a
temporary memory context that gets reset at the end of the
ginBuildCallback.
For inserts, the call has a custom memory context and gets reset at
the end of the insert.
For parallel build, during the merge phase, the MemoryContext isn't
swapped - and so this happens on the TopTransactionContext, and ends
up growing (especially for larger indexes).

I believe at the very least these should happen inside the tmpCtx
found in the GinBuildState and reset periodically.

In the attached patch, I've tried to do this, and I'm able to build
the index without OOMing, and only consuming maintenance_work_mem
through the merge process.

Would appreciate your thoughts on this (and whether there's other approaches to
resolve this too).

Thanks,
Vinod

On Thu, 17 Apr 2025 at 13:58, Tomas Vondra
 wrote:
>
>
>
> On 7/3/24 20:36, Matthias van de Meent wrote:
> > On Mon, 24 Jun 2024 at 02:58, Tomas Vondra
> >  wrote:
> >>
> >> Here's a bit more cleaned up version, clarifying a lot of comments,
> >> removing a bunch of obsolete comments, or comments speculating about
> >> possible solutions, that sort of thing. I've also removed couple more
> >> XXX comments, etc.
> >>
> >> The main change however is that the sorting no longer relies on memcmp()
> >> to compare the values. I did that because it was enough for the initial
> >> WIP patches, and it worked till now - but the comments explained this
> >> may not be a good idea if the data type allows the same value to have
> >> multiple binary representations, or something like that.
> >>
> >> I don't have a practical example to show an issue, but I guess if using
> >> memcmp() was safe we'd be doing it in a bunch of places already, and
> >> AFAIK we're not. And even if it happened to be OK, this is a probably
> >> not the place where to start doing it.
> >
> > I think one such example would be the values '5.00'::jsonb and
> > '5'::jsonb when indexed using GIN's jsonb_ops, though I'm not sure if
> > they're treated as having the same value inside the opclass' ordering.
> >
>
> Yeah, possibly. But doing the comparison the "proper" way seems to be
> working pretty well, so I don't plan to investigate this further.
>
> >> So I've switched this to use the regular data-type comparisons, with
> >> SortSupport etc. There's a bit more cleanup remaining and testing
> >> needed, but I'm not aware of any bugs.
> >
> > A review of patch 0001:
> >
> > ---
> >
> >> src/backend/access/gin/gininsert.c | 1449 +++-
> >
> > The nbtree code has `nbtsort.c` for its sort- and (parallel) build
> > state handling, which is exclusively used during index creation. As
> > the changes here seem to be largely related to bulk insertion, how
> > much effort would it be to split the bulk insertion code path into a
> > separate file?
> >
>
> Hmmm. I haven't tried doing that, but I guess it's doable. I assume we'd
> want to do the move first, because it involves pre-existing code, and
> then do the patch on top of that.
>
> But what would be the benefit of doing that? I'm not sure doing it just
> to make it look more like btree code is really worth it. Do you expect
> the result to be clearer?
>
> > I noticed that new fields in GinBuildState do get to have a
> > bs_*-prefix, but none of the other added or previous fields of the
> > modified structs in gininsert.c have such prefixes. Could this be
> > unified?
> >
>
> Yeah, these are inconsistencies from copying the infrastructure code to
> make the parallel builds work, etc.
>
> >> +/* Magic numbers for parallel state sharing */
> >> +#define PARALLEL_KEY_GIN_SHAREDUINT64CONST(0xB001)
> >> ...
> >
> > These overlap with BRIN's keys; can we make them unique while we're at it?
> >
>
> We could, and I recall we had a similar discussion in the parallel BRIN
> thread, right?. But I'm somewhat unsure why would we actually want/need
> these keys to be unique. Surely, we don't need to mix those keys in the
> single shm segment, right? So it seems more like an aesthetic thing. Or
> is there some policy to have unique values for these keys?
>
> >> +  * mutex protects all fields before heapdesc.
> >
> > I can't find the field that this `heapdesc` might refer to.
> >
>
> Yeah, likely a leftover from copying a bunch of code and then removing
> it without updating the comment. Will fix.
>
> >> +_gin_begin_parallel(GinBuildState *buildstate, Relation heap, Relation 
> >> index,
> >> ...
> >> + if (!isconcurrent)
> >> +snapshot = SnapshotAny;
> >> +else
> >> +snapshot = RegisterSnapshot(GetTransactionSnapshot());
> >
> > grumble: I know this is required from the index with the current APIs,

Small patch fixing a query correctness issue in Gin with operator classes implementing Consistent functions

2025-04-11 Thread Vinod Sridharan
Hi All,

Please find a small patch to fix an existing bug in the GIN index that
impacts correctness of query results for operator classes that use a
consistentFunction and do not specify a triConsistent function. This
patch is against the master and fixes an issue not in any release
branches.

Please find the thread discussing this issue and the fix here:
https://www.postgresql.org/message-id/flat/CAFMdLD4Ks5b%3DCbBh1PjFSytm0zdNv9-ddyeE%2BopusAKCVph7%3Dg%40mail.gmail.com

--
Thanks and Regards,
Vinod Sridharan
[Microsoft]


v1-0001-Fix-shimTriConsistentFn-mutating-the-entryRes-val.patch
Description: Binary data