On Tue, 7 Jan 2025 at 12:59, Tomas Vondra <to...@vondra.me> wrote: > > On 1/6/25 20:13, Matthias van de Meent wrote: >> ... >>> >>> Thanks. Attached is a rebased patch series fixing those issues, and one >>> issue I found in an AssertCheckGinBuffer, which was calling the other >>> assert (AssertCheckItemPointers) even for empty buffers. I think this >>> part might need some more work, so that it's clear what the various >>> asserts assume (or rather to allow just calling AssertCheckGinBuffer >>> everywhere, with some flags). >> >> Thanks for the rebase. >> >>> 0001 >>> + * mutex protects all fields before heapdesc. >> >> This comment is still inaccurate. >> > > Hmm, yeah. But this comment originates from btree, so maybe it's wrong > there (and in BRIN too)? I believe it refers to the descriptors stored > after the struct, i.e. it means "all fields after the mutex".
Yeah, I think that's just the comment that needs updating. >>> + /* FIXME likely duplicate with indtuples */ >> >> I think this doesn't have to be duplicate, as we can distinguish >> between number of heap tuples and the number of GIN (key, TID) pairs >> loaded. This distinction doesn't really exist anywhere else, though, >> so to expose this to users we may need changes in >> pg_stat_progress_create_index. >> >> While I haven't checked if that distinction is being made in the code, >> I think it would be a useful distinction to have. >> > > I haven't done anything about this, but I'm not sure adding the number > of GIN tuples to pg_stat_progress_create_index would be very useful. We > don't know the total number of entries, so it can't show the progress. For btree scans, we update the number of to-be-inserted tuples together with the number of blocks scanned. Can we do something similar with GIN? Can we track data for pg_stat_progress_create_index? >>> GinBufferInit >> >> This seems to depend on the btree operator classes to get sortsupport >> functions, bypassing the GIN compare support function (support >> function 1) and adding a dependency on the btree opclasses for >> indexable types. This can cause "bad" ordering, or failure to build >> the index when the parallel path is chosen and no default btree >> opclass is defined for the type. I think it'd be better if we allowed >> users to specify which sortsupport function to use, or at least use >> the correct compare function when it's defined on the attribute's >> operator class. >> > > Good point! I fixed this by copying the logic from initGinState. > >>> include/access/gin_tuple.h >>> + OffsetNumber attrnum; /* attnum of index key */ >> >> I think this would best be AttrNumber-typed? Looks like I didn't >> notice or fix that in 0009. >> > > You're probably right, but I see the GIN code uses OffsetNumber for > attrnum in a number of places. I wonder why is that. I don't think it > can be harmful, because we can't have GIN on system columns, right? Indeed, indexes on system columns are not supported, which includes GIN indexes. >>> I need to figure out how to squash the patches - I don't want to >>> squash this into a single much-harder-to-understand commit, but maybe it >>> has too many parts. I think the following would be good: Commits: 1.) 0001 (parallel create) + 0009 (reduce the size of ...) + 0002 (mergesort) + 0003 (remove explicit pg_qsort) + 0007 (detect wrap-around) 2.) 0004 (compress) + 0006 (enforce memory limit) 3.) 0008 (single tuplesort) Note that 0009 is a drop-in improvement, so I don't think order makes much of a difference there. IIUC, 0005 was only for development insights, and not proposed to get committed. If that was wrong, I'd squash it into the second commit, together with 0004/0006. I'll try to provide a more polished version of 0008 soon, with improved comments/commit message, however that'll depend on me not getting distracted with $job items first; it's taken quite some time recently. Kind regards, Matthias van de Meent