On 2/12/25 15:59, Matthias van de Meent wrote:
> On Tue, 7 Jan 2025 at 12:59, Tomas Vondra <to...@vondra.me> wrote:
>>
>> ...
>>
>> I haven't done anything about this, but I'm not sure adding the number
>> of GIN tuples to pg_stat_progress_create_index would be very useful. We
>> don't know the total number of entries, so it can't show the progress.
>
> For btree scans, we update the number of to-be-inserted tuples
> together with the number of blocks scanned. Can we do something
> similar with GIN?
>

I've been thinking about this, but I'm not quite sure how that should
work. The problem is that in btree we have a 1:1 mapping to heap tuples,
but in GIN it's not that simple. Not only do we generate multiple GIN
entries for each heap row, but we also combine / merge those tuples at
various levels.

But I think it might look like this:

1) Each worker counts the number of GinTuples written to the shared
tuplesort, after the in-worker merge phase (i.e. it would not be the
number of GIN entries generated in ginBuildCallbackParallel).

2) The leader then counts the number of entries it loaded from the
tuplesort, before merging/writing them into the index.

I think this would work as a measure of progress, even though it does
not really match the number of index tuples.

One thing I'm not sure about is how this would work with the "single
tuplesort" patch. That patch moves the merging into the tuplesort code,
and there doesn't seem to be a nice way to pass the number of merged
tuples back out.

> Can we track data for pg_stat_progress_create_index?
>

Which data? I think progress for the CREATE INDEX would be nice, ofc.

regards

-- 
Tomas Vondra
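To make the two proposed counters concrete, here is a minimal toy simulation in Python, assuming an in-memory model where each heap row yields one (key, tid) entry per distinct key, each worker merges equal keys before "writing" to the shared sort, and the leader merges again across workers. It is not PostgreSQL code: `GinTupleSketch`, `worker_build`, `leader_load` and the `report` callback are hypothetical names, and the `tuples_total` / `tuples_done` keys only mirror the pg_stat_progress_create_index columns. In the backend the reporting would presumably go through `pgstat_progress_update_param()` with `PROGRESS_CREATEIDX_TUPLES_TOTAL` / `PROGRESS_CREATEIDX_TUPLES_DONE`, by analogy with the btree build.

```python
from dataclasses import dataclass, field
from itertools import groupby


@dataclass
class GinTupleSketch:
    """Hypothetical stand-in for a GinTuple: one key plus the heap TIDs merged into it."""
    key: str
    tids: list = field(default_factory=list)


def extract_entries(heap_rows):
    """Emit one (key, tid) entry per distinct key of each heap row, so a
    single row can produce many entries (unlike btree's 1:1 mapping)."""
    for tid, keys in heap_rows:
        for key in sorted(set(keys)):
            yield key, tid


def worker_build(heap_rows):
    """Worker side: sort the entries, merge runs with equal keys (the
    in-worker merge phase), and count the tuples written to the shared
    sort only *after* that merge."""
    entries = sorted(extract_entries(heap_rows))
    written = [
        GinTupleSketch(key, [tid for _, tid in run])
        for key, run in groupby(entries, key=lambda e: e[0])
    ]
    return written, len(written)


def leader_load(worker_outputs, report):
    """Leader side: the total is the sum of the per-worker counts; 'done'
    counts tuples loaded from the sort, before they are merged into the index."""
    total = sum(count for _, count in worker_outputs)
    report("tuples_total", total)

    # Stand-in for reading the shared tuplesort in key order (stable sort,
    # so tuples for the same key keep worker order).
    loaded_stream = sorted(
        (t for tuples, _ in worker_outputs for t in tuples), key=lambda t: t.key
    )
    index = {}
    loaded = 0
    for tup in loaded_stream:
        loaded += 1
        report("tuples_done", loaded)
        # Final merge across workers: this is why 'done' can exceed the
        # number of keys that end up in the index.
        index.setdefault(tup.key, []).extend(tup.tids)
    return loaded, index


if __name__ == "__main__":
    progress = {}
    w1 = worker_build([(1, ["a", "b", "a"]), (2, ["b", "c"])])
    w2 = worker_build([(3, ["a", "c"]), (4, ["d"])])
    loaded, index = leader_load([w1, w2], progress.__setitem__)
    print(progress, "index keys:", len(index))
```

With this toy input the two workers write 3 merged tuples each (from 4 and 3 raw entries respectively), so the leader reports tuples_total = 6 and finishes at tuples_done = 6, while the index ends up with only 4 distinct keys. The counters give a monotonic progress measure without matching the number of index tuples, which is the trade-off described above.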