On Wed, 2 Jun 2021 at 15:23, Tomas Vondra <tomas.von...@enterprisedb.com> wrote: > > > > On 6/2/21 3:03 PM, Matthias van de Meent wrote: > > On Wed, 2 Jun 2021 at 13:57, Tomas Vondra <tomas.von...@enterprisedb.com> > > wrote: > >> > >> Hi, > >> > >> While experimenting with parallel index builds, I've noticed a somewhat > >> strange behavior of pg_stat_progress_create_index when a btree index is > >> built with parallel workers - some of the phases seem to be missing. > >> > >> In serial (no parallelism) mode, the progress is roughly this (it's > >> always the first/last timestamp of each phase): > >> > >> | command | phase > >> -------------+--------------+---------------------------------------- > >> 12:56:01 AM | CREATE INDEX | building index: scanning table > >> ... > >> 01:06:22 AM | CREATE INDEX | building index: scanning table > >> 01:06:23 AM | CREATE INDEX | building index: sorting live tuples > >> ... > >> 01:13:10 AM | CREATE INDEX | building index: sorting live tuples > >> 01:13:11 AM | CREATE INDEX | building index: loading tuples in tree > >> ... > >> 01:24:02 AM | CREATE INDEX | building index: loading tuples in tree > >> > >> So it goes through three phases: > >> > >> 1) scanning tuples > >> 2) sorting live tuples > >> 3) loading tuples in tree > >> > >> But with parallel build index build, it changes to: > >> > >> | command | phase > >> -------------+--------------+---------------------------------------- > >> 11:40:48 AM | CREATE INDEX | building index: scanning table > >> ... > >> 11:47:24 AM | CREATE INDEX | building index: scanning table (scan > >> complete) > >> 11:56:22 AM | CREATE INDEX | building index: scanning table > >> 11:56:23 AM | CREATE INDEX | building index: loading tuples in tree > >> ... > >> 12:05:33 PM | CREATE INDEX | building index: loading tuples in tree > >> > >> That is, the "sorting live tuples" phase disappeared, and instead it > >> seems to be counted in the "scanning table" one, as if there was an > >> update of the phase missing. > > > >> I've only tried this on master, but I assume it behaves like this in the > >> older releases too. I wonder if this is intentional - it sure is a bit > >> misleading. > > > > This was a suprise to me as well. According to documentation in > > sortsupport.h (line 125-129) the parallel workers produce pre-sorted > > segments during the scanning phase, which are subsequently merged by > > the leader. This might mean that the 'sorting' phase is already > > finished during the 'scanning' phase by waiting for the parallel > > workers; I haven't looked further if this is the case and whether it > > could be changed to also produce the sorting metrics, but seeing as it > > is part of the parallel workers API of tuplesort, I think fixing it in > > current releases is going to be difficult. > > > > Maybe. Perhaps it's more complicated to decide when to switch between > phases with parallel workers. Still, the table scan is done after ~8 > minutes (based on blocks_total vs. blocks_done), yet we keep that phase > for another ~9 minutes. It seems this is where the workers do the sort, > so "sorting live tuples" seems like a more natural phase for this.
After looking at it a bit more, it seems like a solution was actually easier than I'd expected. PFA a prototype (unvalidated, but check-world -ed) patch that would add these subphases of progress reporting, which can be backpatched down to 12. Do note that this is a partial fix, as it only fixes it when the leader participates; but I don't think that limitation is too much of a problem because only on builds which explicitly define the non-standard DISABLE_LEADER_PARTICIPATION this will happen, and in such cases the progress reporting for the loading phase will fail as well. With regards, Matthias van de Meent
From 4a7960cf19a6480a18119570a13f19835904f06f Mon Sep 17 00:00:00 2001 From: Matthias van de Meent <boekew...@gmail.com> Date: Wed, 2 Jun 2021 16:07:24 +0200 Subject: [PATCH v1] Report phase progress for the sort phase in parallel btree construction. --- src/backend/access/nbtree/nbtsort.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c index 2c4d7f6e25..92f7ca66d8 100644 --- a/src/backend/access/nbtree/nbtsort.c +++ b/src/backend/access/nbtree/nbtsort.c @@ -1978,9 +1978,18 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2, * tuplesort_performsort() for spool2 if it ends up containing no dead * tuples (this is disallowed for workers by tuplesort). */ + + if (progress) + pgstat_progress_update_param(PROGRESS_CREATEIDX_SUBPHASE, + PROGRESS_BTREE_PHASE_PERFORMSORT_1); tuplesort_performsort(btspool->sortstate); if (btspool2) + { + if (progress) + pgstat_progress_update_param(PROGRESS_CREATEIDX_SUBPHASE, + PROGRESS_BTREE_PHASE_PERFORMSORT_2); tuplesort_performsort(btspool2->sortstate); + } /* * Done. Record ambuild statistics, and whether we encountered a broken -- 2.20.1