On 6/2/21 3:03 PM, Matthias van de Meent wrote:
On Wed, 2 Jun 2021 at 13:57, Tomas Vondra <tomas.von...@enterprisedb.com> wrote:
Hi,
While experimenting with parallel index builds, I've noticed a somewhat
strange behavior of pg_stat_progress_create_index when a btree index is
built with parallel workers - some of the phases seem to be missing.
In serial (no parallelism) mode, the progress is roughly this (it's
always the first/last timestamp of each phase):
| command | phase
-------------+--------------+----------------------------------------
12:56:01 AM | CREATE INDEX | building index: scanning table
...
01:06:22 AM | CREATE INDEX | building index: scanning table
01:06:23 AM | CREATE INDEX | building index: sorting live tuples
...
01:13:10 AM | CREATE INDEX | building index: sorting live tuples
01:13:11 AM | CREATE INDEX | building index: loading tuples in tree
...
01:24:02 AM | CREATE INDEX | building index: loading tuples in tree
So it goes through three phases:
1) scanning tuples
2) sorting live tuples
3) loading tuples in tree
But with parallel build index build, it changes to:
| command | phase
-------------+--------------+----------------------------------------
11:40:48 AM | CREATE INDEX | building index: scanning table
...
11:47:24 AM | CREATE INDEX | building index: scanning table (scan
complete)
11:56:22 AM | CREATE INDEX | building index: scanning table
11:56:23 AM | CREATE INDEX | building index: loading tuples in tree
...
12:05:33 PM | CREATE INDEX | building index: loading tuples in tree
That is, the "sorting live tuples" phase disappeared, and instead it
seems to be counted in the "scanning table" one, as if there was an
update of the phase missing.
I've only tried this on master, but I assume it behaves like this in the
older releases too. I wonder if this is intentional - it sure is a bit
misleading.
This was a suprise to me as well. According to documentation in
sortsupport.h (line 125-129) the parallel workers produce pre-sorted
segments during the scanning phase, which are subsequently merged by
the leader. This might mean that the 'sorting' phase is already
finished during the 'scanning' phase by waiting for the parallel
workers; I haven't looked further if this is the case and whether it
could be changed to also produce the sorting metrics, but seeing as it
is part of the parallel workers API of tuplesort, I think fixing it in
current releases is going to be difficult.
Maybe. Perhaps it's more complicated to decide when to switch between
phases with parallel workers. Still, the table scan is done after ~8
minutes (based on blocks_total vs. blocks_done), yet we keep that phase
for another ~9 minutes. It seems this is where the workers do the sort,
so "sorting live tuples" seems like a more natural phase for this.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company