>    I'm still unsure the current design of the 0001 patch is better than
>    the other approaches we've discussed. Even users who don't use
>    parallel vacuum are forced to allocate shared memory for index vacuum
>    progress, with GetMaxBackends() entries from the beginning. Also, we
>    are likely to extend the progress tracking feature to other parallel
>    operations in the future, but I think the current design is not
>    extensible. If we want to do that, we will end up creating similar
>    things for each of them, or re-creating the index vacuum progress
>    tracking feature while creating a common infrastructure. It might not
>    be a problem as of now, but I'm concerned that introducing a feature
>    that is not extensible and forces users to allocate additional shmem
>    might be a blocker in the future. Looking at the precedent, when we
>    introduced the progress tracking feature, we implemented it in an
>    extensible way. On the other hand, others in this thread seem to agree
>    with this approach, so I'd like to leave it to the committers.

Thanks for the review!

I think you make strong arguments for why we need to take a different 
approach now rather than later.

Flaws with the current patch set:

1. GetMaxBackends() is a really heavy-handed overallocation of shared memory 
for something serving a very specific purpose.
2. A vacuum-specific hash breaks the design of progress reporting, which is 
meant to be extensible.
3. Even if we go with the current approach as an interim solution, it will be 
a real pain to replace in the future.

With that said, v7 introduces new infrastructure: 0001 adds the infrastructure 
and 0002 takes advantage of it.

The approach is as follows:

1. Introduces a new API called pgstat_progress_update_param_parallel along with 
some other support functions. This new infrastructure is in backend_progress.c; 
a rough sketch of the declarations is below.
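
To make the shape of the API concrete, here is what the declarations might 
look like. Only pgstat_progress_update_param_parallel is named above; the 
sizing/initialization helpers are illustrative guesses, not necessarily what 
0001 actually exposes.

/* Illustrative declarations; actual names in the patch may differ. */
extern Size ProgressParallelShmemSize(void);
extern void ProgressParallelShmemInit(void);

/*
 * Aggregate "incr" into the given progress parameter of the shared entry
 * keyed by the leader's PID. Callable from the leader and its workers.
 */
extern void pgstat_progress_update_param_parallel(pid_t leader_pid,
                                                  int index, int64 incr);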

2. There is still shared memory involved, but its size is capped at 
"max_worker_processes", which is the maximum number of parallel workers that 
can be doing work at any given time. Each entry in the shared memory hash 
includes a st_progress_param array just like the Backend Status array (a 
sizing sketch follows the struct below).

typedef struct ProgressParallelEntry
{
    pid_t   leader_pid;
    int64   st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
} ProgressParallelEntry;
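
For illustration, the hash could be created at startup roughly like this. The 
table name and the init function are my placeholders, while ShmemInitHash and 
max_worker_processes are existing PostgreSQL facilities:

static HTAB *ProgressParallelHash = NULL;

void
ProgressParallelShmemInit(void)
{
    HASHCTL     info;

    MemSet(&info, 0, sizeof(info));
    info.keysize = sizeof(pid_t);
    info.entrysize = sizeof(ProgressParallelEntry);

    /* Capped at max_worker_processes entries, not GetMaxBackends(). */
    ProgressParallelHash = ShmemInitHash("Parallel Progress Hash",
                                         max_worker_processes,
                                         max_worker_processes,
                                         &info,
                                         HASH_ELEM | HASH_BLOBS);
}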

3. The progress update function is "pgstat_progress_update_param_parallel"; it 
aggregates the totals reported for a specific progress parameter.

For example, it can be called like below. In the case below, 
PROGRESS_VACUUM_INDEXES_COMPLETED is incremented by 1 in the shared memory 
entry shared by the workers and the leader (a sketch of the function body 
follows the snippet).

case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
    istat_res = vac_bulkdel_one_index(&ivinfo, istat, pvs->dead_items);
    pgstat_progress_update_param_parallel(pvs->shared->leader_pid,
                                          PROGRESS_VACUUM_INDEXES_COMPLETED,
                                          1);    <<-----
    break;
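
Internally, the aggregation could look roughly like the following sketch. It 
assumes the hash from the earlier sketch and a dedicated lock, here called 
ParallelProgressLock, which is a placeholder name; the actual locking in the 
patch may differ.

void
pgstat_progress_update_param_parallel(pid_t leader_pid, int index, int64 incr)
{
    ProgressParallelEntry *entry;
    bool        found;

    Assert(index >= 0 && index < PGSTAT_NUM_PROGRESS_PARAM);

    LWLockAcquire(ParallelProgressLock, LW_EXCLUSIVE);   /* placeholder lock */

    entry = (ProgressParallelEntry *) hash_search(ProgressParallelHash,
                                                  &leader_pid,
                                                  HASH_ENTER, &found);
    if (!found)
        MemSet(entry->st_progress_param, 0, sizeof(entry->st_progress_param));

    /* Leader and workers all accumulate into the same slot. */
    entry->st_progress_param[index] += incr;

    LWLockRelease(ParallelProgressLock);
}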

4. pg_stat_get_progress_info will call a function called 
pgstat_progress_set_parallel, which sets the parameter value to the total from 
the shared memory hash (sketched below).
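
On the reporting side, the lookup might be along these lines; I am guessing at 
the exact signature of pgstat_progress_set_parallel here, so treat it as a 
sketch rather than the patch's actual code:

void
pgstat_progress_set_parallel(pid_t leader_pid, int index, int64 *value)
{
    ProgressParallelEntry *entry;

    LWLockAcquire(ParallelProgressLock, LW_SHARED);   /* placeholder lock */

    entry = (ProgressParallelEntry *) hash_search(ProgressParallelHash,
                                                  &leader_pid,
                                                  HASH_FIND, NULL);

    /* If the leader has a shared entry, report the aggregated total. */
    if (entry != NULL)
        *value = entry->st_progress_param[index];

    LWLockRelease(ParallelProgressLock);
}

pg_stat_get_progress_info would call this once per parameter while building 
the view's output row.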

I believe this approach provides proper infrastructure for future use cases of 
workers reporting progress -and- avoids the heavy-handed shared memory 
allocation.

--
Sami Imseih
Amazon Web Services


Attachment: v7-0001-Add-infrastructure-for-parallel-progress-reportin.patch
Attachment: v7-0002-Show-progress-for-index-vacuums.patch
Attachment: v7-0003-Expose-indexes-being-processed-in-a-VACUUM-operat.patch
Attachment: v7-0004-Rename-index_vacuum_count-in-pg_stat_pogress_vacu.patch
