Re: Add index scan progress to pg_stat_progress_vacuum

Imseih (AWS), Sami Thu, 06 Jan 2022 18:14:53 -0800

Thanks for the review.

I am hesitant to make column name changes for obvious reasons, as it breaks 
existing tooling. However, I think there is a really good case to change 
"index_vacuum_count" as the name is confusing. "index_vacuum_cycles_completed" 
is the name I suggest if we agree to rename.


For the new column, "num_indexes_to_vacuum" is good with me. 

As far as  max_index_vacuum_cycle_time goes, Besides the points you make, 
another reason is that until one cycle completes, this value will remain at 0. 
It will not be helpful data for most vacuum cases. Removing it also reduces the 
complexity of the patch. 




On 1/6/22, 2:41 PM, "Bossart, Nathan" <bossa...@amazon.com> wrote:

    On 12/29/21, 8:44 AM, "Imseih (AWS), Sami" <sims...@amazon.com> wrote:
    > In "pg_stat_progress_vacuum", introduce 2 columns:
    >
    > * total_index_vacuum : This is the # of indexes that will be vacuumed. 
Keep in mind that if failsafe mode kicks in mid-flight to the vacuum, Postgres 
may choose to forgo index scans. This value will be adjusted accordingly.
    > * max_index_vacuum_cycle_time : The total elapsed time for a index vacuum 
cycle is calculated and this value will be updated to reflect the longest 
vacuum cycle. Until the first cycle completes, this value will be 0. The 
purpose of this column is to give the user an idea of how long an index vacuum 
cycle takes to complete.

    I think that total_index_vacuum is a good thing to have.  I would
    expect this to usually just be the number of indexes on the table, but
    as you pointed out, this can be different when we are skipping
    indexes.  My only concern with this new column is the potential for
    confusion when compared with the index_vacuum_count value.
    index_vacuum_count indicates the number of vacuum cycles completed,
    but total_index_vacuum indicates the number of indexes that will be
    vacuumed.  However, the names sound like they could refer to the same
    thing to me.  Perhaps we should rename index_vacuum_count to
    index_vacuum_cycles/index_vacuum_cycle_count, and the new column
    should be something like num_indexes_to_vacuum or index_vacuum_total.

    I don't think we need the max_index_vacuum_cycle_time column.  While
    the idea is to give users a rough estimate for how long an index cycle
    will take, I don't think it will help generate any meaningful
    estimates for how much longer the vacuum operation will take.  IIUC we
    won't have any idea how many total index vacuum cycles will be needed.
    Even if we did, the current cycle could take much more or much less
    time.  Also, none of the other progress views seem to provide any
    timing information, which I suspect is by design to avoid inaccurate
    estimates.

    > Introduce a new view called "pg_stat_progress_vacuum_index". This view 
will track the progress of a worker ( or leader PID ) while it's vacuuming an 
index. It will expose some key columns:
    >
    > * pid: The PID of the worker process
    >
    > * leader_pid: The PID of the leader process. This is the column that can 
be joined with "pg_stat_progress_vacuum". leader_pid and pid can have the same 
value as a leader can also perform an index vacuum.
    >
    > * indrelid: The relid of the index currently being vacuumed
    >
    > * vacuum_cycle_ordinal_position: The processing position of the index 
being vacuumed. This can be useful to determine how many indexes out of the 
total indexes ( pg_stat_progress_vacuum.total_index_vacuum ) have been vacuumed
    >
    > * index_tuples_vacuumed: This is the number of index tuples vacuumed for 
the index overall. This is useful to show that the vacuum is actually doing 
work, as the # of tuples keeps increasing. 

    Should we also provide some information for determining the progress
    of the current cycle?  Perhaps there should be an
    index_tuples_vacuumed_current_cycle column that users can compare with
    the num_dead_tuples value in pg_stat_progress_vacuum.  However,
    perhaps the number of tuples vacuumed in the current cycle can already
    be discovered via index_tuples_vacuumed % max_dead_tuples.

    +void
    +rusage_adjust(const PGRUsage *ru0, PGRUsage *ru1)
    +{
    +   if (ru1->tv.tv_usec < ru0->tv.tv_usec)
    +   {
    +           ru1->tv.tv_sec--;
    +           ru1->tv.tv_usec += 1000000;
    +   }
    +   if (ru1->ru.ru_stime.tv_usec < ru0->ru.ru_stime.tv_usec)
    +   {
    +           ru1->ru.ru_stime.tv_sec--;
    +           ru1->ru.ru_stime.tv_usec += 1000000;
    +   }
    +   if (ru1->ru.ru_utime.tv_usec < ru0->ru.ru_utime.tv_usec)
    +   {
    +           ru1->ru.ru_utime.tv_sec--;
    +           ru1->ru.ru_utime.tv_usec += 1000000;
    +   }
    +}

    I think this function could benefit from a comment.  Without going
    through it line by line, it is not clear to me exactly what it is
    doing.

    I know we're still working on what exactly this stuff should look
    like, but I would suggest adding the documentation changes in the near
    future.

    Nathan

Re: Add index scan progress to pg_stat_progress_vacuum

Reply via email to