On Tue, Apr 07, 2020 at 09:57:46AM +0530, Amit Kapila wrote:
> On Mon, Mar 23, 2020 at 10:34 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
> > On Sun, Mar 22, 2020 at 7:48 AM Justin Pryzby <pry...@telsasoft.com> wrote:
> > >
> > > Original, long thread
> > > https://www.postgresql.org/message-id/flat/CAA4eK1%2Bnw1FBK3_sDnW%2B7kB%2Bx4qbDJqetgqwYW8k2xv82RZ%2BKw%40mail.gmail.com#b1745ee853b137043e584b500b41300f
> > >
> >
> > I have comments/questions on the patches:
> > doc changes
> > -------------------------
> > 1.
> > <para>
> > - Perform vacuum index and cleanup index phases of
> > <command>VACUUM</command>
> > + Perform vacuum index and index cleanup phases of
> > <command>VACUUM</command>
> >
> > Why is the proposed text improvement over what is already there?
>
> I have kept the existing text as it is for this case.

Probably it should say "index vacuum and index cleanup", which is also what
the comment in vacuumlazy.c says.

> > 2.
> > If the
> > - <literal>PARALLEL</literal> option is omitted, then
> > - <command>VACUUM</command> decides the number of workers based on
> > number
> > - of indexes that support parallel vacuum operation on the relation
> > which
> > - is further limited by <xref
> > linkend="guc-max-parallel-workers-maintenance"/>.
> > - The index can participate in a parallel vacuum if and only if the
> > size
> > + <literal>PARALLEL</literal> option is omitted, then the number of
> > workers
> > + is determined based on the number of indexes that support parallel
> > vacuum
> > + operation on the relation, and is further limited by <xref
> > + linkend="guc-max-parallel-workers-maintenance"/>.
> > + An index can participate in parallel vacuum if and only if the size
> > of the index is more than <xref
> > linkend="guc-min-parallel-index-scan-size"/>.
> >
> > Here, it is not clear to me why the proposed text is better than what
> > we already have?
> Changed as per your suggestion.

To answer your question: "VACUUM decides" doesn't sound consistent with the
docs.  "based on {+the+} number" => here, you're veritably missing an
article...  "relation which" technically means that the *relation* is what
"is further limited by"...

> > Patch changes
> > -------------------------
> > 1.
> > - * and index cleanup with parallel workers unless the parallel vacuum is
> > + * and index cleanup with parallel workers unless parallel vacuum is
> >
> > As per my understanding 'parallel vacuum' is a noun phrase, so we
> > should have a determiner before it.
>
> Changed as per your suggestion.

Thanks (I think you meant an "article").

> > - * We allow each worker to update it as and when it has incurred any cost
> > and
> > + * We allow each worker to update it AS AND WHEN it has incurred any cost
> > and
> >
> > I don't see why it is necessary to make this part bold?  We are using
> > it at one other place in the code in the way it is used here.
> >
>
> Kept the existing text as it is.

I meant this as a question: I'm not sure what "as and when" means?  "If and
when"?

> I have combined both of your patches. Let me know if you have any
> more suggestions, otherwise, I am planning to push this tomorrow.

In the meantime, I found some more issues, so I rebased on top of your patch
so you can review it.

-- 
Justin
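To make the doc wording under discussion concrete: when PARALLEL is omitted,
the worker count falls out of the number of indexes that exceed the size
threshold, capped by the user's request (if any) and by
max_parallel_maintenance_workers, with at most one worker per index.  The
following standalone C sketch only illustrates that rule; the
toy_parallel_vacuum_degree() helper and all numbers are invented for the
example, and this is not the backend's compute_parallel_vacuum_workers():

/*
 * Illustrative, standalone sketch of the worker-count rule described in the
 * doc text quoted above; NOT the backend's compute_parallel_vacuum_workers().
 */
#include <stdio.h>

static int
toy_parallel_vacuum_degree(const long *index_pages, int nindexes,
                           int nrequested,   /* 0 if PARALLEL was omitted */
                           long min_parallel_index_scan_size,
                           int max_parallel_maintenance_workers)
{
    int     eligible = 0;

    /* Only one worker per index, so parallelism needs at least two indexes. */
    if (nindexes < 2)
        return 0;

    /* An index participates only if it is larger than the size threshold. */
    for (int i = 0; i < nindexes; i++)
        if (index_pages[i] > min_parallel_index_scan_size)
            eligible++;

    /* Honor an explicit PARALLEL request, if any... */
    if (nrequested > 0 && nrequested < eligible)
        eligible = nrequested;

    /* ...and cap by max_parallel_maintenance_workers in any case. */
    if (eligible > max_parallel_maintenance_workers)
        eligible = max_parallel_maintenance_workers;

    return eligible;
}

int
main(void)
{
    long    pages[] = {4096, 2048, 4};  /* two large indexes, one tiny */

    /* PARALLEL omitted, 512-page threshold, at most 2 maintenance workers */
    printf("%d\n", toy_parallel_vacuum_degree(pages, 3, 0, 512, 2));  /* prints 2 */
    return 0;
}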
>From 3551aba7445b46d60d2199d068c0e06a5828c5c3 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akap...@postgresql.org>
Date: Tue, 7 Apr 2020 09:48:28 +0530
Subject: [PATCH v3 1/2] Comments and doc fixes for commit 40d964ec99.

Reported-by: Justin Pryzby
Author: Justin Pryzby, with few changes by me
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/20200322021801.gb2...@telsasoft.com
---
 doc/src/sgml/ref/vacuum.sgml          | 16 +++++++-------
 src/backend/access/heap/vacuumlazy.c  | 30 +++++++++++++--------------
 src/backend/access/transam/parallel.c |  2 +-
 src/backend/commands/vacuum.c         | 20 +++++++++---------
 src/include/commands/vacuum.h         |  2 +-
 5 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 846056a353..b7e0a8af6b 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -234,13 +234,13 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
     <para>
      Perform vacuum index and cleanup index phases of <command>VACUUM</command>
      in parallel using <replaceable class="parameter">integer</replaceable>
-     background workers (for the detail of each vacuum phases, please
+     background workers (for the detail of each vacuum phase, please
      refer to <xref linkend="vacuum-phases"/>). If the
-     <literal>PARALLEL</literal> option is omitted, then
-     <command>VACUUM</command> decides the number of workers based on number
-     of indexes that support parallel vacuum operation on the relation which
-     is further limited by <xref linkend="guc-max-parallel-workers-maintenance"/>.
-     The index can participate in a parallel vacuum if and only if the size
+     <literal>PARALLEL</literal> option is omitted, then the number of workers
+     is determined based on the number of indexes that support parallel vacuum
+     operation on the relation, and is further limited by <xref
+     linkend="guc-max-parallel-workers-maintenance"/>.
+     An index can participate in parallel vacuum if and only if the size
      of the index is more than <xref linkend="guc-min-parallel-index-scan-size"/>.
      Please note that it is not guaranteed that the number of parallel workers
      specified in <replaceable class="parameter">integer</replaceable> will
@@ -248,7 +248,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
      workers than specified, or even with no workers at all. Only one worker
      can be used per index. So parallel workers are launched only when there
      are at least <literal>2</literal> indexes in the table. Workers for
-     vacuum launches before starting each phase and exit at the end of
+     vacuum are launched before the start of each phase and exit at the end of
      the phase. These behaviors might change in a future release. This
      option can't be used with the <literal>FULL</literal> option.
     </para>
@@ -367,7 +367,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
    <command>VACUUM</command> causes a substantial increase in I/O traffic,
    which might cause poor performance for other active sessions. Therefore,
    it is sometimes advisable to use the cost-based vacuum delay feature. For
-   parallel vacuum, each worker sleeps proportional to the work done by that
+   parallel vacuum, each worker sleeps proportionally to the work done by that
    worker. See <xref linkend="runtime-config-resource-vacuum-cost"/> for
    details.
  </para>
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f3382d37a4..a757560996 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -208,7 +208,7 @@ typedef struct LVShared
     * live tuples in the index vacuum case or the new live tuples in the
     * index cleanup case.
     *
-    * estimated_count is true if the reltuples is an estimated value.
+    * estimated_count is true if reltuples is an estimated value.
     */
    double      reltuples;
    bool        estimated_count;
@@ -732,7 +732,7 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
  * to reclaim dead line pointers.
  *
  * If the table has at least two indexes, we execute both index vacuum
- * and index cleanup with parallel workers unless the parallel vacuum is
+ * and index cleanup with parallel workers unless parallel vacuum is
  * disabled. In a parallel vacuum, we enter parallel mode and then
  * create both the parallel context and the DSM segment before starting
  * heap scan so that we can record dead tuples to the DSM segment. All
@@ -809,8 +809,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
    vacrelstats->latestRemovedXid = InvalidTransactionId;

    /*
-    * Initialize the state for a parallel vacuum. As of now, only one worker
-    * can be used for an index, so we invoke parallelism only if there are at
+    * Initialize state for a parallel vacuum. As of now, only one worker can
+    * be used for an index, so we invoke parallelism only if there are at
     * least two indexes on a table.
     */
    if (params->nworkers >= 0 && vacrelstats->useindex && nindexes > 1)
@@ -837,7 +837,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
    }

    /*
-    * Allocate the space for dead tuples in case the parallel vacuum is not
+    * Allocate the space for dead tuples in case parallel vacuum is not
     * initialized.
     */
    if (!ParallelVacuumIsActive(lps))
@@ -2215,7 +2215,7 @@ parallel_vacuum_index(Relation *Irel, IndexBulkDeleteResult **stats,
        shared_indstats = get_indstats(lvshared, idx);

        /*
-        * Skip processing indexes that doesn't participate in parallel
+        * Skip processing indexes that don't participate in parallel
         * operation
         */
        if (shared_indstats == NULL ||
@@ -2328,8 +2328,8 @@ vacuum_one_index(Relation indrel, IndexBulkDeleteResult **stats,
        shared_indstats->updated = true;

        /*
-        * Now that the stats[idx] points to the DSM segment, we don't need
-        * the locally allocated results.
+        * Now that stats[idx] points to the DSM segment, we don't need the
+        * locally allocated results.
         */
        pfree(*stats);
        *stats = bulkdelete_res;
@@ -2449,7 +2449,7 @@ lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
 *  lazy_cleanup_index() -- do post-vacuum cleanup for one index relation.
 *
 *      reltuples is the number of heap tuples and estimated_count is true
- *      if the reltuples is an estimated value.
+ *      if reltuples is an estimated value.
 */
 static void
 lazy_cleanup_index(Relation indrel,
@@ -3050,9 +3050,9 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 /*
  * Compute the number of parallel worker processes to request. Both index
  * vacuum and index cleanup can be executed with parallel workers. The index
- * is eligible for parallel vacuum iff it's size is greater than
+ * is eligible for parallel vacuum iff its size is greater than
  * min_parallel_index_scan_size as invoking workers for very small indexes
- * can hurt the performance.
+ * can hurt performance.
  *
  * nrequested is the number of parallel workers that user requested. If
  * nrequested is 0, we compute the parallel degree based on nindexes, that is
@@ -3071,7 +3071,7 @@ compute_parallel_vacuum_workers(Relation *Irel, int nindexes, int nrequested,
    int         i;

    /*
-    * We don't allow to perform parallel operation in standalone backend or
+    * We don't allow performing parallel operation in standalone backend or
     * when parallelism is disabled.
     */
    if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
@@ -3144,7 +3144,7 @@ prepare_index_statistics(LVShared *lvshared, bool *can_parallel_vacuum,
 }

 /*
- * Update index statistics in pg_class if the statistics is accurate.
+ * Update index statistics in pg_class if the statistics are accurate.
  */
 static void
 update_index_statistics(Relation *Irel, IndexBulkDeleteResult **stats,
@@ -3345,7 +3345,7 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
 /*
  * Destroy the parallel context, and end parallel mode.
  *
- * Since writes are not allowed during the parallel mode, so we copy the
+ * Since writes are not allowed during parallel mode, copy the
  * updated index statistics from DSM in local memory and then later use that
  * to update the index statistics. One might think that we can exit from
  * parallel mode, update the index statistics and then destroy parallel
@@ -3452,7 +3452,7 @@ skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared)
  * Perform work within a launched parallel process.
  *
  * Since parallel vacuum workers perform only index vacuum or index cleanup,
- * we don't need to report the progress information.
+ * we don't need to report progress information.
  */
 void
 parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 29057f389e..14a8690019 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -505,7 +505,7 @@ ReinitializeParallelDSM(ParallelContext *pcxt)

 /*
  * Reinitialize parallel workers for a parallel context such that we could
- * launch the different number of workers. This is required for cases where
+ * launch a different number of workers. This is required for cases where
  * we need to reuse the same DSM segment, but the number of workers can
  * vary from run-to-run.
  */
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 3a89f8fe1e..495ac23a26 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2036,23 +2036,23 @@ vacuum_delay_point(void)
 /*
  * Computes the vacuum delay for parallel workers.
  *
- * The basic idea of a cost-based vacuum delay for parallel vacuum is to allow
- * each worker to sleep proportional to the work done by it. We achieve this
+ * The basic idea of a cost-based delay for parallel vacuum is to allow each
+ * worker to sleep in proportion to the share of work it's done. We achieve this
  * by allowing all parallel vacuum workers including the leader process to
  * have a shared view of cost related parameters (mainly VacuumCostBalance).
  * We allow each worker to update it as and when it has incurred any cost and
  * then based on that decide whether it needs to sleep. We compute the time
  * to sleep for a worker based on the cost it has incurred
  * (VacuumCostBalanceLocal) and then reduce the VacuumSharedCostBalance by
- * that amount. This avoids letting the workers sleep who have done less or
- * no I/O as compared to other workers and therefore can ensure that workers
- * who are doing more I/O got throttled more.
+ * that amount. This avoids putting to sleep those workers which have done less
+ * I/O than other workers and therefore ensure that workers
+ * which are doing more I/O got throttled more.
  *
- * We allow any worker to sleep only if it has performed the I/O above a
- * certain threshold, which is calculated based on the number of active
- * workers (VacuumActiveNWorkers), and the overall cost balance is more than
- * VacuumCostLimit set by the system. The testing reveals that we achieve
- * the required throttling if we allow a worker that has done more than 50%
+ * We allow a worker to sleep only if it has performed I/O above a certain
+ * threshold, which is calculated based on the number of active workers
+ * (VacuumActiveNWorkers), and the overall cost balance is more than
+ * VacuumCostLimit set by the system. Testing reveals that we achieve
+ * the required throttling if we force a worker that has done more than 50%
  * of its share of work to sleep.
  */
 static double
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 2779bea5c9..a4cd721400 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -225,7 +225,7 @@ typedef struct VacuumParams

    /*
     * The number of parallel vacuum workers. 0 by default which means choose
-    * based on the number of indexes. -1 indicates a parallel vacuum is
+    * based on the number of indexes. -1 indicates parallel vacuum is
     * disabled.
     */
    int         nworkers;
-- 
2.17.0
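As an aside, the cost-based-delay comment that the first patch rewords can be
illustrated with a small standalone program.  The names below mirror the ones
used in that comment (VacuumCostLimit, VacuumCostDelay, VacuumCostBalanceLocal,
VacuumSharedCostBalance, VacuumActiveNWorkers), but here they are plain globals
with invented values, and the arithmetic is only a sketch of the described
rule, not the actual compute_parallel_delay():

/*
 * Standalone sketch of the throttling rule described by the vacuum.c comment
 * above; NOT the actual compute_parallel_delay().  In the backend the shared
 * balance lives in shared memory and is updated atomically.
 */
#include <stdio.h>

static int      VacuumCostLimit = 200;      /* budget after which a sleep is due */
static double   VacuumCostDelay = 2.0;      /* ms of sleep per full budget */

static int      VacuumSharedCostBalance = 0;    /* shared across all workers */
static int      VacuumCostBalanceLocal = 0;     /* this worker's own share */
static int      VacuumActiveNWorkers = 4;

/* Called after this worker has incurred 'cost'; returns ms it should sleep. */
static double
toy_parallel_delay(int cost)
{
    double      msec = 0;

    VacuumSharedCostBalance += cost;
    VacuumCostBalanceLocal += cost;

    /*
     * Sleep only once the overall balance exceeds VacuumCostLimit and this
     * worker has done more than 50% of its per-worker share of the work.
     */
    if (VacuumSharedCostBalance >= VacuumCostLimit &&
        VacuumCostBalanceLocal >
        0.5 * ((double) VacuumCostLimit / VacuumActiveNWorkers))
    {
        /* Sleep in proportion to the cost this worker itself has incurred. */
        msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
        VacuumSharedCostBalance -= VacuumCostBalanceLocal;
        VacuumCostBalanceLocal = 0;
    }

    return msec;
}

int
main(void)
{
    /* The first batch of I/O stays under the limit, the second goes over. */
    printf("%.2f ms\n", toy_parallel_delay(120));   /* 0.00 ms */
    printf("%.2f ms\n", toy_parallel_delay(120));   /* 2.40 ms */
    return 0;
}

A worker that has done little or no I/O never crosses the 50%-of-share
threshold, so it keeps running while the busier workers pay the sleep.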
>From 3c6314b72d9323dd1554deadedf1c57fe105ad6b Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryz...@telsasoft.com>
Date: Sun, 19 Jan 2020 22:35:01 -0600
Subject: [PATCH v3 2/2] docs/comments review for parallel vacuum

---
 doc/src/sgml/ref/vacuum.sgml         | 12 ++++++------
 src/backend/access/heap/vacuumlazy.c | 22 +++++++++++-----------
 src/backend/commands/vacuum.c        |  2 +-
 3 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index b7e0a8af6b..5dc0787459 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -232,9 +232,9 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
     <term><literal>PARALLEL</literal></term>
     <listitem>
     <para>
-     Perform vacuum index and cleanup index phases of <command>VACUUM</command>
+     Perform index vacuum and index cleanup phases of <command>VACUUM</command>
      in parallel using <replaceable class="parameter">integer</replaceable>
-     background workers (for the detail of each vacuum phase, please
+     background workers (for the details of each vacuum phase, please
      refer to <xref linkend="vacuum-phases"/>). If the
      <literal>PARALLEL</literal> option is omitted, then the number of workers
      is determined based on the number of indexes that support parallel vacuum
@@ -358,16 +358,16 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
   </para>

   <para>
-   The <option>PARALLEL</option> option is used only for vacuum purpose.
-   Even if this option is specified with <option>ANALYZE</option> option
-   it does not affect <option>ANALYZE</option>.
+   The <option>PARALLEL</option> option is used only for vacuum operations.
+   If specified along with <option>ANALYZE</option>, the behavior during
+   <literal>ANALYZE</literal> is unchanged.
   </para>

   <para>
    <command>VACUUM</command> causes a substantial increase in I/O traffic,
    which might cause poor performance for other active sessions. Therefore,
    it is sometimes advisable to use the cost-based vacuum delay feature. For
-   parallel vacuum, each worker sleeps proportionally to the work done by that
+   parallel vacuum, each worker sleeps in proportion to the work done by that
    worker. See <xref linkend="runtime-config-resource-vacuum-cost"/> for
    details.
   </para>
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a757560996..dfa57df7b6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -232,8 +232,8 @@ typedef struct LVShared

    /*
     * Number of active parallel workers. This is used for computing the
-    * minimum threshold of the vacuum cost balance for a worker to go for the
-    * delay.
+    * minimum threshold of the vacuum cost balance before a worker sleeps
+    * for cost-based delay.
     */
    pg_atomic_uint32 active_nworkers;

@@ -2312,12 +2312,12 @@ vacuum_one_index(Relation indrel, IndexBulkDeleteResult **stats,

    /*
     * Copy the index bulk-deletion result returned from ambulkdelete and
-    * amvacuumcleanup to the DSM segment if it's the first time to get it
-    * from them, because they allocate it locally and it's possible that an
-    * index will be vacuumed by the different vacuum process at the next
-    * time. The copying of the result normally happens only after the first
-    * time of index vacuuming. From the second time, we pass the result on
-    * the DSM segment so that they then update it directly.
+    * amvacuumcleanup to the DSM segment if it's the first time to get a result
+    * from a worker, because workers allocate BulkDeleteResults locally, and it's possible that an
+    * index will be vacuumed by a different vacuum process the next
+    * time. Copying the result normally happens only the first
+    * time an index is vacuumed. For any additional vacuums, we pass the result on
+    * the DSM segment so that workers then update it directly.
     *
     * Since all vacuum workers write the bulk-deletion result at different
     * slots we can write them without locking.
@@ -3138,7 +3138,7 @@ prepare_index_statistics(LVShared *lvshared, bool *can_parallel_vacuum,
        if (!can_parallel_vacuum[i])
            continue;

-       /* Set NOT NULL as this index do support parallelism */
+       /* Set NOT NULL as this index does support parallelism */
        lvshared->bitmap[i >> 3] |= 1 << (i & 0x07);
    }
 }
@@ -3174,7 +3174,7 @@ update_index_statistics(Relation *Irel, IndexBulkDeleteResult **stats,

 /*
  * This function prepares and returns parallel vacuum state if we can launch
- * even one worker. This function is responsible to enter parallel mode,
+ * even one worker. This function is responsible for entering parallel mode,
  * create a parallel context, and then initialize the DSM segment.
  */
 static LVParallelState *
@@ -3346,7 +3346,7 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
  * Destroy the parallel context, and end parallel mode.
  *
  * Since writes are not allowed during parallel mode, copy the
- * updated index statistics from DSM in local memory and then later use that
+ * updated index statistics from DSM into local memory and then later use that
  * to update the index statistics. One might think that we can exit from
  * parallel mode, update the index statistics and then destroy parallel
  * context, but that won't be safe (see ExitParallelMode).
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 495ac23a26..351d5215a9 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2040,7 +2040,7 @@ vacuum_delay_point(void)
  * worker to sleep in proportion to the share of work it's done. We achieve this
  * by allowing all parallel vacuum workers including the leader process to
  * have a shared view of cost related parameters (mainly VacuumCostBalance).
- * We allow each worker to update it as and when it has incurred any cost and
+ * We allow each worker to update it AS AND WHEN it has incurred any cost and
  * then based on that decide whether it needs to sleep. We compute the time
  * to sleep for a worker based on the cost it has incurred
  * (VacuumCostBalanceLocal) and then reduce the VacuumSharedCostBalance by
-- 
2.17.0
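For what it's worth, the "copy the bulk-deletion result into the DSM segment
the first time, then update it in place" behavior that the reworded
vacuum_one_index() comment describes boils down to the pattern sketched below.
The Toy* types and toy_save_index_result() are invented for the illustration
and are not the backend's data structures; only the shape of the logic follows
the comment:

/*
 * Standalone sketch of the "copy once into shared memory, then update in
 * place" pattern described by the rewritten vacuum_one_index() comment.
 */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct ToyBulkDeleteResult
{
    long        tuples_removed;
} ToyBulkDeleteResult;

typedef struct ToySharedIndStats
{
    bool        updated;        /* has the shared slot ever been filled? */
    ToyBulkDeleteResult stats;  /* would live in the DSM segment */
} ToySharedIndStats;

static void
toy_save_index_result(ToySharedIndStats *shared, ToyBulkDeleteResult **result)
{
    /*
     * The first time an index is vacuumed, the index AM hands back a locally
     * allocated result.  Copy it into the shared slot, so a different process
     * can pick it up on the next pass, free the local copy, and point the
     * caller at the shared one; later passes then update it directly.
     */
    if (!shared->updated && *result != NULL)
    {
        memcpy(&shared->stats, *result, sizeof(ToyBulkDeleteResult));
        shared->updated = true;

        free(*result);          /* pfree() of the local copy, in the backend */
        *result = &shared->stats;
    }
}

int
main(void)
{
    ToySharedIndStats shared = {0};
    ToyBulkDeleteResult *local = malloc(sizeof(ToyBulkDeleteResult));

    local->tuples_removed = 42;
    toy_save_index_result(&shared, &local);

    /* The caller now works on the shared copy. */
    printf("%ld (shared copy: %s)\n", local->tuples_removed,
           local == &shared.stats ? "yes" : "no");
    return 0;
}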