On Thu, Mar 28, 2024 at 12:55 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:

> I think the patch is in good shape. Do you have other comments or
> suggestions, John?
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1918,11 +1918,6 @@ include_dir 'conf.d'
         too high.  It may be useful to control for this by separately
         setting <xref linkend="guc-autovacuum-work-mem"/>.
        </para>
-       <para>
-        Note that for the collection of dead tuple identifiers,
-        <command>VACUUM</command> is only able to utilize up to a maximum of
-        <literal>1GB</literal> of memory.
-       </para>
       </listitem>
      </varlistentry>

This is mentioned twice for two different GUCs -- need to remove the
other one, too.

Other than that, I just have minor nits:

- * The major space usage for vacuuming is storage for the array of dead TIDs
+ * The major space usage for vacuuming is TID store, a storage for dead TIDs

I think I've helped edit this sentence before, but I still don't quite
like it. I'm thinking now "is storage for the dead tuple IDs".

- * set upper bounds on the number of TIDs we can keep track of at once.
+ * set upper bounds on the maximum memory that can be used for keeping track
+ * of dead TIDs at once.

I think "maximum" is redundant with "upper bounds".

I also feel the commit message needs more "meat" -- we need to clearly
narrate the features and benefits. I've attached how I would write it,
but feel free to use what you like to match your taste.

I've marked it Ready for Committer.
Use TID Store for storage of dead tuple IDs during lazy vacuum

Previously, we used a simple array for storing dead tuple IDs during
lazy vacuum, which had a number of problems:

* The array used a single allocation and so was limited to 1GB.
* The allocation was pessimistically sized according to table size.
* Lookup with binary search was slow because of poor CPU cache and
  branch prediction behavior.

This commit replaces that array with the TID store from commit XXXXXX.

Since the backing radix tree makes small allocations as needed, the
1GB limit is now gone. Further, the total memory used is now often
smaller by an order of magnitude or more, depending on the
distribution of blocks and offsets. These two features should make
multiple rounds of heap scanning and index cleanup an extremely rare
event. TID lookup during index cleanup is also several times faster,
even more so when index order is correlated with heap tuple order.

Since there is no longer a predictable relationship between the number
of dead tuples vacuumed and the space taken up by their TIDs, the
number of tuples no longer provides any meaningful insights for users,
nor is the maximum number predictable. For that reason this commit
also changes to byte-based progress reporting, with the relevant
columns of pg_stat_progress_vacuum renamed accordingly to
max_dead_tuple_bytes and dead_tuple_bytes.

For parallel vacuum, both the TID store and supplemental information
specific to vacuum are shared among the parallel vacuum workers. As
with the previous array, we don't take any locks on TidStore during
parallel vacuum since writes are still only done by the leader
process.

XXX: bump catalog version

Reviewed-by: John Naylor, (in an earlier version) Dilip Kumar
Discussion: https://postgr.es/m/CAD21AoAfOZvmfR0j8VmZorZjL7RhTiQdVttNuC4W-Shdc2a-AA%40mail.gmail.com
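As an aside, for anyone following along: with the byte-based progress
reporting described in the draft message above, monitoring could look
something like the sketch below. This is only an illustration, assuming
the renamed columns land as written (max_dead_tuple_bytes and
dead_tuple_bytes); the exact output naturally depends on the committed
patch.

    -- Hypothetical monitoring query against the renamed columns of
    -- pg_stat_progress_vacuum, showing memory used for dead TIDs
    -- relative to the configured limit.
    SELECT p.pid,
           p.relid::regclass AS table_name,
           p.phase,
           pg_size_pretty(p.dead_tuple_bytes)     AS dead_tuple_bytes,
           pg_size_pretty(p.max_dead_tuple_bytes) AS max_dead_tuple_bytes
    FROM pg_stat_progress_vacuum AS p;

Since the reported figure is now bytes rather than a tuple count, it
can be compared directly against maintenance_work_mem (or
autovacuum_work_mem), which seems friendlier for users than the old
num_dead_tuples/max_dead_tuples pair.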