Hi,

plan_create_index_workers[1] does not consider the amount of tuples existing in TOAST pages when determining the number of parallel workers to use for a build. The estimation comes from estimate_rel_size[2], which in this case, will just take the value from rel->rd_rel->relpages.

We probably don't notice this much with B-trees, given a B-tree is typically used for data that does not require toasting. However, this becomes more visible when working on custom index access methods that implement their own parallel build strategy.

For example, pgvector[3] provides its own data types and index access method for indexing vector data. Vectors can get quite large fairly quickly, e.g. a 768-dimensional vector takes up 8 + 4*768 = 3080 bytes on disk, which quickly clears the default TOAST tuple threshold.

In a recent patch proposal to allow for building indexes in parallel[4], I performed a few experiments on how many parallel workers would be spawned when indexing 1,000,000 (1MM) 768-dim vectors, both with EXTEDNED (default) and PLAIN storage. In all cases, I allowed for leader participation, but the leader is not considered in plan_create_index_workers.

With EXTENDED, plan_create_index_workers recommended 2 workers. The build time was ~2x faster than the serial build.

With PLAIN, plan_create_index_workers recommended 4 workers. The build time was **~3X faster** than the serial build.

(I've been doing more detailed, less hand-waivy performance testing, but I wanted to provide directional numbers here)

It seems like we're leaving some performance for columns with TOASTed data that require indexing, so I wanted to propose allowing the pages in TOASTed tables to be considered when we're trying to index a column with TOASTed attributes.

Thanks,

Jonathan

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/optimizer/plan/planner.c;hb=refs/heads/master#l6734 [2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/optimizer/util/plancat.c;hb=refs/heads/master#l1117
[3] https://github.com/pgvector/pgvector
[4] https://github.com/pgvector/pgvector/commits/parallel-index-build

Attachment: OpenPGP_signature
Description: OpenPGP digital signature

Reply via email to