Hi Seamus,

Please see my reply below.
On Tue, Feb 16, 2021 at 1:35 AM Seamus Abshere <sea...@abshere.net> wrote:
> On Mon, Feb 15, 2021, at 3:53 AM, Amit Langote wrote:
> > On Mon, Feb 15, 2021 at 5:28 PM Seamus Abshere <sea...@abshere.net> wrote:
> > > It turns out parallel_workers may be a useful reloption for certain uses
> > > of partitioned tables, at least if they're made up of fancy column store
> > > partitions (see
> > > https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55%40www.fastmail.com).
> > >
> > > Would somebody tell me what I'm doing wrong? I would love to submit a
> > > patch but I'm stuck:
> >
> > You may see by inspecting the callers of compute_parallel_worker()
> > that it never gets called on a partitioned table, only its leaf
> > partitions.  Maybe you could try calling compute_parallel_worker()
> > somewhere in add_paths_to_append_rel(), which has this code to figure
> > out parallel_workers to use for a parallel Append path for a given
> > partitioned table:
> >
> >     /* Find the highest number of workers requested for any subpath. */
> >     foreach(lc, partial_subpaths)
> >     {
> >         Path   *path = lfirst(lc);
> >
> >         parallel_workers = Max(parallel_workers,
> >                                path->parallel_workers);
> >     }
> >     Assert(parallel_workers > 0);
> >
> >     /*
> >      * If the use of parallel append is permitted, always request at least
> >      * log2(# of children) workers.  We assume it can be useful to have
> >      * extra workers in this case because they will be spread out across
> >      * the children.  The precise formula is just a guess, but we don't
> >      * want to end up with a radically different answer for a table with N
> >      * partitions vs. an unpartitioned table with the same data, so the
> >      * use of some kind of log-scaling here seems to make some sense.
> >      */
> >     if (enable_parallel_append)
> >     {
> >         parallel_workers = Max(parallel_workers,
> >                                fls(list_length(live_childrels)));
> >         parallel_workers = Min(parallel_workers,
> >                                max_parallel_workers_per_gather);
> >     }
> >     Assert(parallel_workers > 0);
> >
> >     /* Generate a partial append path. */
> >     appendpath = create_append_path(root, rel, NIL, partial_subpaths,
> >                                     NIL, NULL, parallel_workers,
> >                                     enable_parallel_append,
> >                                     -1);
> >
> > Note that the 'rel' in this code refers to the partitioned table for
> > which an Append path is being considered, so compute_parallel_worker()
> > using that 'rel' would use the partitioned table's
> > rel_parallel_workers as you are trying to do.
>
> Here we go, my first patch... solves
> https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520e...@www.fastmail.com

Thanks for sending the patch here.

It seems you haven't made enough changes for the reloptions code to
recognize parallel_workers as valid for partitioned tables, because even
with the patch applied, I get this:

    create table rp (a int) partition by range (a);
    create table rp1 partition of rp for values from (minvalue) to (0);
    create table rp2 partition of rp for values from (0) to (maxvalue);
    alter table rp set (parallel_workers = 1);
    ERROR:  unrecognized parameter "parallel_workers"

You need this:

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 029a73325e..9eb8a0c10d 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -377,7 +377,7 @@ static relopt_int intRelOpts[] =
     {
         "parallel_workers",
         "Number of parallel processes that can be used per executor node for this relation.",
-        RELOPT_KIND_HEAP,
+        RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
         ShareUpdateExclusiveLock
     },
     -1, 0, 1024

which tells the reloptions parsing code that parallel_workers is
acceptable for both heap and partitioned relations.
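For what it's worth, the log-scaling heuristic in the quoted
add_paths_to_append_rel() code can be sketched as a standalone function.
This is only an illustration of the arithmetic, not the actual PostgreSQL
code: fls() is BSD-specific, so a portable equivalent is written out here,
and the function and parameter names are made up for the example.

```c
#include <assert.h>

/* Portable stand-in for fls(): 1-based position of the highest set bit. */
static int
highest_bit_pos(int x)
{
    int     pos = 0;

    while (x > 0)
    {
        pos++;
        x >>= 1;
    }
    return pos;
}

/*
 * Sketch of the quoted heuristic: start from the highest worker count
 * requested by any subpath, raise it to at least log2(# of children),
 * then clamp to max_parallel_workers_per_gather.
 */
static int
append_parallel_workers(int max_subpath_workers, int nchildren,
                        int max_per_gather)
{
    int     parallel_workers = max_subpath_workers;

    if (parallel_workers < highest_bit_pos(nchildren))
        parallel_workers = highest_bit_pos(nchildren);
    if (parallel_workers > max_per_gather)
        parallel_workers = max_per_gather;
    return parallel_workers;
}
```

This makes the "log-scaling" comment concrete: 16 partitions with
single-worker subpaths still request 5 workers, while the per-Gather cap
keeps very wide partition trees from requesting unbounded parallelism.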
Other comments on the patch:

1. This function calling style, where the first argument is not placed
on the same line as the function itself, is not very common in Postgres:

+   /* First see if there is a root-level setting for parallel_workers */
+   parallel_workers = compute_parallel_worker(
+       rel,
+       -1,
+       -1,
+       max_parallel_workers_per_gather
+

This makes the new code look very different from the rest of the
codebase.  Better to stick to the existing style.

2. It might be a good idea to use this opportunity to add a function,
say compute_append_parallel_workers(), for the code that does what the
function name says.  Then the patch will simply add the new
compute_parallel_worker() call at the top of that function.

3. I think we should consider the Append parent relation's
parallel_workers ONLY if it is a partitioned relation, because it
doesn't make a lot of sense for other types of parent relations.  So
the new code should look like this:

    if (IS_PARTITIONED_REL(rel))
        parallel_workers = compute_parallel_worker(rel, -1, -1,
                                                   max_parallel_workers_per_gather);

4. Maybe it also doesn't make sense to consider the parent relation's
parallel_workers if Parallel Append is disabled
(enable_parallel_append = off).  That's because with a simple
(non-parallel) Append running under Gather, all launched parallel
workers process the same partition before moving to the next one.
OTOH, one's intention of setting parallel_workers on the parent
partitioned table would most likely be to distribute workers across
partitions, which is only possible with parallel Append
(enable_parallel_append = on).  So, the new code should look like this:

    if (IS_PARTITIONED_REL(rel) && enable_parallel_append)
        parallel_workers = compute_parallel_worker(rel, -1, -1,
                                                   max_parallel_workers_per_gather);

BTW, please consider bottom-posting like I've done in this reply,
because that makes it easier to follow discussions involving patch
reviews that can potentially take many emails to reach conclusions.
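To make the combined effect of comments 2-4 concrete, here is a rough
sketch of the decision logic such a compute_append_parallel_workers()
helper could implement.  Everything here is hypothetical: the booleans
stand in for IS_PARTITIONED_REL() and the enable_parallel_append GUC, and
rel_workers stands in for the parent's parallel_workers reloption that
compute_parallel_worker() would consult; none of this is the real
PostgreSQL API.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical shape of the helper suggested above: honor the parent
 * relation's own parallel_workers setting only when the parent is a
 * partitioned table AND parallel append is enabled; otherwise fall back
 * to the highest worker count requested by any subpath.  Always clamp
 * to the per-Gather limit.
 */
static int
compute_append_parallel_workers(bool is_partitioned_rel,
                                bool enable_parallel_append,
                                int rel_workers,    /* parent reloption, -1 if unset */
                                int max_subpath_workers,
                                int max_per_gather)
{
    int     parallel_workers = max_subpath_workers;

    /* Comments 3 and 4: only this combination makes the setting useful. */
    if (is_partitioned_rel && enable_parallel_append && rel_workers > 0)
        parallel_workers = rel_workers;

    if (parallel_workers > max_per_gather)
        parallel_workers = max_per_gather;
    return parallel_workers;
}
```

The point of the gating is visible in the second case below: with
parallel append disabled, the parent's setting is ignored and the
subpath-derived count wins.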
--
Amit Langote
EDB: http://www.enterprisedb.com