Hi Seamus,

Please see my reply below.
On Tue, Feb 16, 2021 at 1:35 AM Seamus Abshere <sea...@abshere.net> wrote:
> On Mon, Feb 15, 2021, at 3:53 AM, Amit Langote wrote:
> > On Mon, Feb 15, 2021 at 5:28 PM Seamus Abshere <sea...@abshere.net> wrote:
> > > It turns out parallel_workers may be a useful reloption for certain uses
> > > of partitioned tables, at least if they're made up of fancy column store
> > > partitions (see
> > > https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55%40www.fastmail.com).
> > >
> > > Would somebody tell me what I'm doing wrong? I would love to submit a
> > > patch but I'm stuck:
> >
> > You may see by inspecting the callers of compute_parallel_worker()
> > that it never gets called on a partitioned table, only its leaf
> > partitions.  Maybe you could try calling compute_parallel_worker()
> > somewhere in add_paths_to_append_rel(), which has this code to figure
> > out parallel_workers to use for a parallel Append path for a given
> > partitioned table:
> >
> >     /* Find the highest number of workers requested for any subpath. */
> >     foreach(lc, partial_subpaths)
> >     {
> >         Path   *path = lfirst(lc);
> >
> >         parallel_workers = Max(parallel_workers,
> >                                path->parallel_workers);
> >     }
> >     Assert(parallel_workers > 0);
> >
> >     /*
> >      * If the use of parallel append is permitted, always request at least
> >      * log2(# of children) workers.  We assume it can be useful to have
> >      * extra workers in this case because they will be spread out across
> >      * the children.  The precise formula is just a guess, but we don't
> >      * want to end up with a radically different answer for a table with N
> >      * partitions vs. an unpartitioned table with the same data, so the
> >      * use of some kind of log-scaling here seems to make some sense.
> >      */
> >     if (enable_parallel_append)
> >     {
> >         parallel_workers = Max(parallel_workers,
> >                                fls(list_length(live_childrels)));
> >         parallel_workers = Min(parallel_workers,
> >                                max_parallel_workers_per_gather);
> >     }
> >     Assert(parallel_workers > 0);
> >
> >     /* Generate a partial append path. */
> >     appendpath = create_append_path(root, rel, NIL, partial_subpaths,
> >                                     NIL, NULL, parallel_workers,
> >                                     enable_parallel_append,
> >                                     -1);
> >
> > Note that the 'rel' in this code refers to the partitioned table for
> > which an Append path is being considered, so compute_parallel_worker()
> > using that 'rel' would use the partitioned table's
> > rel_parallel_workers as you are trying to do.
>
> Here we go, my first patch... solves
> https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520e...@www.fastmail.com

Thanks for sending the patch here.

It seems you haven't made enough changes for the reloptions code to
recognize parallel_workers as valid for partitioned tables, because even
with the patch applied, I get this:

    create table rp (a int) partition by range (a);
    create table rp1 partition of rp for values from (minvalue) to (0);
    create table rp2 partition of rp for values from (0) to (maxvalue);
    alter table rp set (parallel_workers = 1);
    ERROR:  unrecognized parameter "parallel_workers"

You need this:

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 029a73325e..9eb8a0c10d 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -377,7 +377,7 @@ static relopt_int intRelOpts[] =
     {
         "parallel_workers",
         "Number of parallel processes that can be used per executor node for this relation.",
-        RELOPT_KIND_HEAP,
+        RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
         ShareUpdateExclusiveLock
     },
     -1, 0, 1024

which tells the reloptions parsing code that parallel_workers is
acceptable for both heap and partitioned relations.
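For what it's worth, the log-scaling heuristic in the quoted
add_paths_to_append_rel() code can be sketched as a standalone function.
This is only an illustration of the arithmetic, not the actual PostgreSQL
code: fls() is BSD-specific, so a portable equivalent is written out here,
and the function and parameter names are made up for the example.

```c
#include <assert.h>

/* Portable stand-in for fls(): 1-based position of the highest set bit. */
static int
highest_bit_pos(int x)
{
    int     pos = 0;

    while (x > 0)
    {
        pos++;
        x >>= 1;
    }
    return pos;
}

/*
 * Sketch of the quoted heuristic: start from the highest worker count
 * requested by any subpath, raise it to at least log2(# of children),
 * then clamp to max_parallel_workers_per_gather.
 */
static int
append_parallel_workers(int max_subpath_workers, int nchildren,
                        int max_per_gather)
{
    int     parallel_workers = max_subpath_workers;

    if (parallel_workers < highest_bit_pos(nchildren))
        parallel_workers = highest_bit_pos(nchildren);
    if (parallel_workers > max_per_gather)
        parallel_workers = max_per_gather;
    return parallel_workers;
}
```

This makes the "log-scaling" comment concrete: 16 partitions with
single-worker subpaths still request 5 workers, while the per-Gather cap
keeps very wide partition trees from requesting unbounded parallelism.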
Other comments on the patch:

1. This function calling style, where the first argument is not placed
on the same line as the function itself, is not very common in Postgres:

+   /* First see if there is a root-level setting for parallel_workers */
+   parallel_workers = compute_parallel_worker(
+       rel,
+       -1,
+       -1,
+       max_parallel_workers_per_gather
+

This makes the new code look very different from the rest of the
codebase.  Better to stick to the existing style.

2. It might be a good idea to use this opportunity to add a function,
say compute_append_parallel_workers(), for the code that does what the
function name says.  Then the patch will simply add the new
compute_parallel_worker() call at the top of that function.

3. I think we should consider the Append parent relation's
parallel_workers ONLY if it is a partitioned relation, because it
doesn't make a lot of sense for other types of parent relations.  So
the new code should look like this:

    if (IS_PARTITIONED_REL(rel))
        parallel_workers = compute_parallel_worker(rel, -1, -1,
                                                   max_parallel_workers_per_gather);

4. Maybe it also doesn't make sense to consider the parent relation's
parallel_workers if Parallel Append is disabled
(enable_parallel_append = off).  That's because with a simple
(non-parallel) Append running under Gather, all launched parallel
workers process the same partition before moving to the next one.
OTOH, one's intention of setting parallel_workers on the parent
partitioned table would most likely be to distribute workers across
partitions, which is only possible with parallel Append
(enable_parallel_append = on).  So, the new code should look like this:

    if (IS_PARTITIONED_REL(rel) && enable_parallel_append)
        parallel_workers = compute_parallel_worker(rel, -1, -1,
                                                   max_parallel_workers_per_gather);

BTW, please consider bottom-posting like I've done in this reply,
because that makes it easier to follow discussions involving patch
reviews that can potentially take many emails to reach conclusions.
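To make the combined effect of comments 2-4 concrete, here is a rough
sketch of the decision logic such a compute_append_parallel_workers()
helper could implement.  Everything here is hypothetical: the booleans
stand in for IS_PARTITIONED_REL() and the enable_parallel_append GUC, and
rel_workers stands in for the parent's parallel_workers reloption that
compute_parallel_worker() would consult; none of this is the real
PostgreSQL API.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical shape of the helper suggested above: honor the parent
 * relation's own parallel_workers setting only when the parent is a
 * partitioned table AND parallel append is enabled; otherwise fall back
 * to the highest worker count requested by any subpath.  Always clamp
 * to the per-Gather limit.
 */
static int
compute_append_parallel_workers(bool is_partitioned_rel,
                                bool enable_parallel_append,
                                int rel_workers,    /* parent reloption, -1 if unset */
                                int max_subpath_workers,
                                int max_per_gather)
{
    int     parallel_workers = max_subpath_workers;

    /* Comments 3 and 4: only this combination makes the setting useful. */
    if (is_partitioned_rel && enable_parallel_append && rel_workers > 0)
        parallel_workers = rel_workers;

    if (parallel_workers > max_per_gather)
        parallel_workers = max_per_gather;
    return parallel_workers;
}
```

The point of the gating is visible in the second case below: with
parallel append disabled, the parent's setting is ignored and the
subpath-derived count wins.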
--
Amit Langote
EDB: http://www.enterprisedb.com