As I mentioned in the commit message for the Parallel Append commit (ab72716778128fb63d54ac256adf7fe6820a1185), it's kind of sad that this doesn't work with UNION ALL queries, which are an obvious candidate for such parallelization. It turns out that it actually does work to a limited degree: assuming that the UNION ALL query can be converted to a simple appendrel, it can consider a parallel append of non-partial paths only. The attached patch lets it consider a parallel append of partial paths by doing the following things:
1. Teaching set_subquery_pathlist to create *partial* SubqueryScan paths as well as non-partial ones. 2. Teaching grouping_planner to create partial paths for the final rel if not at the outermost query level. 3. Modifying finalize_plan to allow the gather_param to be passed across subquery boundaries. #3 is the only part I'm really unsure about; the other stuff looks pretty cut and dried. I have a draft patch that handles the case where the union can't be converted to a simple appendrel, too, but that's not quite baked enough to post yet. For those for whom the above may be too technical to follow, here's an example: pgbench -i 40 explain (costs off) select a.bid from pgbench_accounts a, pgbench_branches b where a.bid = b.bid and aid % 1000 = 0 union all select a.bid from pgbench_accounts a where aid % 1000 = 0; Unpatched: Append -> Gather Workers Planned: 2 -> Hash Join Hash Cond: (a.bid = b.bid) -> Parallel Seq Scan on pgbench_accounts a Filter: ((aid % 1000) = 0) -> Hash -> Seq Scan on pgbench_branches b -> Gather Workers Planned: 2 -> Parallel Seq Scan on pgbench_accounts a_1 Filter: ((aid % 1000) = 0) Patched: Gather Workers Planned: 2 -> Parallel Append -> Hash Join Hash Cond: (a.bid = b.bid) -> Parallel Seq Scan on pgbench_accounts a Filter: ((aid % 1000) = 0) -> Hash -> Seq Scan on pgbench_branches b -> Parallel Seq Scan on pgbench_accounts a_1 Filter: ((aid % 1000) = 0) In this particular case the change doesn't buy very much, but the second plan is better because avoid shutting down one set of workers and starting a new set. That's more efficient, plus it allows the two branches to be worked in parallel rather than serially. On a small enough scale factor, even without the patch, you get this... Gather Workers Planned: 2 -> Parallel Append -> Nested Loop Join Filter: (a.bid = b.bid) -> Seq Scan on pgbench_branches b -> Seq Scan on pgbench_accounts a Filter: ((aid % 1000) = 0) -> Seq Scan on pgbench_accounts a_1 Filter: ((aid % 1000) = 0) ...but that's not good because now we have regular sequential scans instead of partial sequential scans. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
subquery-smarts.patch
Description: Binary data