GitHub user avamingli created a discussion: Give UNION ALL more opportunities for parallel plans in MPP.
### Description

In CBDB's PostgreSQL implementation, UNION ALL operations currently have four processing methods: three parallel variants (Parallel Append with partial subpaths, Parallel Append with mixed partial/non-partial subpaths, and Append with partial subpaths) and one non-parallel variant (Append with non-partial subpaths).

1. Parallel Append with partial subpaths

```sql
SELECT unique1 FROM tenk1 WHERE fivethous = tenthous + 1
UNION ALL
SELECT unique1 FROM tenk1 WHERE fivethous = tenthous + 1;

                      QUERY PLAN
----------------------------------------------------
 Gather
   Workers Planned: 4
   ->  Parallel Append
         ->  Parallel Seq Scan on tenk1
               Filter: (fivethous = (tenthous + 1))
         ->  Parallel Seq Scan on tenk1 tenk1_1
               Filter: (fivethous = (tenthous + 1))
(7 rows)
```

2. Parallel Append with mixed partial/non-partial subpaths

```sql
explain (costs off) select round(avg(aa)), sum(aa) from a_star;

                          QUERY PLAN
--------------------------------------------------------------
 Finalize Aggregate
   ->  Gather
         Workers Planned: 3
         ->  Partial Aggregate
               ->  Parallel Append
                     ->  Seq Scan on d_star a_star_4
                     ->  Seq Scan on c_star a_star_3
                     ->  Parallel Seq Scan on f_star a_star_6
                     ->  Parallel Seq Scan on e_star a_star_5
                     ->  Parallel Seq Scan on b_star a_star_2
                     ->  Parallel Seq Scan on a_star a_star_1
```

3. Append with partial subpaths

```sql
explain (costs off) select round(avg(aa)), sum(aa) from a_star;

                          QUERY PLAN
--------------------------------------------------------------
 Finalize Aggregate
   ->  Gather
         Workers Planned: 1
         ->  Partial Aggregate
               ->  Append
                     ->  Parallel Seq Scan on a_star a_star_1
                     ->  Parallel Seq Scan on b_star a_star_2
                     ->  Parallel Seq Scan on c_star a_star_3
                     ->  Parallel Seq Scan on d_star a_star_4
                     ->  Parallel Seq Scan on e_star a_star_5
                     ->  Parallel Seq Scan on f_star a_star_6
(11 rows)
```

4. Append with non-partial subpaths

```sql
select * from
  (select *, 0 as x from int8_tbl a
   union all
   select *, 1 as x from int8_tbl b) ss
where (x = 0) or (q1 >= q2 and q1 <= q2);

                  QUERY PLAN
---------------------------------------------
 Append
   ->  Seq Scan on int8_tbl a
   ->  Seq Scan on int8_tbl b
         Filter: ((q1 >= q2) AND (q1 <= q2))
(4 rows)
```

While these work well for local queries, we have had to disable parallel execution whenever Motion nodes appear in subpaths, because of a correctness issue: Parallel Append workers mark subnodes as completed, which can cause Motion-containing branches to be skipped prematurely. This limitation forces serial execution for most distributed-table queries, as in the plan below, where a Gather Motion wraps a serial Append with a Redistribute Motion in each branch, missing significant optimization opportunities.

```sql
explain(costs off) select b, count(*) from t1 group by b
union all
select b, count(*) from t2 group by b;

                             QUERY PLAN
------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   ->  Append
         ->  HashAggregate
               Group Key: t1.b
               ->  Redistribute Motion 3:3  (slice2; segments: 3)
                     Hash Key: t1.b
                     ->  Seq Scan on t1
         ->  HashAggregate
               Group Key: t2.b
               ->  Redistribute Motion 3:3  (slice3; segments: 3)
                     Hash Key: t2.b
                     ->  Seq Scan on t2
 Optimizer: Postgres query optimizer
(13 rows)
```

There is still an opportunity for parallelism, however. I propose a robust solution that first attempts a parallel-aware Append when it is safe (no Motion nodes in the subpaths), and automatically falls back to a parallel-oblivious Append when a Motion hazard is detected.
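As a rough illustration of the planner-side check, here is a minimal sketch, assuming a Greenplum-style `CdbMotionPath` node represents Motion paths in CBDB; the helper names (`cbdb_path_contains_motion`, `cbdb_parallel_append_is_safe`) are hypothetical, and a real implementation would need to walk every path type rather than the few shown here:

```c
/*
 * Hypothetical sketch only -- not Cloudberry's actual code.
 *
 * Decide whether a parallel-aware (Parallel) Append is safe for a set of
 * subpaths, or whether the planner should fall back to a parallel-oblivious
 * Append because some subpath contains a Motion.
 */
#include "postgres.h"

#include "nodes/pathnodes.h"

/* Recursively look for a Motion anywhere below 'path' (simplified). */
static bool
cbdb_path_contains_motion(Path *path)
{
	ListCell   *lc;

	if (path == NULL)
		return false;

	switch (nodeTag(path))
	{
		case T_CdbMotionPath:	/* assumed Motion path node, as in Greenplum */
			return true;

		case T_AppendPath:
			foreach(lc, ((AppendPath *) path)->subpaths)
			{
				if (cbdb_path_contains_motion((Path *) lfirst(lc)))
					return true;
			}
			return false;

		case T_SubqueryScanPath:
			return cbdb_path_contains_motion(((SubqueryScanPath *) path)->subpath);

		default:

			/*
			 * A real implementation would handle every path type (joins,
			 * aggregation, materialization, ...); this sketch only covers
			 * the shapes that appear in the plans above.
			 */
			return false;
	}
}

/*
 * True if a parallel-aware Append may be generated over 'subpaths'; false
 * means fall back to a parallel-oblivious Append, which still lets partial
 * subpaths run in parallel because every worker executes every subplan.
 */
static bool
cbdb_parallel_append_is_safe(List *subpaths)
{
	ListCell   *lc;

	foreach(lc, subpaths)
	{
		if (cbdb_path_contains_motion((Path *) lfirst(lc)))
			return false;
	}
	return true;
}
```

The resulting flag could then drive the `parallel_aware` choice wherever the Append path for the UNION ALL is built, so partial subpaths still run under a parallel plan while each worker executes every Motion-containing branch exactly once.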
This works because a regular Append reliably executes all of its subnodes regardless of Motion presence, while CBDB's Motion nodes inherently handle tuples individually without requiring worker coordination. The benefits extend beyond the UNION ALL operator itself: enabling this parallelization creates ripple effects in which subpaths gain more parallel execution opportunities, which is particularly valuable for complex nested queries. This optimization stands to significantly improve TPC-DS benchmark performance and other analytical workloads involving distributed tables.

We are preparing to implement this with prototype development, TPC-DS testing, and edge-case validation. Community feedback is welcome on potential corner cases, benchmarking approaches, and real-world query patterns that might benefit most.

### Use case/motivation

_No response_

### Related issues

_No response_

### Are you willing to submit a PR?

- [X] Yes I am willing to submit a PR!

GitHub link: https://github.com/apache/cloudberry/discussions/1291