GitHub user avamingli edited a discussion: Make UNION Parallel
### Description In PostgreSQL, the UNION operator can leverage parallel processing through the Parallel Append node with subnodes. ```sql explain (costs off, verbose) select * from t1 union select * from t2; QUERY PLAN -------------------------------------------------- HashAggregate Output: t1.a, t1.b Group Key: t1.a, t1.b -> Gather Output: t1.a, t1.b Workers Planned: 3 -> Parallel Append -> Parallel Seq Scan on public.t1 Output: t1.a, t1.b -> Parallel Seq Scan on public.t2 Output: t2.a, t2.b (11 rows) ``` However, in CBDB, we face challenges when UNION subqueries include Motion nodes, which can lead to incorrect results if Parallel Append is used. This is due to the competition among workers for subnodes: some subnodes are executed by a single worker while others may be processed by multiple workers. When a worker completes a task for a multi-worker subnode, it marks that job as finished. While this is acceptable in PostgreSQL, but in CBDB, the presence of Motion nodes complicates this process. The premature marking of completion can cause the Motion Sender to stop early, resulting in data loss for other workers. For cases involving only Scan nodes—such as Parallel Scans on partitioned tables—this issue does not arise. However, for UNION ALL and similar scenarios, the use of Parallel Append is unsafe. That's another problem and we should consider disabling it in the future. Despite these challenges, as an MPP database, CBDB has the potential to support parallel processing for UNION operations. This can be achieved if multiple workers execute Append nodes within a slice and the data is well-distributed (hashed across multiple segments relative to the cluster) according to the output columns of the subqueries. Specifically, we can implement a Parallel-oblivious Append approach, where multiple workers operate independently without sharing state, rather than a Parallel-aware Append that requires coordination among workers. ### Use case/motivation _No response_ ### Related issues _No response_ ### Are you willing to submit a PR? - [X] Yes I am willing to submit a PR! GitHub link: https://github.com/apache/cloudberry/discussions/1202 ---- This is an automatically sent email for dev@cloudberry.apache.org. To unsubscribe, please send an email to: dev-unsubscr...@cloudberry.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@cloudberry.apache.org For additional commands, e-mail: dev-h...@cloudberry.apache.org