Parallel aggregate is a feature that performs the aggregation job in parallel, with the help of the Gather and partial seq scan nodes.
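
For illustration, the intended plan shape is roughly the following (a sketch of the target shape based on the description below, not output from an actual patch):

  Gather
    ->  Aggregate   (final function not applied in the workers)
          ->  Partial Seq Scan on some_table

where the master backend merges the worker slots by grouping key and then applies the final function, the HAVING qual and the projection.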
The following is a basic overview of the parallel aggregate changes.

Decision phase:

The parallel aggregate plan is generated based on the following conditions (rough sketches of the first and third checks appear in the P.S. at the end of this mail).

- Check whether the plan node below is Gather + partial seq scan only. This verifies that all the plan nodes that are present are parallel-aware.

- Check whether any projection or qual condition is present in the Gather node. If there are quals or projection info that must be evaluated at the Gather node, because of functions that can only be executed in the master backend, the parallel aggregate plan is not chosen.

- Check whether the aggregate supports parallelism. For a first patch, I thought of supporting only some aggregates. The supported aggregates are mainly the aggregate functions that have variable-length data types as their final and transition types. This avoids changing the target list return types: because the types are variable-length, even the transition value can be returned to the backend without applying the aggregate's final function. To identify the aggregates that support parallelism, a new member is added to the pg_aggregate system catalog table.

- Currently only Group and Plain aggregates are supported, for simplicity. This patch doesn't change anything in the aggregate plan decision: if the planner chooses a group or plain aggregate as the best plan, we then check whether it can be converted into a parallel aggregate.

Planning phase:

- Generate the target list items that need to be passed to the child aggregate node, by separating bare aggregates and GROUP BY expressions. This is required to take care of any expressions involved in the target list. Example:

  Output: (sum(id1)), (3 + (sum((id2 - 3)))), (max(id1)), ((count(id1)) - (max(id1)))
    ->  Aggregate
          Output: sum(id1), sum((id2 - 3)), max(id1), count(id1)

- Don't push the HAVING clause down to the child aggregate node; it must be executed at the Gather node only, after all results from the workers with the matching key have been combined (and after the aggregate's final function, if any, has been applied).

- Get the details of the Gather plan, remove its plan node from the actual plan, and prepare a Gather plan on top of the aggregate plan.

Execution phase:

- By passing some execution flag like EXEC_PARALLEL or similar, the aggregate operation skips the final function calculation on the worker side.

- Set single_copy mode to true in case the node below the Gather is a parallel aggregate.

- Add support for fetching a slot from a particular worker. This is required to merge the slots from different workers based on the grouping key.

- Merge the slots received from the workers based on the grouping key. If there is no grouping key, merge all slots without waiting to receive slots from all workers.

- If there is a grouping key, the backend has to wait until it gets slots from all running workers. Once all slots are received, they need to be compared against the grouping key and merged accordingly. The merged slot is then processed further to apply the final function, qualification and projection.

I will try to provide a POC patch by the next commit-fest.

Comments?

Regards,
Hari Babu
Fujitsu Australia
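
P.S. To make the decision-phase checks a little more concrete, here is a rough sketch of the plan-shape test. The function name is hypothetical and the projection check is elided; only the node types and fields used (Gather, SeqScan, Plan.qual, outerPlan) are existing ones.

#include "postgres.h"
#include "nodes/plannodes.h"

/*
 * Sketch: decide whether the plan below the aggregate has the shape
 * required for parallel aggregation, i.e. Gather + partial seq scan only,
 * with nothing pending at the Gather node itself.
 */
static bool
parallel_agg_plan_is_possible(Plan *plan)
{
    /* The plan below the aggregate must be a Gather node. */
    if (!IsA(plan, Gather))
        return false;

    /*
     * No quals may be pending at the Gather node; they could contain
     * functions that can run only in the master backend.  (A similar
     * test would be needed for a non-trivial projection.)
     */
    if (plan->qual != NIL)
        return false;

    /* The Gather's only child must be the partial seq scan. */
    if (outerPlan(plan) == NULL || !IsA(outerPlan(plan), SeqScan))
        return false;

    return true;
}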
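
And a similar sketch of the lookup against the proposed pg_aggregate member; "aggparallel" is only a placeholder name for the new column, while the syscache calls themselves are the usual ones.

#include "postgres.h"
#include "access/htup_details.h"
#include "catalog/pg_aggregate.h"
#include "utils/syscache.h"

/*
 * Sketch: check whether an aggregate function is marked as safe for
 * parallel aggregation in pg_aggregate.
 */
static bool
aggregate_supports_parallelism(Oid aggfnoid)
{
    HeapTuple   tuple;
    bool        result;

    tuple = SearchSysCache1(AGGFNOID, ObjectIdGetDatum(aggfnoid));
    if (!HeapTupleIsValid(tuple))
        elog(ERROR, "cache lookup failed for aggregate %u", aggfnoid);

    /* placeholder column: true if the aggregate is safe to parallelize */
    result = ((Form_pg_aggregate) GETSTRUCT(tuple))->aggparallel;
    ReleaseSysCache(tuple);

    return result;
}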