By seeing the subject line, one might wonder why we need to consider parallelizing the queries containing initplans differently from queries containing subplans considering that I have posted a mail to achieve later a few hours back. The reason is that both are treated differently with respect to parallelism and otherwise as well and both can be parallelized in a different way depending on the design we choose. InitPlans can be used in three forms (a) a Param node representing a single scalar result (b) a row comparison tree containing multiple Param nodes (c) NULL constant for MULTIEXPR subquery whereas SubPlans are used as SubPlan nodes. Here, I am primarily interested in parallelizing queries that contain InitPlans of the form (a) and the reason is that I have seen that form used more as compared to other forms (primarily based on a study of TPC-H and TPC-DS workloads). However, if we find that parallelizing other forms can be done along with it easily, then that is excellent. To start with let us see the plan of TPC-H query (Q-22) and understand how it can be improved.
Limit InitPlan 1 (returns $0) -> Finalize Aggregate -> Gather Workers Planned: 2 -> Partial Aggregate -> Parallel Seq Scan on customer customer_1 Filter: (...) -> GroupAggregate Group Key: ("substring"((customer.c_phone)::text, 1, 2)) -> Sort Sort Key: ("substring"((customer.c_phone)::text, 1, 2)) -> Nested Loop Anti Join -> Seq Scan on customer Filter: ((c_acctbal > $0) AND (...))) -> Index Only Scan using idx_orders_custkey on orders Index Cond: (o_custkey = customer.c_custkey) In the above plan, we can see that the join on customer and orders table (Nested Loop Anti Join) is not parallelised even though we have the capability to parallelize Nested Loop Joins. The reason for not choosing the parallel plan is that one of the nodes (Seq Scan on customer) is referring to initplan and we consider such nodes as parallel-restricted which means they can't be parallelised. Now, I could see three ways of parallelizing such a query. The first way is that we just push parallel-safe initplans to workers and allow them to execute it, the drawback of this approach is that it won't be able to push initplans in cases as shown above where initplan is parallel-unsafe (contains Gather node) and second is we will lose the expectation of single evaluation. The second way is that we always execute the initplan in the master backend and pass the resultant value to the worker, this will allow above form of plans to push initplans to workers and hence can help in enabling parallelism for other nodes in plan tree. The drawback of the second approach is that we need to evaluate the initplan before it is actually required which means that we might evaluate it even when it is not required. I am not sure if it is always safe to assume that we can evaluate the initplan before pushing it to workers especially for the cases when it is far enough down in the plan tree which we are parallelizing, however, I think we can assume it when the iniplan is above the plan tree where it is used (like in the above case). The third way is that we allow Gather node to be executed below another Gather node, but I think that will be bad especially for the plans like above because each worker needs to further spawn another set of workers to evaluate the iniplan which could be done once. Now we can build some way such that only one of the workers executes such an initplan and share the values with other workers, but I think overall this requires much more effort than first or second approach. Among all the three approaches, first seems to be simpler than the other two, but I feel if we just do that then we leave a lot on the table. Another way to accomplish this project could be that we do a mix of first and second such that when the initplan is above the plan tree to be parallelized, then use the second approach (one-time evaluation by master backend and share the result with workers), otherwise use the first approach of pushing down the initplan to workers. Thoughts? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers