On Tue, May 15, 2018 at 2:05 AM, Bruce Momjian <br...@momjian.us> wrote: > On Sun, May 13, 2018 at 04:40:19PM +0530, Amit Kapila wrote: >> On Fri, May 11, 2018 at 8:38 PM, Bruce Momjian <br...@momjian.us> wrote: >> > I have committed the first draft of the Postgres 11 release notes. I >> > will add more markup soon. You can view the most current version here: >> > >> > http://momjian.us/pgsql_docs/release-11.html >> > >> > I expect a torrent of feedback. ;-) >> > >> >> Thanks for compiling the first draft. I think there are few things >> which we should add to the Parallel Queries section. (a) commit >> e89a71fb449af2ef74f47be1175f99956cf21524 - >> Pass InitPlan values to workers via Gather (Merge). We can write >> something like "Allow the part of the plan that references an initplan >> to run in parallel" or "Parallelise queries containing initplans", > > Uh, I need some explaination of what an initplan is >
See the below comment in the code: " This module is concerned with executing SubPlan expression nodes, which should not be confused with sub-SELECTs appearing in FROM. SubPlans are divided into "initplans", which are those that need only one evaluation per query (among other restrictions, this requires that they don't use any direct correlation variables from the parent plan level), and "regular" subplans, which are re-evaluated every time their result is required. > and what type of > query this helps. > Serial-Plan ----------------- postgres=# explain select * from t1 where t1.k=(select max(k) from t3); QUERY PLAN -------------------------------------------------------------------- Seq Scan on t1 (cost=35.51..71.01 rows=10 width=12) Filter: (k = $0) InitPlan 1 (returns $0) -> Aggregate (cost=35.50..35.51 rows=1 width=4) -> Seq Scan on t3 (cost=0.00..30.40 rows=2040 width=4) (5 rows) Parallel-Plan -------------------- postgres=# explain select * from t1 where t1.k=(select max(k) from t3); QUERY PLAN --------------------------------------------------------------------------------------- Gather (cost=9.71..19.38 rows=2 width=12) Workers Planned: 2 Params Evaluated: $1 InitPlan 1 (returns $1) -> Finalize Aggregate (cost=9.70..9.71 rows=1 width=4) -> Gather (cost=9.69..9.70 rows=2 width=4) Workers Planned: 2 -> Partial Aggregate (cost=9.69..9.70 rows=1 width=4) -> Parallel Seq Scan on t3 (cost=0.00..8.75 rows=375 width=4) -> Parallel Seq Scan on t1 (cost=0.00..9.67 rows=1 width=12) Filter: (k = $1) (11 rows) Here, you can observe that initplan itself gets parallelized and the part of plan tree referring initplan also got parallelized. The other example from regression test is: explain (costs off) select count(*) from tenk1 where tenk1.unique1 = (Select max(tenk2.unique1) from tenk2); QUERY PLAN ------------------------------------------------------ Aggregate InitPlan 1 (returns $2) -> Finalize Aggregate -> Gather Workers Planned: 2 -> Partial Aggregate -> Parallel Seq Scan on tenk2 -> Gather Workers Planned: 4 Params Evaluated: $2 -> Parallel Seq Scan on tenk1 Filter: (unique1 = $2) (12 rows) I can share more examples if required. >> (b) "Parallelise queries that contains expensive functions in target >> lists" (Commits - 3f90ec8597c3515e0d3190613b31491686027e4b and >> 11cf92f6e2e13c0a6e3f98be3e629e6bd90b74d5). > > So we generate the entire query before the target list function calls, > the run another parallel run just for those function calls? > No, it is not like that. We divide the scan among workers and each worker should perform projection of the rows it scanned (after applying filter). Now, if the expensive functions are part of target lists, then we can push the computation of expensive functions (as part of target list) in workers which will divide the work. > Really? Do > we run each column in its own worker or do we split the result set into > parts and run those in parallel? How do we know, just the function call > costs? > The function's cost can be determined via pg_proc->procost. For this particular case, you can refer the call graph - create_pathtarget->set_pathtarget_cost_width->cost_qual_eval_node->cost_qual_eval_walker->get_func_cost > I can admit I never saw that coming. > I think the use case becomes interesting with parallel query because now you can divide such cost among workers. Feel free to ask more questions if above doesn't clarify the usage of these features. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com