On Sun, Aug 30, 2020 at 7:26 AM Tom Lane <t...@sss.pgh.pa.us> wrote: > I wrote: > > Back in bd3daddaf232d95b0c9ba6f99b0170a0147dd8af, which introduced > > AlternativeSubPlans, I wrote: > > There is a lot more that could be done based on this infrastructure: in > > particular it's interesting to consider switching to the hash plan if > we start > > out using the non-hashed plan but find a lot more upper rows going by > than we > > expected. I have therefore left some minor inefficiencies in place, > such as > > initializing both subplans even though we will currently only use one. > > > > That commit will be twelve years old come August, and nobody has either > > built anything else atop it or shown any interest in making the plan > choice > > switchable mid-run. So it seems like kind of a failed experiment. > > > > Therefore, I'm considering the idea of ripping out all executor support > > for AlternativeSubPlan and instead having the planner replace an > > AlternativeSubPlan with the desired specific SubPlan somewhere late in > > planning, possibly setrefs.c. > > Here's a proposed patchset for that. This runs with the idea I'd had > that setrefs.c could be smarter than the executor about which plan node > subexpressions will be executed how many times. I did not take it very > far, for fear of adding an undue number of planning cycles, but it's still > better than what we have now. > > For ease of review, 0001 adds the new planner logic, while 0002 removes > the now-dead executor support. > > There's one bit of dead code that I left in place for the moment, which is > ruleutils.c's support for printing AlternativeSubPlans. I'm not sure if > that's worth keeping or not --- it's dead code for normal use, but if > someone tried to use ruleutils.c to print partially-planned expression > trees, maybe there'd be a use for it? > > (It's also arguable that readfuncs.c's support is now dead code, but > I have little interest in stripping that out.) > > regards, tom lane > > Thank you for this code! I still have some confusion about when a SubPlan should be executed when a join is involved. I care about this because this has an impact on when we can get the num_exec for a subplan.
The subplan in a target list, it is executed after the join in my case. The subplan can be execute after the scan of T1(see below example) and it can also be executed after the join. Which one is better depends on which methods make the num_exec smaller. Is it something we already considered? I drill-down to populate_joinrel_with_paths and not find this logic. # explain (costs off) select (select a from t2 where t2.b = t1.b) from t1, t3; QUERY PLAN ------------------------------ Nested Loop -> Seq Scan on t1 -> Materialize -> Seq Scan on t3 SubPlan 1 -> Seq Scan on t2 Filter: (b = t1.b) (7 rows) When the subplan is in a Qual, it is supposed to be executed as soon as possible, The current implementation matches the below cases. So can we say we knows the num_execs of SubPlan just after we plan the dependent rels? (In Q1 below the dependent rel is t1 vs t3, in Q2 it is t1 only) If we can choose a subplan and recost the related path during(not after) creating the best path, will we get better results for some cases (due to the current cost method for AlternativeSubPlan[1])? -- the subplan depends on the result of t1 join t3 # explain (costs off) select t1.* from t1, t3 where t1.a > (select max(a) from t2 where t2.b = t1.b and t2.c = t3.c); QUERY PLAN ----------------------------------------------------- Nested Loop Join Filter: (t1.a > (SubPlan 1)) -> Seq Scan on t1 -> Materialize -> Seq Scan on t3 SubPlan 1 -> Aggregate -> Seq Scan on t2 Filter: ((b = t1.b) AND (c = t3.c)) (9 rows) -- the subplan only depends on t1. # explain (costs off) select t1.* from t1, t3 where t1.a > (select max(a) from t2 where t2.b = t1.b); QUERY PLAN ------------------------------------------------ Nested Loop -> Seq Scan on t3 -> Materialize -> Seq Scan on t1 Filter: (a > (SubPlan 1)) SubPlan 1 -> Aggregate -> Seq Scan on t2 Filter: (b = t1.b) (9 rows) At last, I want to use the commonly used table in src/test/regress/sql/create_table.sql when providing an example, but I always have issues running the create_table.sql which makes me uncomfortable to use that. Am I missing something? CREATE TABLE fail_part PARTITION OF range_parted3 FOR VALUES FROM (1, minvalue) TO (1, maxvalue); psql:src/test/regress/sql/create_table.sql:611: ERROR: partition "fail_part" would overlap partition "part10" CREATE TABLE fail_part PARTITION OF hash_parted2 FOR VALUES WITH (MODULUS 2, REMAINDER 1); psql:src/test/regress/sql/create_table.sql:622: ERROR: partition "fail_part" would overlap partition "h2part_4" [1] https://www.postgresql.org/message-id/07b3fa88-aa4e-2e13-423d-8389eb1712cf%40imap.cc -- Best Regards Andy Fan