On Thu, Feb 9, 2017 at 12:36 PM, Antonin Houska <a...@cybertec.at> wrote: > When looking at try_partial_hashjoin_path and try_partial_nestloop_path > functions I'm failing to understand the comment "Parameterized partial paths > are not supported.". > > It's clear to me that GatherPath can't have parameters because repeated > execution of parallel plan with adjusted parameters is not supported.
Actually, I think in theory that is fine. You'd start up and shut down workers for every execution, which is likely to stink in terms of performance, but it should work OK. The only problem is that it'll only work if you pass the values of the parameters down to the worker processes, which the code currently doesn't. Bind parameters sent by the user are passed down, but there's no provision to pass down parameters populated at execution time. Amit has written code for that, though, and it's been posted here. It just hasn't gotten into the tree yet for one reason or another. > But the > fact that generic partial path has parameters should not be a problem if those > parameters are satisfied below the nearest GatherPath node higher in the plan > tree. Do I miss anything of the concept? Yes, I think you're missing a key point. A parameterized path would normally be used for a query like small LJ (big1 IJ big2 ON big1.x = big2.x) ON big1.y = small.y. The plan might look like this: Nested Loop Left Join -> Seq Scan on small -> Nested Loop Join -> Index Scan on big1 -> Index Scan on big2 In essence, we're repeating the join between big1 and big2 for every row in small. That seems like a bad strategy until you realize that each join will scan only a tiny fraction of each of those tables. If you couldn't pass a parameter down from the scan of small to the scans on big1 and big2, you'd end up with a plan that scans one or both of those tables in their entirety. Ouch. Now, this plan can be parallelized just fine. The sequential scan on small can be replaced with a Parallel Seq Scan. That works fine, and the planner is already capable of generating such plans. However, at no point in that do you get a parameterized *partial* path. You generate regular old parameterized paths for big1 and big2 and join then to produce a parameterized path for (big1 big2), and then you join that via a nested loop with the non-parameterized partial path for small, and you get a partial but unparameterized path for (small big1 big2). Then you push a Gather node on top and you're done. Suppose we did generate a partial plan for (big1 big2). It would look like this: Nested Loop Join -> Parallel Index Scan on big1 -> Index Scan on big2 Now, how could you join that in a meaningful way to small so as to come up with anything sensible? You really can't. Consider a plan like this: Gather -> Nested Loop Left Join -> Seq Scan on small -> Nested Loop Join -> Partial Index Scan on big1 -> Index Scan on big2 That's clearly nonsense. The partial index scan is supposed to return a subset of the result rows to each worker, but there's no reason the workers have to be basing their work on the same row from 'small'. Which of their values would the parallel index scan supposedly be using? This isn't a matter of some implementation detail that we're currently missing; such a plan is just inherently nonsensical. > Actually this check > > if (!bms_is_subset(inner_paramrels, outer_path->parent->relids)) > return; > > in try_partial_nestloop_path only rejects special case where the inner path > references relations outside the join, but still allows the outer path to have > parameters outside. Right. We build a partial join path by joining a partial path on the outer side with a non-partial path on the inner side. If the join is a nested loop, the inner side can be parameterized, but all of those parameters have to be provided by the outer side; if not, we get the nonsensical situation illustrated above. > As for try_partial_hashjoin_path, the comment "If the inner path is > parameterized ..." seems to be copy & pasted from try_partial_nestloop_path, > but I think it does not fit hash join. From hash_inner_and_outer I judge that > neither side of hash join can be parameterized by the other: > > /* > * If either cheapest-total path is parameterized by the other rel, we > * can't use a hashjoin. (There's no use looking for alternative > * input paths, since these should already be the least-parameterized > * available paths.) > */ > if (PATH_PARAM_BY_REL(cheapest_total_outer, innerrel) || > PATH_PARAM_BY_REL(cheapest_total_inner, outerrel)) > return; > > Shouldn't try_partial_hashjoin_path and try_partial_nestloop_path do just the > same checks that try_hashjoin_path and try_nestloop_path do respectively? No, because there is no point in producing parameterized partial paths for the reasons mentioned above. I think you're right that the comment in try_partial_hashjoin_path is not quite right. What it should really be saying is that a hash join between a partial path and a parameterized path is bound to produce a parameterized partial path, and those aren't useful for anything, so we shouldn't create them. Maybe someday somebody will figure out a context in which a partial parameterized path is actually good for something, but until then we shouldn't generate them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers