On Tue, 22 Aug 2023 at 21:51, Richard Guo <guofengli...@gmail.com> wrote: > Sometimes we may just not generate parameterized nestloop as final plan, > such as when there are no indexes and no lateral references in the > Append/MergeAppend node. In this case I think it would be great if we > can still do some partition running.
(I just read through this thread again to remind myself of where it's at.) Here are my current thoughts: You've done some costing work which will only prefer the part-prune hash join path in very conservative cases. This is to reduce the risk of performance regressions caused by running the pruning code too often in cases where it's less likely to be able to prune any partitions. Now, I'm not saying we shouldn't ever do this pruning hash join stuff, but what I think might be better to do as a first step is to have partitioned tables create a parameterized path on their partition key, and a prefix thereof for RANGE partitioned tables. This would allow parameterized nested loop joins when no index exists on the partition key. Right now you can get a plan that does this if you do: create table p (col int); create table pt (partkey int) partition by list(partkey); create table pt1 partition of pt for values in(1); create table pt2 partition of pt for values in(2); insert into p values(1); insert into pt values(1); explain (analyze, costs off, timing off, summary off) SELECT * FROM p, LATERAL (SELECT * FROM pt WHERE p.col = pt.partkey OFFSET 0); QUERY PLAN ---------------------------------------------------------- Nested Loop (actual rows=0 loops=1) -> Seq Scan on p (actual rows=1 loops=1) -> Append (actual rows=0 loops=1) -> Seq Scan on pt1 pt_1 (actual rows=0 loops=1) Filter: (p.col = partkey) -> Seq Scan on pt2 pt_2 (never executed) Filter: (p.col = partkey) You get the parameterized nested loop. Great! But, as soon as you drop the OFFSET 0, the lateral join will be converted to an inner join and Nested Loop won't look so great when it's not parameterized. explain (analyze, costs off, timing off, summary off) SELECT * FROM p, LATERAL (SELECT * FROM pt WHERE p.col = pt.partkey); QUERY PLAN ---------------------------------------------------------- Hash Join (actual rows=1 loops=1) Hash Cond: (pt.partkey = p.col) -> Append (actual rows=1 loops=1) -> Seq Scan on pt1 pt_1 (actual rows=1 loops=1) -> Seq Scan on pt2 pt_2 (actual rows=0 loops=1) -> Hash (actual rows=1 loops=1) Buckets: 4096 Batches: 2 Memory Usage: 32kB -> Seq Scan on p (actual rows=1 loops=1) Maybe instead of inventing a very pessimistic part prune Hash Join, it might be better to make the above work without the LATERAL + OFFSET 0 by creating the parameterized paths Seq Scan paths. That's going to be an immense help when the non-partitioned relation just has a small number of rows, which I think your costing favoured anyway. What do you think? David