I happened to notice that the query below can be inefficient. # explain (costs off) select * from int8_tbl a left join (int8_tbl b inner join lateral (select *, b.q2 as x from int8_tbl c) ss on b.q2 = ss.q1) on a.q1 = b.q1; QUERY PLAN ------------------------------------ Hash Right Join Hash Cond: (b.q1 = a.q1) -> Nested Loop -> Seq Scan on int8_tbl b -> Seq Scan on int8_tbl c Filter: (b.q2 = q1) -> Hash -> Seq Scan on int8_tbl a (8 rows)
For B/C join, currently we only have one option, i.e., nestloop with parameterized inner path. This could be extremely inefficient in some cases, such as when C does not have any indexes, or when B is very large. I believe the B/C join can actually be performed with hashjoin or mergejoin here, as it is an inner join. This happens because when we pull up the lateral subquery, we notice that Var 'x' is a lateral reference to 'b.q2' which is outside the subquery. So we wrap it in a PlaceHolderVar. This is necessary for correctness if it is a lateral reference from nullable side to non-nullable item. But here in this case, the referenced item is also nullable, so actually we can omit the PlaceHolderVar with no harm. The comment in pullup_replace_vars_callback() also explains this point. * (Even then, we could omit the PlaceHolderVar if * the referenced rel is under the same lowest outer join, but * it doesn't seem worth the trouble to check that.) All such PHVs would imply lateral dependencies which would make us have no choice but nestloop. I think we should avoid such PHVs as much as possible. So IMO it may 'worth the trouble to check that'. Attached is a patch to check that for simple Vars. Maybe we can extend it to avoid PHVs for more complex expressions, but that requires some codes because for now we always wrap non-var expressions to PHVs in order to have a place to insert nulling bitmap. As a first step, let's do it for simple Vars only. Any thoughts? Thanks Richard
v1-0001-Avoid-unnecessary-PlaceHolderVars-for-simple-Vars.patch
Description: Binary data