I've been thinking about how to improve the planner's poor handling of variables in outer-join situations. Here are some past examples for motivation:
http://archives.postgresql.org/pgsql-hackers/2006-02/msg00154.php http://archives.postgresql.org/pgsql-general/2008-03/msg01440.php The reason why the planner acts so stupidly in these examples is that we're still using a kluge solution for this old bug: http://archives.postgresql.org/pgsql-bugs/2001-04/msg00223.php The root of the problem is that the planner does not worry about computing output expressions until the top of a plan tree. All lower-level join nodes are made to output simple lists of Vars referencing columns of the base relations of the join. We handle outer-join cases by forcing the values of the Vars of the nullable side to null at the level of the join, whenever there's no matching row in the nullable side. If one of the base relations of the join is a sub-SELECT whose output list includes expressions that don't certainly go to null when the input variables are forced to null, then we can't flatten that sub-SELECT, because flattening the sub-SELECT means that the expression evaluations bubble to the top of the plan tree and can produce non-null results when they shouldn't (as happened in the above bug, before we realized that we had to prevent flattening in this case). Another problem with this approach is that depending on what level of the plan tree you are thinking about, a Var nominally referencing tab.col might really mean the value of tab.col, or it might mean "either tab.col or NULL depending on what happened at some lower level of outer join". Since we can't readily tell the difference, we have estimation errors arising from failure to expect some NULLs (there have been recent complaints about this), and we need some pretty ugly kluges in places like EquivalenceClass processing to handle the risk that apparently identical expressions might not really be equal. I think the basic solution for this is that upper levels of the plan tree should refer to the nullable output columns of an outer join using "alias Vars" that name the join rel, not the underlying base relation, even if there is a simple base-relation Var that the alias is tracking. In the case involving a sub-SELECT, the alias Var would stand for whatever output expression appears in the sub-SELECT. We already have the concept of these alias Vars, in fact --- that's exactly the representation emitted by the parser. But historically the planner has smashed aliases down to their base Vars as early as possible (see flatten_join_alias_vars). That has some advantages but I'm thinking it's outweighed by the disadvantages. I'd like to try leaving alias Vars as aliases all the way through the planner, in any case where they might be semantically different from their referent (ie, whenever there's a possible force-to-null involved). To make this work, we'd need to have the constructed plan tree compute the alias Var from its referent expression at the lowest outer-join that can null the alias Var. The trick for the executor is to know when to force the value to null instead of computing the expression. I first thought about marking entries of the join node's targetlist as to be forced to null if left or right input row is null. However, that fails if we want the join node to compute some projection expressions on top of the raw join output (as would certainly happen if it were the top node of the tree, for example). That could be handled by inserting another level of plan node (ie, a Result) to do the projection, but that seems a pretty ugly and inefficient solution. What I have in mind instead is to insert a new kind of expression node "ForceToNull" atop the referent expression, with this node able to look at the EState (in the same way a regular Var node would) to see if it should return a null instead of computing its child expression. Then expansion of an alias Var into a ForceToNull and the underlying expression would work. I'm envisioning keeping track of active alias Vars and their expansions in a new list attached to the PlannerInfo "root" node. This would provide a place to record important information like which level of the join tree such a Var needs to be evaluated at. This is all pretty handwavy yet, but I don't think I'll be able to fill in many more details until I try to code it. I thought I'd put up this summary to see if anyone can shoot holes in it at this level of detail ... Comments? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers