2009/11/2 Tom Lane <t...@sss.pgh.pa.us>: > One of the interesting properties of Oracle-compatible variable > references in plpgsql is that the set of variables referenced by a > given query could change during a forced replan. For example, > consider > > declare x int; > r record; > ... > for r in select x,y from tab loop ... > > If tab contains a column "x" then the "x" reference in the SELECT > refers to tab.x; if not, it refers to the plpgsql variable x. > So when first executing the SELECT we might find that it requires > a Param reference to the plpgsql variable, and then after a replan > is forced by ALTER TABLE tab ADD COLUMN x, there is no need for > the Param anymore. Or vice versa. > > This kinda calls into question whether the Oracle way is actually > a good idea or not; but my purpose here is not to debate that, > just to look at what it takes to implement it.
This is reason, why I would to see third mode (incompatible with Oracle or pg), that raise error, when it detects any intersecting identifiers. I understand so this mode should not be default, but personally I'll use it everywhere. Pavel > > Currently, plpgsql generates a list of the variables referenced by > any SQL statement or expression immediately upon seeing the text, > before it's ever even fed to the core parser. I had been envisioning > having the parser callback hook construct the list on-the-fly during > parsing, but the possibility that the list will change from time to > time means that other changes are needed too. Notably: > > 1. plancache.c does not have any provision for letting the Param type > array associated with a stored statement change when the statement is > replanned due to SI invalidation. > > 2. The control flow for a replan is that plpgsql calls SPI_execute_plan, > which calls RevalidateCachedPlan, which does the replan if the cached > plan is discovered to be stale. However, plpgsql already had to set up > the list of actual parameter values before it called SPI_execute_plan, > which means it is *way* too late to change the list of required Params > even if plancache let us do it. > > After chewing on these facts for awhile, I am thinking that the best > solution is for plpgsql to abandon the notion of a predetermined list > of parameters for a SQL query altogether. What that list basically > provides is a mapping from Param numbers ($n) to plpgsql "datum numbers" > (indexes in the list of a plpgsql function's variables). We could make > that mapping always be one-to-one, since there's no real reason that the > Params available to a query have to be consecutively numbered. So the > transformColumnRef hook would just pass back a Param using the > referenced variable's datum number as paramid; it wouldn't bother at all > with building a data structure listing the specific variables actually > used in the query. > > As far as plancache goes, it would therefore always see a null array > of Param type OIDs associated with a plpgsql-generated query, and we'd > not have to provide a way to update that. (We'd still keep the ability > to store such an array, because most other callers of plancache will > still want a fixed list of Params.) What we'd have to add to plancache > instead is the ability to install caller-determined parser callback > hooks when it is calling the parser for a replan. This seems fairly > easy to do --- I'm envisioning a sort of meta-hook function that gets > called with the new ParseState and can insert hook function pointers > in it. > > The other issue with this is what to do at runtime. We could do it > with no other changes if we had plpgsql always set up Values/Nulls > arrays listing *every* datum's current value. This seems a bit > brute-force though --- it could be slow in a function with a lot of > variables, and in most cases any specific query or expression would > not need most of those values. What I think we should do instead > is extend the ParamListInfo structure to add a callback hook function > that populates individual ParamExternData array entries on-demand. > The core executor would call the hook when it tried to fetch the > value of a Param that was currently invalid (ptype == 0). So the > hook would be invoked only once per query per referenced parameter, > which shouldn't be much overhead. Another interesting property > of this approach is that it'd fix the longstanding user complaint > that constructions like > if (TG_OP = 'INSERT' and NEW.foo = 'bar') ... > fail prematurely. The executor would never demand the value > of NEW.foo, and thus not fail, if TG_OP isn't INSERT. > > Comments? > > regards, tom lane > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers