On 2016-05-25 16:55:23 -0400, Tom Lane wrote: > Andres Freund <and...@anarazel.de> writes: > > On 2016-05-25 15:20:03 -0400, Tom Lane wrote: > >> We could certainly make a variant behavior in nodeFunctionscan.c that > >> emulates that, if we feel that being exactly bug-compatible on the point > >> is actually what we want. I'm dubious about that though, not least > >> because I don't think *anyone* actually believes that that behavior isn't > >> broken. Did you read my upthread message suggesting assorted compromise > >> choices? > > > You mean > > https://www.postgresql.org/message-id/21076.1464034...@sss.pgh.pa.us ? > > If so, yes. > > > If we go with rewriting this into LATERAL, I'd vote for 2.5 (trailed by > > option 1), that'd keep most of the functionality, and would break > > visibly rather than invisibly in the cases where not. > > 2.5 would be okay with me. > > > I guess you're not planning to work on this? > > Well, not right now, as it's clearly too late for 9.6. I might hack on > it later if nobody beats me to it.
FWIW, as it's blocking my plans for executor related rework (expression evaluation, batch processing) I started to hack on this. I've an implementation that 1) turns all targetlist SRF (tSRF from now on) into ROWS FROM expressions. If there's tSRFs in the argument of a tSRF those becomes a separate, lateral, ROWS FROM expression. 2) If grouping/window functions are present, the entire query is wrapped in a subquery RTE, except for the set-returning function. All referenced Var|Aggref|GroupingFunc|WindowFunc|Param nodes in the original targetlist are made to reference that subquery, which gets a TargetEntry for them. 3) If sortClause does *not* reference any tSRFs the sorting is evaluated in a subquery, to preserve the output ordering of SRFs in queries like SELECT id, generate_series(1,3) FROM (VALUES(1),(2)) d(id) ORDER BY id DESC; if in contrast sortClause does reference the tSRF output, it's evaluated in the outer SRF. this seems to generally work, and allows to remove considerable amounts of code. So far I have one problem without an easy solution: Historically queries like =# SELECT id, generate_series(1,2) FROM (VALUES(1),(2)) few(id); ┌────┬─────────────────┐ │ id │ generate_series │ ├────┼─────────────────┤ │ 1 │ 1 │ │ 1 │ 2 │ │ 2 │ 1 │ │ 2 │ 2 │ └────┴─────────────────┘ have preserved the SRF output ordering. But by turning the SRF into a ROWS FROM, there's no guarantee that the cross join between "few" and generate_series(1,3) above is implemented in that order. I.e. we can get something like ┌────┬─────────────────┐ │ id │ generate_series │ ├────┼─────────────────┤ │ 1 │ 1 │ │ 2 │ 1 │ │ 1 │ 2 │ │ 2 │ 2 │ └────┴─────────────────┘ because it's implemented as ┌──────────────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├──────────────────────────────────────────────────────────────────────────────┤ │ Nested Loop (cost=0.00..35.03 rows=2000 width=8) │ │ -> Function Scan on generate_series (cost=0.00..10.00 rows=1000 width=4) │ │ -> Materialize (cost=0.00..0.04 rows=2 width=4) │ │ -> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=4) │ └──────────────────────────────────────────────────────────────────────────────┘ I right now see no easy and nice-ish way to constrain that. Besides that I'm structurally wondering whether turning the original query into a subquery is the right thing to do. It requires some kind of ugly munching of Query->*, and has the above problem. One alternative would be to instead perform the necessary magic in grouping_planner(), by "manually" adding nestloop joins before/after create_ordered_paths() (depending on SRFs being referenced in the sort clause). That'd create plans we'd not have created so far, by layering NestLoop and FunctionScan nodes above the normal query - that'd allow us to to easily force the ordering of SRF evaluation. If we go the subquery route, I'm wondering about where to tackle the restructuring. So far I'm doing it very early in subquery_planner() - otherwise the aggregation/sorting/... behaviour is easier to handle. Perhaps doing it in standard_planner() itself would be better though. An alternative approach would be to do this during parse-analysis, but I think that might end up being confusing, because stored rules would suddenly have a noticeably different structure, and it'd tie us a lot more to the details of that transformation than I'd like. Besides the removal of the least-common-multiple behaviour of tSRF queries, there's some other consequences that using function scans have: Previously if a tSRF was never evaluated, it didn't cause the number of rows from being increased. E.g. SELECT id, COALESCE(1, generate_series(1,2)) FROM (VALUES(1),(2)) few(id); only produced two rows. But using joins means that a simple implementation of using ROWS FROM returns four rows. We could try to inject sufficient join conditions in that type of query, to prune down the number of rows again, but I really don't want to go there - it's kinda hard in the general case... Comments? Regards, Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers