I wrote: > I still think this stuff mostly needs to be thrown away and rewritten > in Path-creation style, but that's a long-term project. In the meantime > this seems like a relatively small increment of complexity, so maybe it's > worth applying. I'm concerned about the method for building a new > DISTINCT clause though; the best that can be said for that is that it's > a crude hack, and I'm less than convinced that there are no cases where > it'll dump core.
OK, after still more time thinking about it, here's the issues with that way of generating a DISTINCT clause corresponding to the UNION: 1. Calling transformDistinctClause with a NULL pstate seems pretty unsafe. It might accidentally fail to fail, today, but it's surely fragile. It's violating the API contract not only of transformDistinctClause itself (which is not documented as accepting a NULL pstate) but of addTargetToGroupList (q.v.). A minimum requirement before I'd accept a patch that did that is that it extended the header comments of those functions to specify under what circumstances a NULL pstate can be passed. However, that's not the direction to go, because of #2. 2. This approach potentially changes the semantics of the UNION. (This is only important for a datatype that has more than one btree equality operator, but let's posit that.) transformDistinctClause, as per its header comment, allows the outer ORDER BY to influence which equality operator it picks for DISTINCT. However, in the case of "(SELECT ... UNION SELECT ...) ORDER BY ...", the UNION semantics were chosen without reference to the ORDER BY, so it's possible that the equality operators cited in the SetOperationStmt's groupClauses list are not what you'd get from applying transformDistinctClause as the patch does. It is not okay for the planner to override the parser's choice of semantics like that. Now I'm fairly sure that planner.c is expecting that the query's distinctClause be a superset of the sortClause if any (cf comments for SortGroupClause in parsenodes.h), so it wouldn't be okay to just blindly build a distinctClause from topop->groupClauses. I think what you need to do is check that topop->groupClauses is compatible with the sortClause if any (skipping the flattening transformation if not) and then build a distinctClause by extending the sort clause list with any missing items from topop->groupClauses. So this is sort of like what transformDistinctClause does, but different in detail, and the failure case is to not do the transformation, rather than ereport'ing. (See also generate_setop_grouplist, which you could almost use except that it's not expecting to have to merge with a sortClause list.) So this is all doable enough, but you're probably going to end up with a new distinctClause-generating function that's at least twice the size of the patch's former net code addition. So the question is whether it's worth the trouble, considering that all this code has (I hope) a life expectancy of no more than one or two more release cycles. I'm going to set the patch back to Waiting on Author, but unless you are prepared to rewrite the distinctClause creation code in the next couple of days, you should change it to Returned with Feedback. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers