Hi Tom, Thanks for looking at this.
On 10 January 2018 at 09:26, Tom Lane <t...@sss.pgh.pa.us> wrote: > This is a cute idea, but I'm troubled by a couple of points: > > 1. Once you don't have all the tlist items shown in DISTINCT, it really is > more like DISTINCT ON, seems like. I am not sure it's a good idea to set > hasDistinctOn, because that engages some planner behaviors we probably > don't want, but I'm also not sure we can get away with just ignoring the > difference. As an example, in allpaths.c there are assorted assumptions > that having a distinctClause but !hasDistinctOn means all output columns > of a subquery are listed in the distinctClause. hmm, I see: /* * If subquery has regular DISTINCT (not DISTINCT ON), we're wasting our * time: all its output columns must be used in the distinctClause. */ if (subquery->distinctClause && !subquery->hasDistinctOn) return; I think that particular one would just fail to remove columns from the subquery if there's a distinct clause (not distinct on). That seems more like a missed optimisation rather than a future bug. Although it probably would be a good idea to get rid of the assumption that a non-NIL distinctClause && !hasDistinctOn means that all output target items are part of that distinct clause. I don't see anything else in allpaths.c that would be affected. Keeping in mind we'll never completely remove a distinctClause, just possibly remove some items from the list. Everything else just seems to be checking distinctClause != NIL, or is only doing something if hasDistinctOn == true. I'll do some more analysis on places that distinctClause is being used to check what's safe. > 2. There's a comment in planner.c to the effect that > > * When we have DISTINCT ON, we must sort by the more rigorous of > * DISTINCT and ORDER BY, else it won't have the desired behavior. > * Also, if we do have to do an explicit sort, we might as well use > * the more rigorous ordering to avoid a second sort later. (Note > * that the parser will have ensured that one clause is a prefix of > * the other.) > > Removing random elements of the distinctClause will break its > correspondence with the sortClause, with probably bad results. > > I do not remember for sure at the moment, but it may be that this > correspondence is only important for the case of DISTINCT ON, in which > case we could dodge the problem by not applying the optimization unless > it's plain DISTINCT. That doesn't help us with point 1 though. Yeah, it seems better to only try and apply this for plain DISTINCT. > BTW, my dictionary says it's "dependent" not "dependant". Oops. Thanks. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services