David Rowley <david.row...@2ndquadrant.com> writes: > [ remove_useless_distinct_clauses_v2.patch ]
This is a cute idea, but I'm troubled by a couple of points: 1. Once you don't have all the tlist items shown in DISTINCT, it really is more like DISTINCT ON, seems like. I am not sure it's a good idea to set hasDistinctOn, because that engages some planner behaviors we probably don't want, but I'm also not sure we can get away with just ignoring the difference. As an example, in allpaths.c there are assorted assumptions that having a distinctClause but !hasDistinctOn means all output columns of a subquery are listed in the distinctClause. 2. There's a comment in planner.c to the effect that * When we have DISTINCT ON, we must sort by the more rigorous of * DISTINCT and ORDER BY, else it won't have the desired behavior. * Also, if we do have to do an explicit sort, we might as well use * the more rigorous ordering to avoid a second sort later. (Note * that the parser will have ensured that one clause is a prefix of * the other.) Removing random elements of the distinctClause will break its correspondence with the sortClause, with probably bad results. I do not remember for sure at the moment, but it may be that this correspondence is only important for the case of DISTINCT ON, in which case we could dodge the problem by not applying the optimization unless it's plain DISTINCT. That doesn't help us with point 1 though. BTW, my dictionary says it's "dependent" not "dependant". regards, tom lane