Re: [HACKERS] Removing useless DISTINCT clauses

Tom Lane Tue, 09 Jan 2018 12:27:24 -0800

David Rowley <[email protected]> writes:
> [ remove_useless_distinct_clauses_v2.patch ]


This is a cute idea, but I'm troubled by a couple of points:

1. Once you don't have all the tlist items shown in DISTINCT, it really is
more like DISTINCT ON, seems like.  I am not sure it's a good idea to set
hasDistinctOn, because that engages some planner behaviors we probably
don't want, but I'm also not sure we can get away with just ignoring the
difference.  As an example, in allpaths.c there are assorted assumptions
that having a distinctClause but !hasDistinctOn means all output columns
of a subquery are listed in the distinctClause.

2. There's a comment in planner.c to the effect that

         * When we have DISTINCT ON, we must sort by the more rigorous of
         * DISTINCT and ORDER BY, else it won't have the desired behavior.
         * Also, if we do have to do an explicit sort, we might as well use
         * the more rigorous ordering to avoid a second sort later.  (Note
         * that the parser will have ensured that one clause is a prefix of
         * the other.)

Removing random elements of the distinctClause will break its
correspondence with the sortClause, with probably bad results.

I do not remember for sure at the moment, but it may be that this
correspondence is only important for the case of DISTINCT ON, in which
case we could dodge the problem by not applying the optimization unless
it's plain DISTINCT.  That doesn't help us with point 1 though.

BTW, my dictionary says it's "dependent" not "dependant".

                        regards, tom lane

Re: [HACKERS] Removing useless DISTINCT clauses

Reply via email to