Re: [HACKERS] Removing useless DISTINCT clauses

David Rowley Tue, 09 Jan 2018 14:13:02 -0800

Hi Tom,

Thanks for looking at this.

On 10 January 2018 at 09:26, Tom Lane <t...@sss.pgh.pa.us> wrote:
> This is a cute idea, but I'm troubled by a couple of points:
>
> 1. Once you don't have all the tlist items shown in DISTINCT, it really is
> more like DISTINCT ON, seems like.  I am not sure it's a good idea to set
> hasDistinctOn, because that engages some planner behaviors we probably
> don't want, but I'm also not sure we can get away with just ignoring the
> difference.  As an example, in allpaths.c there are assorted assumptions
> that having a distinctClause but !hasDistinctOn means all output columns
> of a subquery are listed in the distinctClause.

hmm, I see:

/*
* If subquery has regular DISTINCT (not DISTINCT ON), we're wasting our
* time: all its output columns must be used in the distinctClause.
*/
if (subquery->distinctClause && !subquery->hasDistinctOn)
return;

I think that particular one would just fail to remove columns from the
subquery if there's a distinct clause (not distinct on). That seems
more like a missed optimisation rather than a future bug. Although it
probably would be a good idea to get rid of the assumption that a
non-NIL distinctClause && !hasDistinctOn means that all output target
items are part of that distinct clause.

I don't see anything else in allpaths.c that would be affected.
Keeping in mind we'll never completely remove a distinctClause, just
possibly remove some items from the list. Everything else just seems
to be checking distinctClause != NIL, or is only doing something if
hasDistinctOn == true.

I'll do some more analysis on places that distinctClause is being used
to check what's safe.

> 2. There's a comment in planner.c to the effect that
>
>          * When we have DISTINCT ON, we must sort by the more rigorous of
>          * DISTINCT and ORDER BY, else it won't have the desired behavior.
>          * Also, if we do have to do an explicit sort, we might as well use
>          * the more rigorous ordering to avoid a second sort later.  (Note
>          * that the parser will have ensured that one clause is a prefix of
>          * the other.)
>
> Removing random elements of the distinctClause will break its
> correspondence with the sortClause, with probably bad results.
>
> I do not remember for sure at the moment, but it may be that this
> correspondence is only important for the case of DISTINCT ON, in which
> case we could dodge the problem by not applying the optimization unless
> it's plain DISTINCT.  That doesn't help us with point 1 though.

Yeah, it seems better to only try and apply this for plain DISTINCT.

> BTW, my dictionary says it's "dependent" not "dependant".

Oops. Thanks.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Removing useless DISTINCT clauses

Reply via email to