Re: Removing useless DISTINCT clauses

Stephen Frost Thu, 23 Aug 2018 17:30:57 -0700

Greetings,

* David Rowley (david.row...@2ndquadrant.com) wrote:
> On 24 August 2018 at 11:34, Stephen Frost <sfr...@snowman.net> wrote:
> > * David Rowley (david.row...@2ndquadrant.com) wrote:
> >> My personal opinion of only being able to completely remove the
> >> DISTINCT when there's a single item in the rtable (or a single base
> >> table) is that it's just too poor to bother with.  I think such a
> >> solution is best left to another patch and it should be much more
> >> complete and be able to track the unique properties through joins.
> >
> > Doesn't that move the goalposts for this pretty far off?  Is there
> > some reason that doing this for the single-item case will make it
> > difficult to implement the additional improvements you're suggesting
> > later?
> 
> The DISTINCT clause removal would likely need to wait until around
> where the distinct paths are created and all the join processing has
> been done.
> 
> If the optimisation is deemed worthwhile, then maybe that would be the
> place to put this code along with a comment that explains how it can
> be improved to support the removal with multiple relations.


Hm, so you're suggesting that this isn't the right place for this
optimization to be implemented, even now, with the single-relation
caveat?

I do agree that including comments about how it could/should be improved
would be good for future developers.

> > Given the cost of a DISTINCT and that we know pretty easily if one
> > exists or not, trying to eliminate it, even in the single-item case,
> > seems pretty worthwhile to me.  If it makes it difficult to improve on
> > this later then I could see pushing back on it, but this patch doesn't
> > seem like its doing that.
> 
> If people truly think it's worthwhile, then maybe it is, but if it's
> being done because it can be done then perhaps it's best not to
> bother.  I was just thinking of the bug reports we'd get when people
> see it does not work with multiple relations. There's already no
> shortage of bug reports that are not bugs. I also wanted to keep this
> patch just doing the simple case that we already apply to GROUP BY,
> but control over the patch may well be out of my hands now.

I'm not really sure I'm following along this part of the discussion.
As long as it isn't making the code particularly ugly or violating the
seperation of various components or making later hacking in this area
particularly worse, then the question primairly comes down to making
sure that the code is of quality and that the performance optimization
is worth the cycles to check for the opportunity and that it's not too
expensive compared to the cost of the operation to implement it, all of
which can be pretty objectively assessed and provide the answer to if
it's worthwhile or not.  Given the cost of a Unique node, and that we
will only check for this optimization when we are going to have to
introduce one, it certainly seems like it's worthwhile (again, proviso
on code quality, et al, above, but I don't think that's what you were
arguing a concern about here).

In short, I don't buy the concern about having to deal with "bug reports
that aren't bug reports" or that they should be a barrier to feature
work.  We already see complaints pretty regularly about optimizations we
don't have and this would at least be a step in the right direction for
dealing with these kinds of queries and that does seem better than doing
nothing, imv.

Thanks!

Stephen

signature.asc
Description: PGP signature

Re: Removing useless DISTINCT clauses

Reply via email to