Greetings, * David Rowley (david.row...@2ndquadrant.com) wrote: > On 24 August 2018 at 11:34, Stephen Frost <sfr...@snowman.net> wrote: > > * David Rowley (david.row...@2ndquadrant.com) wrote: > >> My personal opinion of only being able to completely remove the > >> DISTINCT when there's a single item in the rtable (or a single base > >> table) is that it's just too poor to bother with. I think such a > >> solution is best left to another patch and it should be much more > >> complete and be able to track the unique properties through joins. > > > > Doesn't that move the goalposts for this pretty far off? Is there > > some reason that doing this for the single-item case will make it > > difficult to implement the additional improvements you're suggesting > > later? > > The DISTINCT clause removal would likely need to wait until around > where the distinct paths are created and all the join processing has > been done. > > If the optimisation is deemed worthwhile, then maybe that would be the > place to put this code along with a comment that explains how it can > be improved to support the removal with multiple relations.
Hm, so you're suggesting that this isn't the right place for this optimization to be implemented, even now, with the single-relation caveat? I do agree that including comments about how it could/should be improved would be good for future developers. > > Given the cost of a DISTINCT and that we know pretty easily if one > > exists or not, trying to eliminate it, even in the single-item case, > > seems pretty worthwhile to me. If it makes it difficult to improve on > > this later then I could see pushing back on it, but this patch doesn't > > seem like its doing that. > > If people truly think it's worthwhile, then maybe it is, but if it's > being done because it can be done then perhaps it's best not to > bother. I was just thinking of the bug reports we'd get when people > see it does not work with multiple relations. There's already no > shortage of bug reports that are not bugs. I also wanted to keep this > patch just doing the simple case that we already apply to GROUP BY, > but control over the patch may well be out of my hands now. I'm not really sure I'm following along this part of the discussion. As long as it isn't making the code particularly ugly or violating the seperation of various components or making later hacking in this area particularly worse, then the question primairly comes down to making sure that the code is of quality and that the performance optimization is worth the cycles to check for the opportunity and that it's not too expensive compared to the cost of the operation to implement it, all of which can be pretty objectively assessed and provide the answer to if it's worthwhile or not. Given the cost of a Unique node, and that we will only check for this optimization when we are going to have to introduce one, it certainly seems like it's worthwhile (again, proviso on code quality, et al, above, but I don't think that's what you were arguing a concern about here). In short, I don't buy the concern about having to deal with "bug reports that aren't bug reports" or that they should be a barrier to feature work. We already see complaints pretty regularly about optimizations we don't have and this would at least be a step in the right direction for dealing with these kinds of queries and that does seem better than doing nothing, imv. Thanks! Stephen
signature.asc
Description: PGP signature