On Wed, Nov 22, 2017 at 10:30 PM, Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> wrote: > On Wed, Nov 1, 2017 at 5:39 AM, David Rowley > <david.row...@2ndquadrant.com> wrote: > >> In this case, the join *can* cause row duplicates, but the distinct or >> group by would filter these out again anyway, so in these cases, we'd >> not only get the benefit of not joining but also not having to remove >> the duplicate rows caused by the join. > > +1. > >> >> Given how simple the code is to support this, it seems to me to be >> worth handling. >> > > I find this patch very simple and still useful. > > @@ -597,15 +615,25 @@ rel_supports_distinctness(PlannerInfo *root, > RelOptInfo *rel) > + if (root->parse->distinctClause != NIL) > + return true; > + > + if (root->parse->groupClause != NIL && !root->parse->hasAggs) > + return true; > + > > The other callers of rel_supports_distinctness() are looking for distinctness > of the given relation, whereas the code change here are applicable to any > relation, not just the given relation. I find that confusing. Looking at the > way we are calling rel_supports_distinctness() in join_is_removable() this > change looks inevitable, but still if we could reduce the confusion, that will > be good. Also if we could avoid duplication of comment about unique index, > that > will be good. > > DISTINCT ON allows only a subset of columns being selected to be listed in > that > clause. I initially thought that specifying only a subset would be a problem > and we should check whether the DISTINCT applies to all columns being > selected. > But that's not true, since DISTINCT ON would eliminate any duplicates in the > columns listed in that clause, effectively deduplicating the row being > selected. So, we are good there. May be you want to add a testcase with > DISTINCT ON.
I am counting that as a review, which got no replies yet. The thing is somewhat fresh so I am moving it to next CF with waiting on author as status. -- Michael