On Wed, Nov 1, 2017 at 5:39 AM, David Rowley <david.row...@2ndquadrant.com> wrote:
> In this case, the join *can* cause row duplicates, but the distinct or > group by would filter these out again anyway, so in these cases, we'd > not only get the benefit of not joining but also not having to remove > the duplicate rows caused by the join. +1. > > Given how simple the code is to support this, it seems to me to be > worth handling. > I find this patch very simple and still useful. @@ -597,15 +615,25 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel) + if (root->parse->distinctClause != NIL) + return true; + + if (root->parse->groupClause != NIL && !root->parse->hasAggs) + return true; + The other callers of rel_supports_distinctness() are looking for distinctness of the given relation, whereas the code change here are applicable to any relation, not just the given relation. I find that confusing. Looking at the way we are calling rel_supports_distinctness() in join_is_removable() this change looks inevitable, but still if we could reduce the confusion, that will be good. Also if we could avoid duplication of comment about unique index, that will be good. DISTINCT ON allows only a subset of columns being selected to be listed in that clause. I initially thought that specifying only a subset would be a problem and we should check whether the DISTINCT applies to all columns being selected. But that's not true, since DISTINCT ON would eliminate any duplicates in the columns listed in that clause, effectively deduplicating the row being selected. So, we are good there. May be you want to add a testcase with DISTINCT ON. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company