David Rowley <david.row...@2ndquadrant.com> writes: > [ remove_left_join_distinct_2017-11-30.patch ]
I looked at this, and it seems like an interesting idea, but I'm unconvinced that the specific conditions here are OK. One clear problem is that you consider a DISTINCT to be an automatic get-out-of-jail-free card, but that doesn't work in the presence of aggregates: select distinct count(*) from a left join b on ... So at the very least it needs to be more like if ((root->parse->distinctClause != NIL || root->parse->groupClause != NIL) && !root->parse->hasAggs) I believe it is probably necessary to reject this optimization if window functions are present, as well, because DISTINCT would not happen till after the wfunc calculation and wfuncs are definitely sensitive to the number of rows they see. (But maybe wfuncs + GROUP BY are OK.) Another case that seems less than safe is if the proposed-to-be-dropped left join is within the LHS of any LATERAL join. For instance, select distinct ... from (a left join b ...), lateral nextval('foo'); The row duplication from b would affect how often nextval gets called, and the presence of the DISTINCT later on doesn't entitle us to change that. (Maybe it'd be all right to perform the optimization if the lateral function is known stable/immutable, but I think that's getting too far afield; I'd be inclined to just drop the idea as soon as we see a lateral dependency.) Actually, that same case occurs for volatile functions in the tlist, ie select distinct nextval('foo') from a left join b ... The presence of the DISTINCT again doesn't excuse changing how often nextval() gets called. I kinda doubt this list of counterexamples is exhaustive, either; it's just what occurred to me in five or ten minutes thought. So maybe you can make this idea work, but you need to think much harder about what the counterexamples are. As far as code structure goes, it's surely not going to be sane to have two copies of the checking logic, much less try to cram one of them into a single if. I'd suggest factoring it out into a separate function that can be called from both places. I'll set the patch back to Waiting on Author. regards, tom lane