On 17 December 2015 at 00:17, Tomas Vondra <tomas.von...@2ndquadrant.com>
wrote:


>     1) nodeHashjoin.c (and other join nodes)
>>
>>     I've noticed we have this in the ExecHashJoin() method:
>>
>>       /*
>>        * When the inner side is unique or we're performing a
>>        * semijoin, we'll consider returning the first match, but
>>        * after that we're done with this outer tuple.
>>        */
>>       if (node->js.unique_inner)
>>           node->hj_JoinState = HJ_NEED_NEW_OUTER;
>>
>>     That seems a bit awkward because the comment speaks about unique
>>     joins *OR* semijoins, but the check only references unique_inner.
>>     That of course works because we do this in ExecInitHashJoin():
>>
>>              switch (node->join.jointype)
>>              {
>>                      case JOIN_SEMI:
>>                              hjstate->js.unique_inner = true;
>>                              /* fall through */
>>
>>     Either some semijoins may not be unique - in that case the naming is
>>     misleading at best, and we either need to find a better name or
>>     simply check two conditions like this:
>>
>>       if (node->js.unique_inner || node->join.type == JOIN_SEMI)
>>          node->hj_JoinState = HJ_NEED_NEW_OUTER;
>>
>>     Or all semijoins are unique joins, and in that case the comment may
>>     need rephrasing. But more importantly, it begs the question why
>>     we're detecting this in the executor and not in the planner? Because
>>     if we detect it in executor, we can't use this bit in costing, for
>>     example.
>>
>>
>> The reason not to detect that in the planner is simply that unique_join
>> is meant to mean that the relation on the inner side of the join will at
>> most contain a single Tuple which matches each outer relation tuple,
>> based on the join condition. I've added no detection for this in semi
>> joins, and I'd rather not go blindly setting the flag to true in the
>> planner as it simply may not be true for the semi join. At the moment
>> that might not matter as we're only using the unique_join flag as an
>> optimization in the join nodes, but I'd prefer not to do this as its
>> likely we'll want to do more with this flag later, and I'd rather we
>> keep the meaning well defined. You might argue that this is also true
>> for semi joins, but if down the road somewhere we want to perform some
>> checks on the inner relation before the join takes place, and in that
>> case the Tuples of the relation might not have the same properties we
>> claim they do.
>>
>> But you're right that reusing the flag in the join nodes is not ideal,
>> and the comment is not that great either. I'd really rather go down the
>> path of either renaming the variable, or explaining this better in the
>> comment. It seems unnecessary to check both for each tuple being joined.
>> I'd imagine that might add a little overhead to joins which are not
>> unique.
>>
>
> I'd be very surprised it that had any measurable impact.
>
> How about changing the comment to:
>>
>> /*
>> * In the event that the inner side has been detected to be
>> * unique, as an optimization we can skip searching for any
>> * subsequent matching inner tuples, as none will exist.
>> * For semijoins unique_inner will always be true, although
>> * in this case we don't match another inner tuple as this
>> * is the required semi join behavior.
>> */
>>
>> Alternatively or additionally we can rename the variable in the executor
>> state, although I've not thought of a name yet that I don't find overly
>> verbose: unique_inner_or_semi_join,  match_first_tuple_only.
>>
>
> I'd go with match_first_tuple_only.


+1

unique_inner is a state that has been detected, match_first_tuple_only is
the action we take as a result.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply via email to