On 17 December 2015 at 00:17, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
> 1) nodeHashjoin.c (and other join nodes) >> >> I've noticed we have this in the ExecHashJoin() method: >> >> /* >> * When the inner side is unique or we're performing a >> * semijoin, we'll consider returning the first match, but >> * after that we're done with this outer tuple. >> */ >> if (node->js.unique_inner) >> node->hj_JoinState = HJ_NEED_NEW_OUTER; >> >> That seems a bit awkward because the comment speaks about unique >> joins *OR* semijoins, but the check only references unique_inner. >> That of course works because we do this in ExecInitHashJoin(): >> >> switch (node->join.jointype) >> { >> case JOIN_SEMI: >> hjstate->js.unique_inner = true; >> /* fall through */ >> >> Either some semijoins may not be unique - in that case the naming is >> misleading at best, and we either need to find a better name or >> simply check two conditions like this: >> >> if (node->js.unique_inner || node->join.type == JOIN_SEMI) >> node->hj_JoinState = HJ_NEED_NEW_OUTER; >> >> Or all semijoins are unique joins, and in that case the comment may >> need rephrasing. But more importantly, it begs the question why >> we're detecting this in the executor and not in the planner? Because >> if we detect it in executor, we can't use this bit in costing, for >> example. >> >> >> The reason not to detect that in the planner is simply that unique_join >> is meant to mean that the relation on the inner side of the join will at >> most contain a single Tuple which matches each outer relation tuple, >> based on the join condition. I've added no detection for this in semi >> joins, and I'd rather not go blindly setting the flag to true in the >> planner as it simply may not be true for the semi join. At the moment >> that might not matter as we're only using the unique_join flag as an >> optimization in the join nodes, but I'd prefer not to do this as its >> likely we'll want to do more with this flag later, and I'd rather we >> keep the meaning well defined. You might argue that this is also true >> for semi joins, but if down the road somewhere we want to perform some >> checks on the inner relation before the join takes place, and in that >> case the Tuples of the relation might not have the same properties we >> claim they do. >> >> But you're right that reusing the flag in the join nodes is not ideal, >> and the comment is not that great either. I'd really rather go down the >> path of either renaming the variable, or explaining this better in the >> comment. It seems unnecessary to check both for each tuple being joined. >> I'd imagine that might add a little overhead to joins which are not >> unique. >> > > I'd be very surprised it that had any measurable impact. > > How about changing the comment to: >> >> /* >> * In the event that the inner side has been detected to be >> * unique, as an optimization we can skip searching for any >> * subsequent matching inner tuples, as none will exist. >> * For semijoins unique_inner will always be true, although >> * in this case we don't match another inner tuple as this >> * is the required semi join behavior. >> */ >> >> Alternatively or additionally we can rename the variable in the executor >> state, although I've not thought of a name yet that I don't find overly >> verbose: unique_inner_or_semi_join, match_first_tuple_only. >> > > I'd go with match_first_tuple_only. +1 unique_inner is a state that has been detected, match_first_tuple_only is the action we take as a result. -- Simon Riggs http://www.2ndQuadrant.com/ <http://www.2ndquadrant.com/> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services