Sorry for the mistake. You are right: the output ordering of a broadcast join
can be the ordering of the big table for some join types. I will prepare a PR
for you to review later. Thanks a lot!
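Here is a toy model of why this holds. It is plain Python, not Spark code, and `broadcast_hash_join` and `sort_merge_join` are hypothetical names. A broadcast hash join builds a hash table from the small side, then scans the big (streamed) side once in its existing order. Its output therefore inherits the big side's ordering, whatever columns that ordering is on. A sort-merge join sorts both sides by the join keys, so its output is ordered by the join keys. The sketch covers only an inner join. It does not check which other join types keep the streamed order in Spark.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str]            # (join_key, payload)
Joined = Tuple[int, str, str]    # (join_key, big_payload, small_payload)


def broadcast_hash_join(streamed: List[Row], build: List[Row]) -> List[Joined]:
    """Toy inner hash join: hash the small (build) side, then probe it while
    scanning the big (streamed) side exactly once, in its existing order."""
    table: Dict[int, List[str]] = {}
    for key, payload in build:
        table.setdefault(key, []).append(payload)
    out: List[Joined] = []
    for key, payload in streamed:  # output order follows the streamed side
        for match in table.get(key, []):
            out.append((key, payload, match))
    return out


def sort_merge_join(left: List[Row], right: List[Row]) -> List[Joined]:
    """Toy inner sort-merge join: sort both sides by join key, then merge.
    Output is therefore ordered by the join key."""
    left = sorted(left, key=lambda r: r[0])
    right = sorted(right, key=lambda r: r[0])
    out: List[Joined] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == lk:
                for k in range(j, j_end):
                    out.append((lk, left[i][1], right[k][1]))
                i += 1
            j = j_end
    return out


# Big side is sorted by its payload column (e.g. an event time), not by the key.
big = [(3, "t1"), (1, "t2"), (3, "t3"), (2, "t4")]
small = [(1, "x"), (3, "y")]

print(broadcast_hash_join(big, small))  # [(3, 't1', 'y'), (1, 't2', 'x'), (3, 't3', 'y')]
print(sort_merge_join(big, small))      # [(1, 't2', 'x'), (3, 't1', 'y'), (3, 't3', 'y')]
```

The broadcast join output keeps the big side's `t1, t2, t3` order even though the join keys come out as `3, 1, 3`. The sort-merge join output is ordered by key.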


Chrysan Wu
吴晓菊
Phone:+86 17717640807


2018-06-29 0:00 GMT+08:00 Wenchen Fan <cloud0...@gmail.com>:

> SortMergeJoin sorts its children by the join keys, but broadcast join does not.
> I think the output ordering of a broadcast join has nothing to do with the join
> keys.
>
> On Thu, Jun 28, 2018 at 11:28 PM Marco Gaido <marcogaid...@gmail.com>
> wrote:
>
>> I think the outputOrdering would be that of the big table (if it has one),
>> regardless of whether that ordering involves the join keys. Am I wrong?
>>
>> 2018-06-28 17:01 GMT+02:00 吴晓菊 <chrysan...@gmail.com>:
>>
>>> Thanks for the reply.
>>> Looking at SortMergeJoinExec, I think we can follow what SortMergeJoin
>>> does: for some join types, if the children are ordered on the join keys,
>>> we can report the ordered join keys as the output ordering.
>>>
>>>
>>> 2018-06-28 22:53 GMT+08:00 Wenchen Fan <cloud0...@gmail.com>:
>>>
>>>> SortMergeJoin only reports the ordering of the join keys, not the output
>>>> ordering of any child.
>>>>
>>>> It seems reasonable to me that broadcast join should respect the output
>>>> ordering of the children. Feel free to submit a PR to fix it, thanks!
>>>>
>>>> On Thu, Jun 28, 2018 at 10:07 PM 吴晓菊 <chrysan...@gmail.com> wrote:
>>>>
>>>>> Why can't we use the output ordering of the big table?
>>>>>
>>>>>
>>>>> 2018-06-28 21:48 GMT+08:00 Marco Gaido <marcogaid...@gmail.com>:
>>>>>
>>>>>> The easy answer is that SortMergeJoin guarantees an outputOrdering
>>>>>> while BroadcastHashJoin doesn't, i.e. after running a
>>>>>> BroadcastHashJoin you don't know what the order of the output will be,
>>>>>> since nothing enforces it.
>>>>>>
>>>>>> Hope this helps.
>>>>>> Thanks.
>>>>>> Marco
>>>>>>
>>>>>> 2018-06-28 15:46 GMT+02:00 吴晓菊 <chrysan...@gmail.com>:
>>>>>>
>>>>>>>
>>>>>>> We see that SortMergeJoinExec implements both outputPartitioning and
>>>>>>> outputOrdering, while BroadcastHashJoinExec only implements
>>>>>>> outputPartitioning. What is the reason for this design?
>>>>>>>
>>>>>>
>>>>>
>>>
>>
