[jira] [Created] (HIVE-19967) SMB Join : ReduceSink should use correct keys in optraits

Deepak Jaiswal (JIRA) Thu, 21 Jun 2018 21:18:44 -0700

Deepak Jaiswal created HIVE-19967:
-------------------------------------

             Summary: SMB Join : ReduceSink should use correct keys in optraits
                 Key: HIVE-19967
                 URL: https://issues.apache.org/jira/browse/HIVE-19967
             Project: Hive
          Issue Type: Task
            Reporter: Deepak Jaiswal
            Assignee: Deepak Jaiswal



The optraits for ReduceSinkOp used to take the key columns as the bucket and 
sort columns, which worked fine for SMB. However, to enable prefix in Bucket 
Map Join, this logic was updated to use the bucket columns from the parent 
operators. This may break reduce-side SMB in a scenario like the following:

 

Task1 (TS bucketed by col0) passes that bucketing down to the RS, which ignores 
its key columns and uses col0 as the bucket key.

Task2 (a set of ops that leave the data sorted by a set of columns): with the 
current logic, the bucketing column set in Task1 keeps getting pushed through 
the OpTraits, so the real data flow is lost.

Task3 (Join op): the physical optimizer looks at the parent RS ops, which 
incidentally are sorted by the same column as Task1's original bucket column, 
although that column has lost its original meaning in the meantime.

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
