[jira] [Updated] (HIVE-8202) Support SMB Join for Hive on Spark [Spark Branch]

Szehon Ho (JIRA) Tue, 28 Oct 2014 15:43:05 -0700

     [ https://issues.apache.org/jira/browse/HIVE-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Szehon Ho updated HIVE-8202:
----------------------------
    Description: 
SMB (sort-merge-bucket) joins are used wherever the tables are sorted and 
bucketed. It's a map-side join. The join boils down to just merging the already 
sorted tables, allowing this operation to be faster than an ordinary map-join. 
However, if the tables are partitioned, there could be a slowdown, as each 
mapper would need to get a very small chunk of a partition which has a single 
key. Thus, in some scenarios it's beneficial to convert SMB join to SMB map join 
as well.
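To illustrate what "merging the already sorted tables" means for bucketed inputs, here is a minimal Python sketch, not Hive's implementation. The names `bucketize`, `merge_join` and `smb_join` are hypothetical, and Python's built-in `hash()` stands in for Hive's own bucketing hash function (an assumption). Both tables are split into the same number of buckets by key and each bucket is sorted by key, so bucket i of one table only has to be joined with bucket i of the other, using a single forward merge pass with no hash table.

```python
from typing import Any, Iterator, List, Tuple

Row = Tuple[Any, Any]  # (join_key, payload)


def bucketize(rows: List[Row], num_buckets: int) -> List[List[Row]]:
    """Split rows into buckets by hash(key) % num_buckets, then sort each bucket by key.

    Assumption: Python's hash() stands in for Hive's bucketing hash function.
    """
    buckets: List[List[Row]] = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[hash(row[0]) % num_buckets].append(row)
    for bucket in buckets:
        bucket.sort(key=lambda r: r[0])  # each bucket is sorted by join key
    return buckets


def merge_join(left: List[Row], right: List[Row]) -> Iterator[Tuple[Any, Any, Any]]:
    """Inner-join two key-sorted lists in one forward pass (no hash table)."""
    i, j = 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1  # left key too small: advance left side
        elif lk > rk:
            j += 1  # right key too small: advance right side
        else:
            # Find the run of equal keys on each side and emit their cross product.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == rk:
                j_end += 1
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    yield lk, lv, rv
            i, j = i_end, j_end


def smb_join(left_rows: List[Row], right_rows: List[Row], num_buckets: int) -> List[Tuple[Any, Any, Any]]:
    """Join bucket i of the left table only with bucket i of the right table."""
    left_buckets = bucketize(left_rows, num_buckets)
    right_buckets = bucketize(right_rows, num_buckets)
    result: List[Tuple[Any, Any, Any]] = []
    for lb, rb in zip(left_buckets, right_buckets):
        result.extend(merge_join(lb, rb))
    return result


orders = [(3, "o1"), (1, "o2"), (3, "o3"), (4, "o4")]
users = [(1, "alice"), (3, "bob"), (5, "carol")]
print(sorted(smb_join(orders, users, num_buckets=2)))
# [(1, 'o2', 'alice'), (3, 'o1', 'bob'), (3, 'o3', 'bob')]
```

Because every bucket pair is independent, each pair could be handled by a separate task, which is what makes a map-side variant possible once the tables are already bucketed and sorted on the join key.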

The task is to research and support the conversion from regular SMB join to SMB 
map join for the Spark execution engine.

  was:
SMB joins are used wherever the tables are sorted and bucketed. It's a 
reduce-side join. The join boils down to just merging the already sorted 
tables, allowing this operation to be faster than an ordinary map-join. 
However, if the tables are partitioned, there could be a slowdown, as each 
mapper would need to get a very small chunk of a partition which has a single 
key. Thus, in some scenarios it's beneficial to convert SMB join to SMB map 
join as well.

The task is to research and support the conversion from regular SMB join to SMB 
map join for the Spark execution engine.


> Support SMB Join for Hive on Spark [Spark Branch]
> -------------------------------------------------
>
>                 Key: HIVE-8202
>                 URL: https://issues.apache.org/jira/browse/HIVE-8202
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Szehon Ho
>         Attachments: HIVE-8202.1-spark.patch, HIVE-8202.2-spark.patch, 
> HIVE-8202.3-spark.patch, HIVE-8202.4-spark.patch, HIVE-8202.5-spark.patch, 
> Hive on Spark SMB Join.docx, Hive on Spark SMB Join.pdf
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
