Jesus Camacho Rodriguez created HIVE-17037:
----------------------------------------------
Summary: Extend join algorithm selection to avoid unnecessary input data shuffle
Key: HIVE-17037
URL: https://issues.apache.org/jira/browse/HIVE-17037
Project: Hive
Issue Type: Improvement
Components: Physical Optimizer
Affects Versions: 3.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez

As an example, consider the following query:
{code:sql}
SELECT *
FROM (
  SELECT a.value
  FROM src1 a
  JOIN src1 b
  ON (a.value = b.value)
  GROUP BY a.value
) a
JOIN src
ON (a.value = src.value);
{code}

Currently, the plan generated for Tez will contain an unnecessary shuffle operation between the subquery and the join, since the records produced by the subquery are already sorted by the value.

This issue is to extend join algorithm selection to be able to shuffle only some of the inputs for a given join and avoid unnecessary shuffle operations.
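
To make the intended decision concrete, here is a minimal Python sketch of per-input shuffle selection. It is an illustration only, not Hive's physical optimizer code: the `JoinInput` class, its `ordered_by` field, and the `inputs_needing_shuffle` function are hypothetical names. It also assumes that an input whose rows are already partitioned and sorted on the join key can feed the join directly, which ignores details a real planner would have to check (for example, compatible partition counts).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class JoinInput:
    """Hypothetical description of one input to a join."""
    name: str
    join_key: Tuple[str, ...]
    # Columns the rows are already partitioned and sorted by,
    # or None when nothing is known about the input's layout.
    ordered_by: Optional[Tuple[str, ...]] = None


def inputs_needing_shuffle(inputs: List[JoinInput]) -> List[str]:
    """Return the names of the inputs that still need a shuffle.

    Simplifying assumption: an input can skip the shuffle only when
    its existing ordering matches its join key exactly.
    """
    return [i.name for i in inputs if i.ordered_by != i.join_key]


# The query from this issue: the GROUP BY subquery already emits rows
# ordered by `value`, while nothing is known about the layout of `src`.
subquery = JoinInput("a", ("value",), ordered_by=("value",))
src = JoinInput("src", ("value",))

print(inputs_needing_shuffle([subquery, src]))  # ['src']
```

Under these assumptions only `src` is shuffled, whereas the current behavior described above corresponds to shuffling both inputs.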