[ https://issues.apache.org/jira/browse/HIVE-9753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933207#comment-14933207 ]
Feng Yuan commented on HIVE-9753: --------------------------------- Gopal V/Sergey Shelukhin Can someone plz review this patch? > Wrong results when using multiple levels of Joins. When table alias of one of > the table is null with left outer joins. > ------------------------------------------------------------------------------------------------------------------------ > > Key: HIVE-9753 > URL: https://issues.apache.org/jira/browse/HIVE-9753 > Project: Hive > Issue Type: Bug > Affects Versions: 0.14.0, 1.0.0 > Reporter: Pavan Srinivas > Priority: Critical > Attachments: HIVE-9753.0-0.14.0.patch, HIVE-9753.0-1.0.0.patch, > HIVE-9753.patch, table1.data, table2.data, table3.data > > > Let take scenario, where the tables are: > {code} > drop table table1; > CREATE TABLE table1( > col1 string, > col2 string, > col3 string, > col4 string > ) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY '\t' > STORED AS INPUTFORMAT > 'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'; > drop table table2; > CREATE TABLE table2( > col1 string, > col2 bigint, > col3 string, > col4 string > ) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY '\t' > STORED AS INPUTFORMAT > 'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'; > drop table table3; > CREATE TABLE table3( > col1 string, > col2 int, > col3 int, > col4 string) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY '\t' > STORED AS INPUTFORMAT > 'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'; > {code} > Query with wrong results: > {code} > SELECT t1.col1 AS dummy, > t1.expected_column AS expected_column, > t2.col4 > FROM ( > SELECT col1, > '23-11111', > '23-13' as three, > col4 AS expected_column > FROM table1 > ) t1 > JOIN table2 t2 > ON cast(t2.col1 as string) = cast(t1.col1 as string) > LEFT OUTER JOIN > (SELECT col4, col1 > FROM table3 > ) t3 > ON t2.col4 = t3.col1 > ; > {code} > and explain output: > {code} > STAGE DEPENDENCIES: > Stage-7 is a root stage > Stage-5 depends on stages: Stage-7 > Stage-0 depends on stages: Stage-5 > STAGE PLANS: > Stage: Stage-7 > Map Reduce Local Work > Alias -> Map Local Tables: > t1:table1 > Fetch Operator > limit: -1 > t3:table3 > Fetch Operator > limit: -1 > Alias -> Map Local Operator Tree: > t1:table1 > TableScan > alias: table1 > Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE > Filter Operator > predicate: col1 is not null (type: boolean) > Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE > Select Operator > expressions: col1 (type: string) > outputColumnNames: _col0 > Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE > HashTable Sink Operator > condition expressions: > 0 > 1 {col4} > keys: > 0 _col0 (type: string) > 1 col1 (type: string) > t3:table3 > TableScan > alias: table3 > Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE > Select Operator > expressions: col1 (type: string) > outputColumnNames: _col1 > Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE > HashTable Sink Operator > condition expressions: > 0 {_col0} {_col7} {_col7} > 1 > keys: > 0 _col7 (type: string) > 1 _col1 (type: string) > Stage: Stage-5 > Map Reduce > Map Operator Tree: > TableScan > alias: t2 > Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE > Filter Operator > predicate: col1 is not null (type: boolean) > Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE > Map Join Operator > condition map: > Inner Join 0 to 1 > condition expressions: > 0 {_col0} > 1 {col4} > keys: > 0 _col0 (type: string) > 1 col1 (type: string) > outputColumnNames: _col0, _col7 > Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE > Map Join Operator > condition map: > Left Outer Join0 to 1 > condition expressions: > 0 {_col0} {_col7} {_col7} > 1 > keys: > 0 _col7 (type: string) > 1 _col1 (type: string) > outputColumnNames: _col0, _col3, _col7 > Statistics: Num rows: 0 Data size: 0 Basic stats: NONE > Column stats: NONE > Select Operator > expressions: _col0 (type: string), _col3 (type: string), > _col7 (type: string) > outputColumnNames: _col0, _col1, _col2 > Statistics: Num rows: 0 Data size: 0 Basic stats: NONE > Column stats: NONE > File Output Operator > compressed: false > Statistics: Num rows: 0 Data size: 0 Basic stats: NONE > Column stats: NONE > table: > input format: > org.apache.hadoop.mapred.TextInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > serde: > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > Local Work: > Map Reduce Local Work > Stage: Stage-0 > Fetch Operator > limit: -1 > Processor Tree: > ListSink > {code} > After the patch, explain output(correct) > {code} > STAGE DEPENDENCIES: > Stage-7 is a root stage > Stage-5 depends on stages: Stage-7 > Stage-0 depends on stages: Stage-5 > STAGE PLANS: > Stage: Stage-7 > Map Reduce Local Work > Alias -> Map Local Tables: > t1:table1 > Fetch Operator > limit: -1 > t3:table3 > Fetch Operator > limit: -1 > Alias -> Map Local Operator Tree: > t1:table1 > TableScan > alias: table1 > Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE > Filter Operator > predicate: col1 is not null (type: boolean) > Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE > Select Operator > expressions: col1 (type: string), col4 (type: string) > outputColumnNames: _col0, _col3 > Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE > HashTable Sink Operator > condition expressions: > 0 {_col3} > 1 {col4} > keys: > 0 _col0 (type: string) > 1 col1 (type: string) > t3:table3 > TableScan > alias: table3 > Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE > Select Operator > expressions: col1 (type: string) > outputColumnNames: _col1 > Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE > HashTable Sink Operator > condition expressions: > 0 {_col0} {_col3} {_col7} > 1 > keys: > 0 _col7 (type: string) > 1 _col1 (type: string) > Stage: Stage-5 > Map Reduce > Map Operator Tree: > TableScan > alias: t2 > Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE > Filter Operator > predicate: col1 is not null (type: boolean) > Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE > Map Join Operator > condition map: > Inner Join 0 to 1 > condition expressions: > 0 {_col0} {_col3} > 1 {col4} > keys: > 0 _col0 (type: string) > 1 col1 (type: string) > outputColumnNames: _col0, _col3, _col7 > Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE > Map Join Operator > condition map: > Left Outer Join0 to 1 > condition expressions: > 0 {_col0} {_col3} {_col7} > 1 > keys: > 0 _col7 (type: string) > 1 _col1 (type: string) > outputColumnNames: _col0, _col3, _col7 > Statistics: Num rows: 0 Data size: 0 Basic stats: NONE > Column stats: NONE > Select Operator > expressions: _col0 (type: string), _col3 (type: string), > _col7 (type: string) > outputColumnNames: _col0, _col1, _col2 > Statistics: Num rows: 0 Data size: 0 Basic stats: NONE > Column stats: NONE > File Output Operator > compressed: false > Statistics: Num rows: 0 Data size: 0 Basic stats: NONE > Column stats: NONE > table: > input format: > org.apache.hadoop.mapred.TextInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > serde: > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > Local Work: > Map Reduce Local Work > Stage: Stage-0 > Fetch Operator > limit: -1 > Processor Tree: > ListSink > {code} > Notice the difference of "_col3" between the two explain outputs. > Output of the query:(WRONG) > {code} > one bad_data_1 bad_data_1 > one bad_data_1 bad_data_1 > one bad_data_1 bad_data_1 > one bad_data_1 bad_data_1 > two bad_data_2 bad_data_2 > two bad_data_2 bad_data_2 > two bad_data_2 bad_data_2 > two bad_data_2 bad_data_2 > one bad_data_1 bad_data_1 > one bad_data_1 bad_data_1 > one bad_data_1 bad_data_1 > one bad_data_1 bad_data_1 > two bad_data_2 bad_data_2 > two bad_data_2 bad_data_2 > two bad_data_2 bad_data_2 > two bad_data_2 bad_data_2 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)