Explode + joins ---------------------------------------------------------------------------------------------------- -- bash ----------------------------------------------------------------------------------------------------
mkdir t1 mkdir t2 cat>t1/data.txt A B1 B2 B4 B5 B6 cat>t2/data.txt B1 D1 B2 D2 B3 D3 B4 D4 B5 D5 B6 D6 hdfs dfs -put t1 t2 /tmp ---------------------------------------------------------------------------------------------------- -- hive ---------------------------------------------------------------------------------------------------- create external table t1 ( Column1 string ,Column2 string ,Column3 string ,Column4 string ,Column5 string ,Column6 string ,Column7 string ) row format delimited fields terminated by '\t' location '/tmp/t1' ; create external table t2 ( Column1 string ,Column2 string ) row format delimited fields terminated by '\t' location '/tmp/t2' ; Theoretically I would have written the query like this - select t1.Column1 ,t1_unpivot.val ,t2.Column2 from t1 lateral view explode (array(Column2,Column3,Column4,Column5,Column6,Column7)) t1_unpivot as val join t2 on t2.Column1 = t1_unpivot.val ; Unfortunately, this syntax is not supported FAILED: SemanticException [Error 10085]: Line 7:32 JOIN with a LATERAL VIEW is not supported 'val' As a work-around I'm nesting the "lateral view' select t1.Column1 ,t1.val ,t2.Column2 from (select t1.Column1 ,t1_unpivot.val from t1 lateral view explode (array(Column2,Column3,Column4,Column5,Column6,Column7)) t1_unpivot as val ) as t1 join t2 on t2.Column1 = t1.val ; A B1 D1 A B2 D2 A B4 D4 A B5 D5 A B6 D6 From: Lunagariya, Dhaval [mailto:dhaval.lunagar...@citi.com] Sent: Wednesday, June 08, 2016 6:25 PM To: 'user@hive.apache.org' <user@hive.apache.org> Cc: 'er.dcpa...@gmail.com' <er.dcpa...@gmail.com> Subject: RE: Need Your Inputs For Below Scenario Here Table2 is very large table and contains lakhs of rows. From: Lunagariya, Dhaval [CCC-OT] Sent: Wednesday, June 08, 2016 5:52 PM To: user@hive.apache.org<mailto:user@hive.apache.org> Subject: Need Your Inputs For Below Scenario Hey folks, Need your help. Input Table1: Column1 Column2 Column3 Column4 Column5 Column6 Column7 A B1 B2 B3(NULL) B4 B5 B6 Input Table2: Column1 Column2 B1 D1 B2 D2 B3 D3 B4 D4 B5 D5 B6 D6 Output: Column1 Column2 Column3 A B1 D1 A B2 D2 A B4 D4 A B5 D5 A B6 D6 Here B3 is skipped because B3 is NULL. What is the efficient way to get above result using Hive? Regards, Dhaval