Hello, I'm executing a join of two tables:

- table1 is 130 GB
- table2 is 1.5 GB
In HDFS, table1 is a single text file and table2 is ten files. I'd like to execute a map join and load table2 into memory.

use esp;
set hive.auto.convert.join=true;
-- set hive.auto.convert.join.noconditionaltask = true;
-- I tried this one to force a map join, but I don't think I know how to use it.
-- set hive.auto.convert.join.noconditionaltask.size = 10000000000;

-- The MAPJOIN hint should not be necessary, but I have tried with and without it.
SELECT /*+ MAPJOIN(table2) */ DISTINCT t1.c1, t1.c2, t2.c3, t2.c4
FROM table2 t1
RIGHT JOIN table1 t2
  ON (t1.c1 = t2.c3) AND (t1.c5 = t2.c5)
WHERE t2.xx = 'XX'
LIMIT 10;

This query creates 11 map tasks. Ten of them take about 15 seconds each, and one takes 2 hours. So I guess that single map task reads the whole 130 GB file to perform the join.

Why doesn't Hive split that file? What am I doing wrong with this query?