Hi Edward,

Thanks for replying. I have been using the query

"select a,b from a,b where a.id=b.id".

According to my knowledge of Hive, it reads the data of both A and B, emits <join_key, rowid/required row data> pairs as map outputs, and then performs a Cartesian join on the reduce side among rows that share the same join_key. Is this the Cartesian join you are referring to? Or is it the Cartesian product of the entire tables (as in SQL)? Or am I missing something?

Can you please shed some light on what hive.mapred.mode=strict does?

Thanks,
jS

On Fri, Oct 21, 2011 at 7:29 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

>
> On Fri, Oct 21, 2011 at 9:22 AM, john smith <js1987.sm...@gmail.com> wrote:
>
>> Hi list,
>>
>> I am also facing the same problem. My reducers hang at this position and
>> it takes hours to complete a single reduce task. Can any hive guru help us
>> out with this issue.
>>
>> Thanks,
>> jS
>>
>> 2011/10/21 bangbig <lizhongliangg...@163.com>
>>
>>> HI all,
>>>
>>> HIVE runs too slowly when it is doing such things(see the log below),
>>> what's the problem? because I'm joining two large table?
>>>
>>> it runs pretty fast at first. when the job finishes 95%, it begins to slow
>>> down.
>>>
>>> --------------------------------------------------
>>>
>>> INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1044000000 rows
>>> 2011-10-21 16:55:57,427 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1045000000 rows
>>> 2011-10-21 16:55:57,545 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1046000000 rows
>>> 2011-10-21 16:55:57,686 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1047000000 rows
>>> 2011-10-21 16:55:57,806 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1048000000 rows
>>> 2011-10-21 16:55:57,926 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1049000000 rows
>>> 2011-10-21 16:55:58,045 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1050000000 rows
>>> 2011-10-21 16:55:58,164 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1051000000 rows
>>> 2011-10-21 16:55:58,284 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1052000000 rows
>>> 2011-10-21 16:55:58,405 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1053000000 rows
>>> 2011-10-21 16:55:58,525 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1054000000 rows
>>> 2011-10-21 16:55:58,644 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1055000000 rows
>>> 2011-10-21 16:55:58,764 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1056000000 rows
>>> 2011-10-21 16:55:58,883 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1057000000 rows
>>> 2011-10-21 16:55:59,003 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1058000000 rows
>>> 2011-10-21 16:55:59,122 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1059000000 rows
>>> 2011-10-21 16:55:59,242 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1060000000 rows
>>> 2011-10-21 16:55:59,361 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1061000000 rows
>>> 2011-10-21 16:55:59,482 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1062000000 rows
>>> 2011-10-21 16:55:59,601 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1063000000 rows
>>>
>>
> It is hard to say without seeing the query, the table definition, and the
> explain. Please send the query. Although I have a theory:
>
> This query is not good:
> select a,b from a,b where a.id=b.id
> It does a Cart join.
>
> This query is better.
> select a,b from a inner join b on (a.id=b.id)
>
> Consider setting in your hive-site.xml
>
> hive.mapred.mode=strict
>
> It can prevent you from running dangerous queries.
>
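
P.S. To make the question at the top of this message concrete, here is a toy Python sketch of the reduce-side join as I understand it. It is not Hive code: the function name reduce_side_join and the sample rows are made up for illustration, and it assumes a plain equi-join on a single column. It only shows the difference between a per-key cross product and a full Cartesian product of the two tables, and why one heavily repeated join key could plausibly keep a single reducer busy for a long time.

```python
from collections import defaultdict
from itertools import product


def reduce_side_join(table_a, table_b, key="id"):
    """Toy equi-join: group both inputs by join key, then cross each group."""
    # "Map" phase: emit every row under its join key, tagged with its source table.
    groups = defaultdict(lambda: {"a": [], "b": []})
    for row in table_a:
        groups[row[key]]["a"].append(row)
    for row in table_b:
        groups[row[key]]["b"].append(row)

    # "Reduce" phase: within one key, pair every A row with every B row.
    # A key with m rows in A and n rows in B yields m * n output rows.
    joined = []
    for sides in groups.values():
        for row_a, row_b in product(sides["a"], sides["b"]):
            joined.append((row_a, row_b))
    return joined


# Hypothetical sample data: id=1 is repeated on both sides (a skewed key).
a = [{"id": 1, "a": f"a{i}"} for i in range(3)] + [{"id": 2, "a": "x"}]
b = [{"id": 1, "b": f"b{i}"} for i in range(4)] + [{"id": 3, "b": "y"}]

rows = reduce_side_join(a, b)
full_cartesian_size = len(a) * len(b)

print(len(rows))            # prints 12 (3 * 4 for id=1, nothing for id=2 or id=3)
print(full_cartesian_size)  # prints 20 (every A row paired with every B row)
```

In this sketch the output grows with the product of the per-key row counts, so a key that appears many times in both tables dominates the work even though the query is not a full Cartesian product. Whether that is what is happening in the jobs above cannot be told from the log alone.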