"inner join" is simply translated to "join" they are the same thing
(HIVE-2191)
I'm guessing he means removing the join from the where part of the query
and using the "select a,b from a join b on (a.id=b.id)" syntax.
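As a rough illustration of the equivalence, here is a small sketch that uses SQLite (through Python's sqlite3 module) as a stand-in for Hive. The table contents and the val column are made up for the example, and SQLite's planner is not Hive's, so this only shows that the three spellings return the same rows, not how Hive executes them.

```python
import sqlite3

# SQLite stands in for Hive here; tables a and b and the "val" column are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, val TEXT);
    CREATE TABLE b (id INTEGER, val TEXT);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
    INSERT INTO b VALUES (2, 'b2'), (3, 'b3'), (4, 'b4');
""")


def run(sql):
    """Run a query and return its rows in a stable order."""
    return sorted(conn.execute(sql).fetchall())


# The three spellings discussed in this thread.
explicit_join = run("SELECT a.val, b.val FROM a JOIN b ON (a.id = b.id)")
explicit_inner = run("SELECT a.val, b.val FROM a INNER JOIN b ON (a.id = b.id)")
where_style = run("SELECT a.val, b.val FROM a, b WHERE a.id = b.id")

print(explicit_join)
print(explicit_join == explicit_inner == where_style)
```

The result set is identical in all three cases; what differs between engines is the plan used to get there.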
On 10/22/2011 05:05 AM, john smith wrote:
You mean select a,b from a inner join b on (a.id=b.id)?
Or do those brackets make some difference? I ask because
the inner keyword is nowhere mentioned in the language manual:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
Any hints?
On Fri, Oct 21, 2011 at 8:47 PM, Edward Capriolo
<edlinuxg...@gmail.com> wrote:
On Fri, Oct 21, 2011 at 10:21 AM, john smith
<js1987.sm...@gmail.com> wrote:
Hi Edward,
Thanks for replying. I have been using the query
"select a,b from a,b where a.id=b.id". As far as I understand Hive, it reads
the data of both A and B, emits <join_key, rowid/required row
data> pairs as map outputs, and then performs a cartesian join
on the reduce side among rows with the same join_key.
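The reduce-side join described above can be sketched in a few lines of Python. This is a toy simulation under stated assumptions, not Hive's implementation: the function names, the dict-based rows, and the val column are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def map_phase(table_name, rows):
    """Emit (join_key, (source_table, row)) pairs, one per input row."""
    for row in rows:
        yield row["id"], (table_name, row)


def shuffle(pairs):
    """Group the map outputs by join key, as the shuffle step would."""
    groups = defaultdict(list)
    for key, tagged_row in pairs:
        groups[key].append(tagged_row)
    return groups


def reduce_phase(groups):
    """For each join key, pair every row from a with every row from b."""
    for key, tagged_rows in groups.items():
        left = [row for table, row in tagged_rows if table == "a"]
        right = [row for table, row in tagged_rows if table == "b"]
        # Per-key cartesian product: only rows sharing the key are combined.
        for l_row, r_row in product(left, right):
            yield key, l_row["val"], r_row["val"]


# Hypothetical sample data; key 2 appears twice on each side.
table_a = [{"id": 1, "val": "a1"}, {"id": 2, "val": "a2"}, {"id": 2, "val": "a2x"}]
table_b = [{"id": 2, "val": "b2"}, {"id": 2, "val": "b2x"}, {"id": 3, "val": "b3"}]

pairs = list(map_phase("a", table_a)) + list(map_phase("b", table_b))
joined = sorted(reduce_phase(shuffle(pairs)))
print(joined)
```

Keys 1 and 3 appear on one side only and produce nothing; key 2 produces 2 x 2 = 4 rows. The per-key product is what makes a heavily repeated key expensive for the single reducer that receives it.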
Is this the cartesian join you are referring to? Or is it the
cartesian product of the whole tables (as in SQL)? Or am I
missing something?
Can you please shed some light on what
hive.mapred.mode=strict does?
Thanks,
jS
On Fri, Oct 21, 2011 at 7:29 PM, Edward Capriolo
<edlinuxg...@gmail.com> wrote:
On Fri, Oct 21, 2011 at 9:22 AM, john smith
<js1987.sm...@gmail.com> wrote:
Hi list,
I am also facing the same problem. My reducers hang at
this position and it takes hours to complete a single
reduce task. Can any Hive guru help us out with this
issue?
Thanks,
jS
2011/10/21 bangbig <lizhongliangg...@163.com>
Hi all,
Hive runs too slowly when it is doing this (see the
log below). What's the problem? Is it because I'm joining two large tables?
It runs pretty fast at first, but when the job reaches 95% it
begins to slow down.
--------------------------------------------------
INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1044000000 rows
2011-10-21 16:55:57,427 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1045000000 rows
2011-10-21 16:55:57,545 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1046000000 rows
2011-10-21 16:55:57,686 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1047000000 rows
2011-10-21 16:55:57,806 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1048000000 rows
2011-10-21 16:55:57,926 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1049000000 rows
2011-10-21 16:55:58,045 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1050000000 rows
2011-10-21 16:55:58,164 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1051000000 rows
2011-10-21 16:55:58,284 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1052000000 rows
2011-10-21 16:55:58,405 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1053000000 rows
2011-10-21 16:55:58,525 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1054000000 rows
2011-10-21 16:55:58,644 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1055000000 rows
2011-10-21 16:55:58,764 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1056000000 rows
2011-10-21 16:55:58,883 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1057000000 rows
2011-10-21 16:55:59,003 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1058000000 rows
2011-10-21 16:55:59,122 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1059000000 rows
2011-10-21 16:55:59,242 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1060000000 rows
2011-10-21 16:55:59,361 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1061000000 rows
2011-10-21 16:55:59,482 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1062000000 rows
2011-10-21 16:55:59,601 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1063000000 rows
It is hard to say without seeing the query, the table
definitions, and the EXPLAIN output. Please send the query.
Although I have a theory:
This query is not good:
select a,b from a,b where a.id=b.id
It does a Cartesian join.
This query is better:
select a,b from a inner join b on (a.id=b.id)
Consider setting this in your hive-site.xml:
hive.mapred.mode=strict
It can prevent you from running dangerous queries.
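To give a feel for why a Cartesian join followed by a filter is so much more expensive than a join on the key, here is a small Python sketch that counts how many row pairs each strategy has to look at. It illustrates the theory above on made-up data; the function names and key ranges are hypothetical, and it does not model Hive's planner.

```python
from collections import defaultdict
from itertools import product


def cross_then_filter(a_ids, b_ids):
    """Form every (a, b) pair first, then keep the pairs whose ids match."""
    examined = 0
    matches = 0
    for x, y in product(a_ids, b_ids):
        examined += 1
        if x == y:
            matches += 1
    return examined, matches


def keyed_join(a_ids, b_ids):
    """Index b by id, then look up each a id, touching only matching rows."""
    b_count_by_id = defaultdict(int)
    for y in b_ids:
        b_count_by_id[y] += 1
    examined = 0
    matches = 0
    for x in a_ids:
        n = b_count_by_id.get(x, 0)
        examined += n
        matches += n
    return examined, matches


# Hypothetical data: 1000 ids on each side, 500 of them in common.
a_ids = list(range(1000))
b_ids = list(range(500, 1500))

cross_examined, cross_matches = cross_then_filter(a_ids, b_ids)
keyed_examined, keyed_matches = keyed_join(a_ids, b_ids)
print(cross_examined, cross_matches)
print(keyed_examined, keyed_matches)
```

Both strategies find the same 500 matches, but the first looks at 1,000,000 pairs and the second at 500. With two large tables, that gap is the difference between a job that finishes and one that forwards rows for hours.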
To be clear:
Do NOT join this way (it results in a cartesian product):
select a,b from a,b where a.id=b.id
Join this way:
select a,b from a join b on (a.id=b.id)
Also:
Set hive.mapred.mode=strict in your hive-site.xml to prevent
yourself from mistakenly doing cartesian products and other bad ideas.