Re: hive runs slowly

john smith Fri, 21 Oct 2011 07:22:12 -0700

Hi Edward,

Thanks for replying. I have been using the query

"select a,b from a,b where a.id=b.id". As far as I know, Hive reads the
data of both A and B, emits <join_key, rowid/required row data> pairs as
map outputs, and then performs Cartesian joins on the reduce side for rows
that share the same join_key.
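
The mechanism described above (a reduce-side, or repartition, join) can be sketched in a few lines of Python. This is a toy, single-process model, not Hive's code. The table contents and the helper names `map_phase` and `reduce_side_join` are hypothetical, and a dictionary stands in for Hadoop's shuffle. It shows that for each key the reducer emits every pairing of the A rows and B rows that share that key, so a key appearing m times in A and n times in B yields m × n output rows.

```python
from collections import defaultdict
from itertools import chain, product

def map_phase(tag, rows, key):
    """Map side: emit (join_key, (tag, row)) for every input row."""
    for row in rows:
        yield row[key], (tag, row)

def reduce_side_join(a_rows, b_rows, key="id"):
    """Toy reduce-side join (hypothetical helper); a dict stands in for the shuffle."""
    # Shuffle: group all tagged map outputs by join key.
    groups = defaultdict(lambda: {"a": [], "b": []})
    for k, (tag, row) in chain(map_phase("a", a_rows, key),
                               map_phase("b", b_rows, key)):
        groups[k][tag].append(row)
    # Reduce: per key, pair every A row with every B row that shares it.
    out = []
    for sides in groups.values():
        out.extend(product(sides["a"], sides["b"]))
    return out

a = [{"id": 1, "x": "a1"}, {"id": 1, "x": "a2"}, {"id": 2, "x": "a3"}]
b = [{"id": 1, "y": "b1"}, {"id": 1, "y": "b2"}, {"id": 1, "y": "b3"},
     {"id": 3, "y": "b4"}]

rows = reduce_side_join(a, b)
print(len(rows))  # 6: key 1 gives 2 x 3 pairs; keys 2 and 3 have no partner
```

Only rows with matching keys are ever paired, so the work for a key is m × n rather than |A| × |B| for the whole tables.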

Is this the Cartesian join you are referring to, or do you mean the Cartesian
product of the entire tables (as in SQL)? Or am I missing something?

Can you please shed some light on what hive.mapred.mode=strict does?

Thanks,
jS

On Fri, Oct 21, 2011 at 7:29 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

>
>
> On Fri, Oct 21, 2011 at 9:22 AM, john smith <js1987.sm...@gmail.com> wrote:
>
>> Hi list,
>>
>> I am also facing the same problem. My reducers hang at this point, and
>> it takes hours to complete a single reduce task. Can any Hive guru help us
>> out with this issue?
>>
>> Thanks,
>> jS
>>
>> 2011/10/21 bangbig <lizhongliangg...@163.com>
>>
>>> Hi all,
>>>
>>> Hive runs very slowly when it is doing this kind of work (see the log
>>> below). What's the problem? Is it because I'm joining two large tables?
>>>
>>> It runs pretty fast at first, but once the job reaches 95% it begins to
>>> slow down.
>>>
>>> --------------------------------------------------
>>>
>>> INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1044000000 rows
>>> 2011-10-21 16:55:57,427 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1045000000 rows
>>> 2011-10-21 16:55:57,545 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1046000000 rows
>>> 2011-10-21 16:55:57,686 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1047000000 rows
>>> 2011-10-21 16:55:57,806 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1048000000 rows
>>> 2011-10-21 16:55:57,926 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1049000000 rows
>>> 2011-10-21 16:55:58,045 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1050000000 rows
>>> 2011-10-21 16:55:58,164 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1051000000 rows
>>> 2011-10-21 16:55:58,284 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1052000000 rows
>>> 2011-10-21 16:55:58,405 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1053000000 rows
>>> 2011-10-21 16:55:58,525 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1054000000 rows
>>> 2011-10-21 16:55:58,644 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1055000000 rows
>>> 2011-10-21 16:55:58,764 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1056000000 rows
>>> 2011-10-21 16:55:58,883 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1057000000 rows
>>> 2011-10-21 16:55:59,003 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1058000000 rows
>>> 2011-10-21 16:55:59,122 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1059000000 rows
>>> 2011-10-21 16:55:59,242 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1060000000 rows
>>> 2011-10-21 16:55:59,361 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1061000000 rows
>>> 2011-10-21 16:55:59,482 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1062000000 rows
>>> 2011-10-21 16:55:59,601 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1063000000 rows
>>>
>>
> It is hard to say without seeing the query, the table definition, and the
> EXPLAIN output. Please send the query. That said, I have a theory:
>
> This query is not good:
> select a,b from a,b where a.id=b.id
> It does a Cartesian join.
>
> This query is better:
> select a,b from a inner join b on (a.id=b.id)
>
> Consider setting the following in your hive-site.xml:
>
> hive.mapred.mode=strict
>
> It can prevent you from running dangerous queries.
>
>
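
Edward's contrast between the two query forms can be illustrated with a small Python sketch. It assumes, as Edward suggests, that the comma form is planned as a full Cartesian product with the WHERE predicate applied afterwards; the thread does not confirm how a given Hive version actually plans it. The function names are hypothetical, and `keyed_join` is a simple hash join standing in for any join that pairs rows by key (Hive's reduce-side join also pairs rows by key, just via the shuffle). Both return the same rows; what differs is how many candidate pairs each one has to examine.

```python
from itertools import product

def cross_then_filter(a_rows, b_rows, key="id"):
    """Build every (a, b) pair first, then keep those passing the WHERE test."""
    out, examined = [], 0
    for ra, rb in product(a_rows, b_rows):
        examined += 1
        if ra[key] == rb[key]:
            out.append((ra, rb))
    return out, examined

def keyed_join(a_rows, b_rows, key="id"):
    """Index B by join key, so only rows with matching keys are ever paired."""
    index = {}
    for rb in b_rows:
        index.setdefault(rb[key], []).append(rb)
    out, examined = [], 0
    for ra in a_rows:
        for rb in index.get(ra[key], []):
            examined += 1
            out.append((ra, rb))
    return out, examined

a = [{"id": i} for i in range(1000)]
b = [{"id": i} for i in range(1000)]

slow_rows, slow_pairs = cross_then_filter(a, b)
fast_rows, fast_pairs = keyed_join(a, b)
print(slow_rows == fast_rows, slow_pairs, fast_pairs)  # True 1000000 1000
```

With 1,000 rows per side, the cross-product plan inspects 1,000,000 pairs to produce the same 1,000 rows, and that gap grows quadratically with table size.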
