Good point, I should have used ON. With ON it runs fine as a map-join,
and if I set hive.auto.convert.join=false it runs with the number of
reducers I specified.
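
For anyone finding this thread later, here is a minimal sketch of the query rewritten with ON, assuming the same placeholder table and column names as in the original post. The reducer count of 32 is an arbitrary illustrative value, not a recommendation:

```sql
-- Sketch only: table1, table2 and col1..col5 are the placeholder names
-- from the original query.

-- Turn off automatic map-join conversion so the join uses reducers.
set hive.auto.convert.join=false;

-- Hypothetical reducer count; choose a value that suits your cluster.
set mapred.reduce.tasks=32;

-- The equality conditions now sit in the ON clause, so Hive can use
-- them as join keys instead of filtering a Cartesian product.
create table z as
select x.*
from table1 x
join table2 y on (
  x.col1 = y.col1 and
  x.col2 = y.col2 and
  x.col3 = y.col3 and
  x.col4 = y.col4 and
  x.col5 = y.col5
);
```

Leaving hive.auto.convert.join at true should let the same query run as a map-join instead, in which case the reducer setting does not matter for the join itself.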


On Thu, Jan 12, 2012 at 6:12 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> You should do joins using the ON clause.
> https://cwiki.apache.org/Hive/languagemanual-joins.html
> Be careful: if you write the join wrong, Hive does a Cartesian product
> followed by a really long reduce phase rather than the optimal join process.
>
> On Thu, Jan 12, 2012 at 6:04 PM, Aaron McCurry <amccu...@gmail.com> wrote:
>
>> I see that your query is kinda generic and probably not the original
>> query.  I have seen this behavior with a simple typo like:
>>
>> Notice col3: the condition compares y.col3 with itself.
>>
>> create table z as select x.* from table1 x join table2 y where (
>> x.col1 = y.col1 and
>> x.col2 = y.col2 and
>> y.col3 = y.col3 and
>> x.col4 = y.col4 and
>> x.col5 = y.col5
>> );
>>
>> Just a thought.
>>
>> Aaron
>>
>> On Thu, Jan 12, 2012 at 6:00 PM, Wojciech Langiewicz <
>> wlangiew...@gmail.com> wrote:
>>
>>> Hello,
>>> Have you tried running only the select, without creating the table? What
>>> are the results?
>>> How did you try to set the number of reducers? Did you use this:
>>> set mapred.reduce.tasks = xyz;
>>> How many mappers does this query use?
>>>
>>>
>>> On 12.01.2012 23:53, Koert Kuipers wrote:
>>>
>>>> I am running a basic join of 2 tables and it will only run with 1
>>>> reducer.
>>>> Why is that? I tried to set the number of reducers and it didn't work;
>>>> Hive just ignored it.
>>>>
>>>> create table z as select x.* from table1 x join table2 y where (
>>>> x.col1 = y.col1 and
>>>> x.col2 = y.col2 and
>>>> x.col3 = y.col3 and
>>>> x.col4 = y.col4 and
>>>> x.col5 = y.col5
>>>> );
>>>>
>>>> Both tables are backed by multiple files / blocks / chunks.
>>>>
>>>>
>>> --
>>> Wojciech Langiewicz
>>>
>>
>>
>
