What is your ORC file stripe size? How many ORC files are there in each of the
tables? It is possible that ORC compressed the data so well that each file is
smaller than the HDFS block size, in which case each file yields only one split
and therefore only one mapper. Can you please report the sizes of the two ORC
files?
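
To illustrate why a file smaller than one block gets a single mapper, here is a
rough Python model of the split-size rule in Hadoop's new-API FileInputFormat:
split size = max(minSize, min(maxSize, blockSize)), with the last split allowed
to be up to 10% oversized. This is only a sketch. Hive's HiveInputFormat and the
ORC input format may compute splits differently, and the function names, the
128 MB block size, and the file sizes below are hypothetical.

```python
# Simplified model of Hadoop's new-API FileInputFormat split computation.
# Assumption: one splittable, non-empty file; HiveInputFormat/ORC may differ.

MB = 1024 * 1024
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size


def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    # The block size, clamped between the min and max split sizes.
    return max(min_size, min(max_size, block_size))


def count_splits(file_size: int, block_size: int,
                 min_size: int = 1, max_size: int = 2**63 - 1) -> int:
    """Estimate the number of splits (roughly, mappers) for one file."""
    split_size = compute_split_size(block_size, min_size, max_size)
    splits = 0
    remaining = file_size
    # Carve off full-size splits while the remainder is clearly bigger than one.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left becomes the final (possibly oversized) split.
    if remaining > 0:
        splits += 1
    return splits


# Hypothetical sizes, assuming a 128 MB HDFS block size.
print(count_splits(50 * MB, 128 * MB))                    # → 1 (smaller than one block)
print(count_splits(1000 * MB, 128 * MB))                  # → 8
print(count_splits(50 * MB, 128 * MB, max_size=32 * MB))  # → 2 (smaller max split size)
```

Under this model, a heavily compressed file that fits in one block always maps
to one split, and lowering the max split size below the block size is what
produces more splits from the same file.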

Another possibility is that there are many small files. In that case Hive by
default uses CombineHiveInputFormat, which combines many small files into fewer,
larger splits, so you will see fewer mappers. If you are expecting one mapper
per HDFS file, try disabling CombineHiveInputFormat with:
"set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;".
Another way to control the number of mappers is to adjust the min and max split
sizes (mapred.min.split.size and mapred.max.split.size).
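
The effect of combining small files can be seen with a toy Python model that
greedily packs files into splits capped at a maximum split size. The function
name, the 256 MB cap, and the file sizes are hypothetical. The real
CombineHiveInputFormat (built on Hadoop's CombineFileInputFormat) also groups
data by node and rack and chops large files into chunks, so its actual split
boundaries will differ from this sketch.

```python
# Simplified, hypothetical model of how combining small files cuts the mapper count.
# Ignores node/rack locality and does not chop large files, unlike the real
# CombineFileInputFormat.

MB = 1024 * 1024


def combine_small_files(file_sizes: list[int], max_split_size: int) -> list[list[int]]:
    """Greedily pack files, in order, into splits of at most max_split_size bytes.

    A single file larger than max_split_size still ends up alone in one split.
    """
    splits: list[list[int]] = []
    current: list[int] = []
    current_bytes = 0
    for size in file_sizes:
        # Close the current split if adding this file would exceed the cap.
        if current and current_bytes + size > max_split_size:
            splits.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        splits.append(current)
    return splits


# Hypothetical table: 100 files of 1 MB each, with an assumed 256 MB cap.
files = [1 * MB] * 100
print(len(files))                                 # → 100 (one split per small file)
print(len(combine_small_files(files, 256 * MB)))  # → 1 (all files combined)
```

In this model, one split per file corresponds to the HiveInputFormat behaviour
for files smaller than a block, while the combined count shows why many small
files can collapse into very few mappers.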

Thanks
Prasanth Jayachandran

On Oct 9, 2013, at 10:03 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

> What's the size of the table (in GB)?
> 
> What max and min split sizes have you provided?
> 
> 
> On Wed, Oct 9, 2013 at 10:28 PM, Gourav Sengupta
> <gourav.had...@gmail.com> wrote:
> 
>> Hi,
>> 
>> I am trying to run a join using two tables stored in ORC file format.
>> 
>> The first table has 34 million records and the second has around 300,000
>> records.
>> 
>> Setting "set hive.auto.convert.join=true" makes the entire query run in a
>> single mapper.
>> If I set "set hive.auto.convert.join=false", there are two mappers: the
>> first reads the second table, and the entire large table goes through the
>> second mapper.
>> 
>> Am I doing something wrong? There are currently three nodes in the Hadoop
>> cluster, and I was expecting at least 6 mappers to be used.
>> 
>> Thanks and Regards,
>> Gourav
>> 
> 
> 
> 
> -- 
> Nitin Pawar

