Re: Want to improve the performance for execution of Hive Jobs.

Bhavesh Shah Mon, 07 May 2012 22:12:03 -0700

Thanks Nitin for your reply.

In short, my task is:
1) First, I import the data from MS SQL Server into HDFS using Sqoop.
2) Using Hive, I process the data and write the result into one table.
3) That result table is then exported from Hive back to MS SQL Server.
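
The three steps above can be sketched as a small driver script. This is only an illustration, assuming the `sqoop` and `hive` command-line tools are on the PATH. The host name, database, user, table names, HDFS paths, and script name are all hypothetical placeholders, and the commands are only built and printed, not executed.

```python
import shlex

# Hypothetical connection details; replace with real values.
JDBC_URL = "jdbc:sqlserver://dbhost:1433;databaseName=mydb"
DB_USER = "etl_user"


def sqoop_import_cmd(table, target_dir, mappers=4):
    """Step 1: copy one SQL Server table into an HDFS directory."""
    return [
        "sqoop", "import",
        "--connect", JDBC_URL,
        "--username", DB_USER, "-P",      # -P prompts for the password
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(mappers),    # parallel import tasks
    ]


def hive_script_cmd(script_path):
    """Step 2: run the Hive queries stored in a script file."""
    return ["hive", "-f", script_path]


def sqoop_export_cmd(table, export_dir):
    """Step 3: push the Hive result directory back to SQL Server."""
    return [
        "sqoop", "export",
        "--connect", JDBC_URL,
        "--username", DB_USER, "-P",
        "--table", table,
        "--export-dir", export_dir,
    ]


def build_pipeline(source_tables, hive_script, result_table, result_dir):
    """Return the commands in the order they would be run."""
    cmds = [sqoop_import_cmd(t, f"/user/hive/staging/{t}") for t in source_tables]
    cmds.append(hive_script_cmd(hive_script))
    cmds.append(sqoop_export_cmd(result_table, result_dir))
    return cmds


# Hypothetical table names and paths, for illustration only.
pipeline = build_pipeline(
    ["Orders", "Customers"],
    "process.hql",
    "ResultTable",
    "/user/hive/warehouse/resulttable",
)
for cmd in pipeline:
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

With 30 source tables, the list passed to `build_pipeline` would simply be longer; the import commands are independent of each other, so they could in principle be run in parallel.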

The data I am importing from MS SQL Server is very large (about 500,000
rows in one table, and I have 30 such tables). To process it I have written
a Hive task that consists only of queries, and each query uses a lot of
joins. Because of this, performance is very poor on my single local machine
(the task takes about 3 hours to run completely). I have also observed that
when I submit a single query to the Hive CLI, it takes 10-11 jobs to execute
completely.

set mapred.min.split.size
set mapred.max.split.size
Should these values be set in a bootstrap action when submitting jobs to
Amazon EMR? I don't know what values to set for them.
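
For a rough sense of how these two settings interact with the number of mappers, here is a minimal sketch. It assumes the simple FileInputFormat-style rule, where the split size is the HDFS block size clamped between the minimum and maximum split sizes, and a single large splittable input. The actual behaviour depends on the Hadoop version and on the input format Hive is configured to use, so treat the numbers as estimates. The 1 GiB input and 64 MB block size are hypothetical values, not figures from this thread.

```python
MB = 1024 * 1024


def split_size(min_size, max_size, block_size):
    """Assumed rule: clamp the block size between min and max split size."""
    return max(min_size, min(max_size, block_size))


def estimate_mappers(input_bytes, min_size, max_size, block_size):
    """Roughly one map task per split (ceiling division, integers only)."""
    size = split_size(min_size, max_size, block_size)
    return -(-input_bytes // size)


def hive_set_statements(min_size, max_size):
    """Render the two settings as per-session Hive `set` statements."""
    return [
        f"set mapred.min.split.size={min_size};",
        f"set mapred.max.split.size={max_size};",
    ]


# Hypothetical figures for illustration.
input_bytes = 1024 * MB          # 1 GiB of input data
block_size = 64 * MB             # assumed HDFS block size

# With an effectively unlimited maximum, splits follow the block size.
default_mappers = estimate_mappers(input_bytes, 1, 2**63 - 1, block_size)

# Lowering the maximum split size to 16 MB yields more, smaller splits.
tuned_mappers = estimate_mappers(input_bytes, 1, 16 * MB, block_size)

print(default_mappers, tuned_mappers)
for line in hive_set_statements(1, 16 * MB):
    print(line)
```

One option, assuming per-session settings are acceptable, is to put the generated `set` lines at the top of the Hive script itself, so they apply to every query in that session.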

-- 
Regards,
Bhavesh Shah

On Tue, May 8, 2012 at 10:31 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

> 1) Check the JobTracker URL to see how many mappers/reducers have been
> launched.
> 2) If you have a large dataset and want to process it fast, set
> mapred.min.split.size and mapred.max.split.size to suitable values so
> that more mappers are launched and finish sooner.
> 3) If you are doing joins, there are different ways to go depending on
> the data you have and its size.
>
> It would be helpful if you could let us know your data sizes and query
> details.
>
>
> On Tue, May 8, 2012 at 10:07 AM, Bhavesh Shah <bhavesh25s...@gmail.com> wrote:
>
>> Hello all,
>> I have written Hive JDBC code and packaged it as a JAR. I am running
>> that JAR on a 10-node cluster.
>> The problem is that even on the 10-node cluster, the performance is
>> the same as on a single node.
>>
>> What can I do to improve the performance of Hive jobs? Is there any
>> configuration setting to set before submitting Hive jobs to the cluster?
>> I would also like to know how to tell whether a job is running on all
>> nodes of the cluster.
>>
>> Please let me know if anyone knows about this.
>>
>> --
>> Regards,
>> Bhavesh Shah
>>
>>
>
>
> --
> Nitin Pawar
>
>
