Thanks, Nitin, for your reply. In short, my task is:
1) Import the data from MS SQL Server into HDFS using Sqoop.
2) Process the data with Hive and write the result into one table.
3) Export that result table from Hive back to MS SQL Server.
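The three steps above could be scripted roughly as follows. This is only a sketch: the JDBC URL, user name, table names, HDFS paths and script name are hypothetical placeholders, and the program just assembles and prints the Sqoop and Hive command lines rather than running them. `--num-mappers` is shown because it controls how many parallel map tasks Sqoop uses for the import. The `\001` delimiter on export assumes the Hive result table uses Hive's default Ctrl-A field separator.

```java
import java.util.Arrays;
import java.util.List;

public class PipelineCommands {
    // Hypothetical connection details; replace with your own.
    static final String JDBC_URL = "jdbc:sqlserver://dbhost:1433;databaseName=sourcedb";
    static final String USER = "etl_user";

    // Step 1: import one SQL Server table into HDFS with a chosen number of map tasks.
    public static List<String> importCmd(String table, int mappers) {
        return Arrays.asList("sqoop", "import",
                "--connect", JDBC_URL, "--username", USER, "-P",
                "--table", table,
                "--target-dir", "/user/etl/raw/" + table,
                "--num-mappers", String.valueOf(mappers));
    }

    // Step 2: run the Hive script that holds the join queries.
    public static List<String> hiveCmd(String scriptPath) {
        return Arrays.asList("hive", "-f", scriptPath);
    }

    // Step 3: export the result table's files back to SQL Server.
    // Assumes Hive's default field delimiter, Ctrl-A, written as the escape "\001".
    public static List<String> exportCmd(String targetTable, String hiveTableDir) {
        return Arrays.asList("sqoop", "export",
                "--connect", JDBC_URL, "--username", USER, "-P",
                "--table", targetTable,
                "--export-dir", hiveTableDir,
                "--input-fields-terminated-by", "\\001");
    }

    public static void main(String[] args) {
        // Hypothetical table, script and warehouse path names, printed for inspection only.
        System.out.println(String.join(" ", importCmd("orders", 8)));
        System.out.println(String.join(" ", hiveCmd("process.hql")));
        System.out.println(String.join(" ", exportCmd("result", "/user/hive/warehouse/result")));
    }
}
```

Each command would normally be run in order, for example via `ProcessBuilder`, stopping if any step exits with a non-zero status.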
The data I am importing from MS SQL Server is quite large: about 500,000 rows in one table, and I have 30 such tables. For this I have written a Hive task that consists only of queries, and each query uses many joins. As a result, performance on my single local machine is very poor: the task takes about 3 hours to complete. I have also observed that a single query submitted through the Hive CLI launches 10-11 MapReduce jobs before it completes.

Regarding *set mapred.min.split.size* and *set mapred.max.split.size*: should these values be set in a bootstrap action when submitting jobs to Amazon EMR? And what values should they be set to? I don't know how to choose them.

--
Regards,
Bhavesh Shah

On Tue, May 8, 2012 at 10:31 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

> 1) Check the JobTracker URL to see how many maps/reducers have been
> launched.
> 2) If you have a large dataset and want it to execute fast, set
> mapred.min.split.size and mapred.max.split.size to an optimal value so
> that more mappers are launched and finish sooner.
> 3) If you are doing joins, there are different approaches depending on
> the data you have and its size.
>
> It would be helpful if you could tell us your data sizes and query details.
>
>
> On Tue, May 8, 2012 at 10:07 AM, Bhavesh Shah <bhavesh25s...@gmail.com> wrote:
>
>> Hello all,
>> I have written Hive JDBC code and packaged it as a JAR. I am running
>> that JAR on a 10-node cluster.
>> But even though I am using the 10-node cluster, the performance is the
>> same as on a single node.
>>
>> What can I do to improve the performance of Hive jobs? Is there any
>> configuration setting to apply before submitting Hive jobs to the cluster?
>> One more thing I want to know: how can we tell whether a job is
>> running on all nodes of the cluster?
>>
>> Please let me know if anyone knows about this.
>>
>> --
>> Regards,
>> Bhavesh Shah
>
> --
> Nitin Pawar
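Nitin's second point can be made concrete with a small calculation. The sketch below mirrors, as far as I know, the split-size rule in Hadoop's new-API `FileInputFormat` (split size = `max(minSize, min(maxSize, blockSize))`, with the last split allowed to be up to 10% larger) and estimates how many map tasks one file would get. The 64 MB block size, the 1 GB file size and the helpers `splitSettings` and `applySettings` are assumptions for illustration only. Whether Hive honours these properties exactly this way depends on the Hadoop version and on Hive's input format (for example, `CombineHiveInputFormat` uses `mapred.max.split.size` when it combines small files). Since they are ordinary session properties, they can typically be issued as `set` commands in the same Hive session (CLI or JDBC) before the heavy queries, assuming your Hive JDBC driver accepts `set` through `Statement.execute`.

```java
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class SplitSizeSketch {
    // Slop factor: the last chunk may be up to 10% larger than the split size.
    static final double SPLIT_SLOP = 1.1;

    // Split size = block size clamped into [minSize, maxSize].
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Approximate number of splits (map tasks) for one splittable file.
    public static int estimateSplits(long fileSize, long splitSize) {
        int splits = 0;
        long remaining = fileSize;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // whatever is left becomes the final, possibly larger, split
        }
        return splits;
    }

    // Hypothetical helper: Hive "set" commands for the current session.
    public static List<String> splitSettings(long minSize, long maxSize) {
        List<String> cmds = new ArrayList<>();
        cmds.add("set mapred.min.split.size=" + minSize);
        cmds.add("set mapred.max.split.size=" + maxSize);
        return cmds;
    }

    // Hypothetical helper: apply the settings on an already-open Hive JDBC Statement.
    public static void applySettings(Statement stmt, List<String> cmds) throws SQLException {
        for (String cmd : cmds) {
            stmt.execute(cmd);
        }
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 64 * mb; // assumed HDFS block size
        long fileSize = 1024 * mb; // assumed 1 GB input file
        long[] maxSizes = {256 * mb, 64 * mb, 32 * mb};
        for (long maxSize : maxSizes) {
            long split = computeSplitSize(blockSize, 1, maxSize);
            System.out.println("max.split.size=" + maxSize / mb + " MB -> split="
                    + split / mb + " MB, ~" + estimateSplits(fileSize, split) + " map tasks");
        }
    }
}
```

With these assumed figures, lowering `mapred.max.split.size` from 64 MB to 32 MB doubles the estimate from 16 to 32 map tasks, while raising it above the block size leaves the split at 64 MB. More mappers only help if the cluster has free map slots to run them in parallel.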