Re: Extremely slow throughput with dynamic partitions using Hive 0.8.1 in Amazon Elastic Mapreduce

2013-06-17 Thread Shaun Clowes
Thanks for following up, Ted. I couldn't work out why progress tracking was being forced on for dynamic partition inserts, so thanks for your helpful explanation. I'll raise a JIRA issue regarding the problem. Do you have any ideas for an alternate approach? I could have a go at implementing a fix

Re: Extremely slow throughput with dynamic partitions using Hive 0.8.1 in Amazon Elastic Mapreduce

2013-06-17 Thread Ted Xu
Hi Shaun, Your findings are valid. Hive uses Hadoop job counters to report fatal errors, so the client can kill the MapReduce job before it completes. In your case, because Hive wants to kill the MapReduce job when there are too many partitions using dynamic partitioning, counters repor

Re: Extremely slow throughput with dynamic partitions using Hive 0.8.1 in Amazon Elastic Mapreduce

2013-06-06 Thread Shaun Clowes
Hi Ted, All, Unfortunately profiling turns out to be extremely slow, so it's not very fruitful for determining what's going on here. On the other hand, I seem to have traced this problem to the "hive.task.progress" configuration variable. When this is set to true (as it is automatically when

Re: Extremely slow throughput with dynamic partitions using Hive 0.8.1 in Amazon Elastic Mapreduce

2013-06-06 Thread Ted Xu
Hi Shaun, This is weird. I'm not sure whether some other factor (e.g., a very complex UDF?) is causing this issue, but it would be best if you could profile the job and see whether there is a hot spot. On Thu, Jun 6, 2013 at 4:38 PM, Sh

Re: Extremely slow throughput with dynamic partitions using Hive 0.8.1 in Amazon Elastic Mapreduce

2013-06-06 Thread Shaun Clowes
Hi Ted, It's actually just one partition being created, which is what makes it so weird. Thanks, Shaun On 6 June 2013 18:36, Ted Xu wrote: > Hi Shaun, > > Too many partitions in dynamic partitioning may slow down the mapreduce > job. Can you estimate how many partitions will be generated after

Re: Extremely slow throughput with dynamic partitions using Hive 0.8.1 in Amazon Elastic Mapreduce

2013-06-06 Thread Ted Xu
Hi Shaun, Too many partitions in dynamic partitioning may slow down the mapreduce job. Can you estimate how many partitions will be generated after insert? On Thu, Jun 6, 2013 at 4:24 PM, Shaun Clowes wrote: > Hi All, > > Does anyone know the performance impact the dynamic partitions should be

Extremely slow throughput with dynamic partitions using Hive 0.8.1 in Amazon Elastic Mapreduce

2013-06-06 Thread Shaun Clowes
Hi All, Does anyone know what performance impact dynamic partitions should be expected to have? I have a table that is partitioned by a string in the form '-MM'. When I insert into this table (from an external table that is just an S3 bucket containing gzipped logs) using dynamic partitio
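For context, a dynamic-partition insert of the kind described in this thread generally looks like the sketch below. The table and column names (`raw_logs`, `logs_partitioned`, `line`, `month`) and the `substr` expression used to derive the partition value are hypothetical; only the two `SET` properties are standard Hive settings. It is an illustration, not the query from the original message.

```sql
-- Enable dynamic partitioning for this session (standard Hive settings).
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Hypothetical tables: raw_logs is an external table over an S3 bucket,
-- logs_partitioned is partitioned by a string column named month.
-- The partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE logs_partitioned PARTITION (month)
SELECT line,
       substr(line, 1, 7) AS month  -- hypothetical way to derive the partition value
FROM raw_logs;
```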

Re: How to create/add Amazon Elastic Mapreduce Instances in VPC ?

2012-05-04 Thread Pedro Figueiredo
On 4 May 2012, at 14:10, Bhavesh Shah wrote: > Hello all, > I have Elastic Mapreduce instance. While executing hive job flow I needed > Subnet ID to access the VPC. > Is there any way to add/create the Amazon Elastic Mapreduce Instance in that > VPC? If you're using th

How to create/add Amazon Elastic Mapreduce Instances in VPC ?

2012-05-04 Thread Bhavesh Shah
Hello all, I have an Elastic MapReduce instance. While executing a Hive job flow I needed a subnet ID to access the VPC. Is there any way to add/create the Amazon Elastic MapReduce instance in that VPC? -- Regards, Bhavesh Shah

Related to speed of execution of Job in Amazon Elastic Mapreduce

2012-05-03 Thread Bhavesh Shah
performance is very poor on my single local machine (it takes about 3 hrs to execute completely). I want to reduce that time as much as possible. For that we have decided to use Amazon Elastic MapReduce. Currently I am using 3 m1.large instances and still I have the same performance as on my local

Re: amazon elastic mapreduce

2011-12-11 Thread Aniket Mokashi
Hi, You have a couple of options to save your intermediate state: 1. If your metastore is HA, you can save your state in the metastore (e.g. alter table TBLPROPERTIES ("job.state", "DoneTill:121122")). 2. You can periodically save your state on EMR-local drives and upload it to S3. You can use any cust
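Option 1 in the message above could be sketched in HiveQL as follows. The table name `job_checkpoint` is hypothetical; the property key and value are taken from the example in the message. This only illustrates the `SET TBLPROPERTIES` syntax and is not a tested checkpointing scheme.

```sql
-- Hypothetical table name; the key and value follow the example in the message.
ALTER TABLE job_checkpoint SET TBLPROPERTIES ('job.state' = 'DoneTill:121122');

-- Read the stored state back; the property is listed under "Table Parameters".
DESCRIBE FORMATTED job_checkpoint;
```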

amazon elastic mapreduce

2011-12-11 Thread Cam Bazz
Hello All, So I had a single-node pseudo-cluster that has been running for a year, calculating some statistics for me. Finally it grew beyond a do-it-at-home task. So I have my data uploaded to S3, and I have configured everything so that I can load my tables, and load the partitions, and the data is