Hi Marc,

You can try running the Hive client with debug logging turned on and see what it is trying to do at the JobTracker (JT) level:

    hive -hiveconf hive.root.logger=ALL,console -e "DDL;"
    hive -hiveconf hive.root.logger=ALL,console -f ddl.sql
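
If the debug output is too noisy to follow live, one option is to timestamp each line and save it to a file, so that the long pause shows up as a jump between two adjacent lines. This is only a rough sketch, assuming a bash-like shell with `hive` on the PATH; the `stamp` helper and the `hive-debug.log` file name are made up for illustration.

```shell
# Hypothetical helper: prefix every line read from stdin with the
# wall-clock time at which it arrived.
stamp() {
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s %s\n' "$(date '+%H:%M:%S')" "$line"
  done
}

# Example use (assumes hive is on the PATH and ddl.sql holds the query):
#   hive -hiveconf hive.root.logger=ALL,console -f ddl.sql 2>&1 | stamp | tee hive-debug.log
```

The last few lines logged before the jump in timestamps should point at whatever the client was doing during the delay.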

Hope this helps.

Thanks
-Abdelrahman

On Wed, Jan 30, 2013 at 3:16 PM, Marc Limotte <mslimo...@gmail.com> wrote:

> Hi,
>
> I'm running in Amazon on an EMR cluster with hive 0.8.1. We have a lot of
> other Hadoop jobs, but only started experimenting with Hive recently.
>
> I've been seeing a long pause between submitting a hive query and the
> actual start of the hadoop job... 10 minutes or more in some cases. I'm
> wondering what's happening during this time. Either a high level answer,
> or maybe there is some logging I can turn on?
>
> Here's some more detail. I submit the query on the master using the hive
> cli, and start to see some output right away...
>
> Total MapReduce jobs = 2
> Launching Job 1 out of 2
> Number of reduce tasks not specified. Estimated from input data size: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
>
> *[then a long delay here: 10 minutes or more... no activity in the hadoop
> job tracker ui]*
>
> … and then it continues normally ...
> Starting Job = job_201301160029_0082, Tracking URL =
> http://ip-xxxxxxxx.ec2.internal:9100/jobdetails.jsp?jobid=job_201301160029_0082
> Kill Command = /home/hadoop/bin/hadoop job
> -Dmapred.job.tracker=xxxxxx:9001 -kill job_201301160029_0082
> Hadoop job information for Stage-1: number of mappers: 2; number of
> reducers: 1
> 2013-01-30 20:45:30,526 Stage-1 map = 0%, reduce = 0%
> …
>
> This query is processing in the neighborhood of 500GB of data from S3. A
> couple of possibilities I thought of… perhaps someone can confirm or deny:
> a) Is the data copied from S3 to HDFS during this time?
> b) I have a fairly large set of libs in HIVE_AUX_JAR_PATH (around 175
> MB) -- does it have to copy these around to the tasks at this time?
>
> Any insights appreciated.
>
> Marc