Hi all,

Does anybody have comments or feedback about Hive local mode 
execution? It is advertised as giving a performance boost on small data 
sets, and it seems to fit nicely when running unit/integration tests on a single 
node or virtual machine.

My specific questions are:

- How significantly does local mode execution of queries diverge from 
distributed mode? Can the results differ in any way?

- I have encountered errors when running complex queries (with several 
joins/DISTINCTs/GROUP BYs) that seem to be related to configuration (see below). 
I got no definitive answer from the mailing list and I am more or less ready to 
dive into the source code.

Any idea where I should look in order to solve that particular problem?

Thanks in advance,

Guillaume

________________________________
From: Guillaume Allain
Sent: 18 June 2013 12:14
To: user@hive.apache.org
Subject: FileNotFoundException when using hive local mode execution style

Hi all,

I plan to use Hive local mode in order to speed up unit testing on (very) small 
data sets (the data is still on HDFS). I switch to local mode by setting the 
following variables:

SET hive.exec.mode.local.auto=true;
SET mapred.local.dir=/user;
SET mapred.tmp.dir=file:///tmp;
(plus creating the needed directories and setting permissions)
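As a reference point for the settings above: according to the Hive Getting Started documentation (not checked against the 0.9.0-cdh4.1.2 source), hive.exec.mode.local.auto=true does not force every job to run locally. A MapReduce job is run locally only if its total input is below hive.exec.mode.local.auto.inputbytes.max (128 MB by default), its number of input files/map tasks is below hive.exec.mode.local.auto.input.files.max (4 by default; older releases call it hive.exec.mode.local.auto.tasks.max, so check the name for your version), and it needs at most one reducer. The check is made per job, so the jobs of one multi-job query can be split between local and cluster execution. The Python sketch below only restates those documented rules; JobStats and eligible_for_local_mode are hypothetical names, and the example job sizes are illustrative.

```python
from dataclasses import dataclass

# Documented defaults (assumed; verify against your Hive version's HiveConf).
DEFAULT_INPUTBYTES_MAX = 128 * 1024 * 1024  # hive.exec.mode.local.auto.inputbytes.max
DEFAULT_INPUT_FILES_MAX = 4                 # hive.exec.mode.local.auto.input.files.max


@dataclass
class JobStats:
    """Hypothetical summary of one MapReduce job from a Hive query plan."""
    input_bytes: int
    input_files: int
    num_reducers: int


def eligible_for_local_mode(job: JobStats,
                            max_bytes: int = DEFAULT_INPUTBYTES_MAX,
                            max_files: int = DEFAULT_INPUT_FILES_MAX) -> bool:
    """Sketch of the documented auto-local-mode rules for a single job."""
    if job.input_bytes > max_bytes:
        return False  # input too large
    if job.input_files > max_files:
        return False  # too many input files / map tasks
    # Only map-only jobs or jobs with a single reducer qualify.
    return job.num_reducers <= 1


# Each job of a multi-job query is checked on its own.
jobs = [
    JobStats(input_bytes=10_000, input_files=2, num_reducers=1),
    JobStats(input_bytes=500 * 1024 * 1024, input_files=2, num_reducers=1),
    JobStats(input_bytes=10_000, input_files=2, num_reducers=3),
]
print([eligible_for_local_mode(j) for j in jobs])  # [True, False, False]
```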

Simple GROUP BY, INNER and OUTER JOIN queries work just fine (with up to 3 
jobs) with nice performance improvements.

Unfortunately, I ran into a FileNotFoundException 
(/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1/emptyFile) 
on a more complex query (4 jobs, DISTINCT on top of several joins; see the logs 
below if needed).

Any idea about that error? What other option am I missing to get a fully 
functional local mode?

Thanks in advance, Guillaume

$ tail -50 /tmp/vagrant/vagrant_20130617171313_82baad8b-1961-4055-a52e-d8865b2cd4f8.lo

2013-06-17 16:10:05,669 INFO  exec.ExecDriver (ExecDriver.java:execute(320)) - Using org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
2013-06-17 16:10:05,688 INFO  exec.ExecDriver (ExecDriver.java:execute(342)) - adding libjars: file:///opt/events-warehouse/build/jars/joda-time.jar,file:///opt/events-warehouse/build/jars/we7-hive-udfs.jar,file:///usr/lib/hive/lib/hive-json-serde-0.2.jar,file:///usr/lib/hive/lib/hive-builtins-0.9.0-cdh4.1.2.jar,file:///opt/events-warehouse/build/jars/guava.jar
2013-06-17 16:10:05,688 INFO  exec.ExecDriver (ExecDriver.java:addInputPaths(840)) - Processing alias dc
2013-06-17 16:10:05,688 INFO  exec.ExecDriver (ExecDriver.java:addInputPaths(858)) - Adding input file hdfs://localhost/user/hive/warehouse/events_super_mart_test.db/dim_cohorts
2013-06-17 16:10:05,689 INFO  exec.Utilities (Utilities.java:isEmptyPath(1807)) - Content Summary not cached for hdfs://localhost/user/hive/warehouse/events_super_mart_test.db/dim_cohorts
2013-06-17 16:10:06,185 INFO  exec.ExecDriver (ExecDriver.java:addInputPath(789)) - Changed input file to file:/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1
2013-06-17 16:10:06,226 INFO  exec.ExecDriver (ExecDriver.java:addInputPaths(840)) - Processing alias $INTNAME
2013-06-17 16:10:06,226 INFO  exec.ExecDriver (ExecDriver.java:addInputPaths(858)) - Adding input file hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10004
2013-06-17 16:10:06,226 INFO  exec.Utilities (Utilities.java:isEmptyPath(1807)) - Content Summary not cached for hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10004
2013-06-17 16:10:06,681 WARN  conf.Configuration (Configuration.java:warnOnceIfDeprecated(808)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2013-06-17 16:10:06,682 INFO  jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
2013-06-17 16:10:06,688 INFO  exec.ExecDriver (ExecDriver.java:createTmpDirs(215)) - Making Temp Directory: hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10002
2013-06-17 16:10:06,706 WARN  mapred.JobClient (JobClient.java:copyAndConfigureFiles(704)) - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2013-06-17 16:10:06,942 INFO  io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(370)) - CombineHiveInputSplit creating pool for file:/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1; using filter path file:/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1
2013-06-17 16:10:06,943 INFO  io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(370)) - CombineHiveInputSplit creating pool for hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10004; using filter path hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10004
2013-06-17 16:10:06,951 INFO  mapred.FileInputFormat (FileInputFormat.java:listStatus(196)) - Total input paths to process : 2
2013-06-17 16:10:06,953 INFO  mapred.JobClient (JobClient.java:run(982)) - Cleaning up the staging area file:/user/vagrant2000733611/.staging/job_local_0001
2013-06-17 16:10:06,953 ERROR security.UserGroupInformation (UserGroupInformation.java:doAs(1335)) - PriviledgedActionException as:vagrant (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: /tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1/emptyFile
2013-06-17 16:10:06,956 ERROR exec.ExecDriver (SessionState.java:printError(403)) - Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1/emptyFile)'
java.io.FileNotFoundException: File does not exist: /tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1/emptyFile
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:787)
    at org.apache.hadoop.mapred.lib.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:462)
    at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:256)
    at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:212)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:392)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:358)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:387)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1041)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1033)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:943)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:870)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:435)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:677)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
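One possible reading of the log above, offered only as an unverified hypothesis: the input for alias dc was changed to a local URI (file:/tmp/vagrant/.../-mr-10000/1), which is presumably where the emptyFile placeholder was written, but the failing lookup goes through DistributedFileSystem.getFileStatus with a path that carries no scheme. A path without a scheme is interpreted on the default filesystem (fs.defaultFS, assumed here to be hdfs://localhost), so a file created under the local /tmp would not be found there. The Python sketch below only illustrates that resolution rule; qualify() is a hypothetical helper, not a Hadoop API.

```python
from urllib.parse import urlparse


def qualify(path: str, default_fs: str) -> str:
    """Hypothetical helper mimicking how a path is resolved against the
    default filesystem URI (e.g. the value of fs.defaultFS)."""
    if urlparse(path).scheme:
        # Already qualified (file:, hdfs:, ...): left unchanged.
        return path
    if not path.startswith("/"):
        # Relative paths are not handled in this sketch.
        raise ValueError("relative paths are not handled: " + path)
    # No scheme: the path is interpreted on the default filesystem.
    return default_fs.rstrip("/") + path


default_fs = "hdfs://localhost"  # assumed value of fs.defaultFS in this setup

# Assumed location of the placeholder (the local input dir from the log).
created = "file:/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1/emptyFile"
# Path reported by the FileNotFoundException (no scheme).
looked_up = "/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1/emptyFile"

print(urlparse(qualify(created, default_fs)).scheme)    # file
print(urlparse(qualify(looked_up, default_fs)).scheme)  # hdfs
```

If this reading is right, the two paths name the same location on two different filesystems, which would match a "File does not exist" error on a file that was in fact created.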


Installation details:

vagrant@hadoop:/opt/events-warehouse$ hadoop version
Hadoop 2.0.0-cdh4.1.2

vagrant@hadoop:/opt/events-warehouse$ ls /usr/lib/hive/lib/ | grep hive
hive-builtins-0.9.0-cdh4.1.2.jar
hive-cli-0.9.0-cdh4.1.2.jar
hive-common-0.9.0-cdh4.1.2.jar
hive-contrib-0.9.0-cdh4.1.2.jar
hive_contrib.jar
hive-exec-0.9.0-cdh4.1.2.jar
hive-hbase-handler-0.9.0-cdh4.1.2.jar
hive-hwi-0.9.0-cdh4.1.2.jar
hive-jdbc-0.9.0-cdh4.1.2.jar
hive-json-serde-0.2.jar
hive-metastore-0.9.0-cdh4.1.2.jar
hive-pdk-0.9.0-cdh4.1.2.jar
hive-serde-0.9.0-cdh4.1.2.jar
hive-service-0.9.0-cdh4.1.2.jar
hive-shims-0.9.0-cdh4.1.2.jar



Guillaume Allain
Senior Development Engineer
t: +44 20 7117 0809
m:
blinkbox music - the easiest way to listen to the music you love, for free
www.blinkboxmusic.com
