Take a look at the discussion in this thread: https://groups.google.com/a/cloudera.org/forum/?fromgroups=#!topic/cdh-user/gHVq9C5H6RE
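
The trace in your log shows org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus being called on a path that ExecDriver had just changed to file:/tmp/..., while CombineHiveInputFormat was handling one local and one HDFS input. One thing that may be worth trying is switching the session to the plain HiveInputFormat before running the failing query. This is an assumption on my part, not something confirmed for your setup. Both property names below are standard Hive settings, but whether they cure this particular failure on hive-0.9.0-cdh4.1.2 is untested:

```sql
-- Keep automatic local mode enabled, as in the original setup.
SET hive.exec.mode.local.auto=true;

-- Assumption: use HiveInputFormat instead of CombineHiveInputFormat,
-- so split computation does not go through CombineFileInputFormat.
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
```

If the complex query then runs, the combine input format is the likely culprit. If it still fails, the second setting can simply be dropped again.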

On Tue, Jun 18, 2013 at 4:44 PM, Guillaume Allain <guilla...@we7.com> wrote:

> Hi all,
>
> I plan to use hive local in order to speed up unit testing on (very)
> small data sets. (Data is still on hdfs). I switch to local mode by
> setting the following variables:
>
> SET hive.exec.mode.local.auto=true;
> SET mapred.local.dir=/user;
> SET mapred.tmp.dir=file:///tmp;
> (plus creating needed directories and permissions)
>
> Simple GROUP BY, INNER and OUTER JOIN queries work just fine (with up to 3
> jobs) with nice performance improvements.
>
> Unfortunately I ran into a FileNotFoundException
> (/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1/emptyFile)
> on some more complex query (4 jobs, distinct on top of several joins, see
> below logs if needed).
>
> Any idea about that error? What other option am I missing to have a fully
> functional local mode?
>
> Thanks in advance, Guillaume
>
>
> $ tail -50 /tmp/vagrant/vagrant_20130617171313_82baad8b-1961-4055-a52e-d8865b2cd4f8.lo
>
> 2013-06-17 16:10:05,669 INFO exec.ExecDriver (ExecDriver.java:execute(320)) - Using org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
> 2013-06-17 16:10:05,688 INFO exec.ExecDriver (ExecDriver.java:execute(342)) - adding libjars: file:///opt/events-warehouse/build/jars/joda-time.jar,file:///opt/events-warehouse/build/jars/we7-hive-udfs.jar,file:///usr/lib/hive/lib/hive-json-serde-0.2.jar,file:///usr/lib/hive/lib/hive-builtins-0.9.0-cdh4.1.2.jar,file:///opt/events-warehouse/build/jars/guava.jar
> 2013-06-17 16:10:05,688 INFO exec.ExecDriver (ExecDriver.java:addInputPaths(840)) - Processing alias dc
> 2013-06-17 16:10:05,688 INFO exec.ExecDriver (ExecDriver.java:addInputPaths(858)) - Adding input file hdfs://localhost/user/hive/warehouse/events_super_mart_test.db/dim_cohorts
> 2013-06-17 16:10:05,689 INFO exec.Utilities (Utilities.java:isEmptyPath(1807)) - Content Summary not cached for hdfs://localhost/user/hive/warehouse/events_super_mart_test.db/dim_cohorts
> 2013-06-17 16:10:06,185 INFO exec.ExecDriver (ExecDriver.java:addInputPath(789)) - Changed input file to file:/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1
> 2013-06-17 16:10:06,226 INFO exec.ExecDriver (ExecDriver.java:addInputPaths(840)) - Processing alias $INTNAME
> 2013-06-17 16:10:06,226 INFO exec.ExecDriver (ExecDriver.java:addInputPaths(858)) - Adding input file hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10004
> 2013-06-17 16:10:06,226 INFO exec.Utilities (Utilities.java:isEmptyPath(1807)) - Content Summary not cached for hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10004
> 2013-06-17 16:10:06,681 WARN conf.Configuration (Configuration.java:warnOnceIfDeprecated(808)) - session.id is deprecated. Instead, use dfs.metrics.session-id
> 2013-06-17 16:10:06,682 INFO jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
> 2013-06-17 16:10:06,688 INFO exec.ExecDriver (ExecDriver.java:createTmpDirs(215)) - Making Temp Directory: hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10002
> 2013-06-17 16:10:06,706 WARN mapred.JobClient (JobClient.java:copyAndConfigureFiles(704)) - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 2013-06-17 16:10:06,942 INFO io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(370)) - CombineHiveInputSplit creating pool for file:/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1; using filter path file:/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1
> 2013-06-17 16:10:06,943 INFO io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(370)) - CombineHiveInputSplit creating pool for hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10004; using filter path hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10004
> 2013-06-17 16:10:06,951 INFO mapred.FileInputFormat (FileInputFormat.java:listStatus(196)) - Total input paths to process : 2
> 2013-06-17 16:10:06,953 INFO mapred.JobClient (JobClient.java:run(982)) - Cleaning up the staging area file:/user/vagrant2000733611/.staging/job_local_0001
> 2013-06-17 16:10:06,953 ERROR security.UserGroupInformation (UserGroupInformation.java:doAs(1335)) - PriviledgedActionException as:vagrant (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: /tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1/emptyFile
> 2013-06-17 16:10:06,956 ERROR exec.ExecDriver (SessionState.java:printError(403)) - Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1/emptyFile)'
> java.io.FileNotFoundException: File does not exist: /tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1/emptyFile
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:787)
>     at org.apache.hadoop.mapred.lib.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:462)
>     at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:256)
>     at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:212)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:392)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:358)
>     at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:387)
>     at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1041)
>     at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1033)
>     at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:943)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:870)
>     at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:435)
>     at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:677)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
>
> Installation detail:
>
> vagrant@hadoop:/opt/events-warehouse$ hadoop version
> Hadoop 2.0.0-cdh4.1.2
>
> vagrant@hadoop:/opt/events-warehouse$ ls /usr/lib/hive/lib/ | grep hive
> hive-builtins-0.9.0-cdh4.1.2.jar
> hive-cli-0.9.0-cdh4.1.2.jar
> hive-common-0.9.0-cdh4.1.2.jar
> hive-contrib-0.9.0-cdh4.1.2.jar
> hive_contrib.jar
> hive-exec-0.9.0-cdh4.1.2.jar
> hive-hbase-handler-0.9.0-cdh4.1.2.jar
> hive-hwi-0.9.0-cdh4.1.2.jar
> hive-jdbc-0.9.0-cdh4.1.2.jar
> hive-json-serde-0.2.jar
> hive-metastore-0.9.0-cdh4.1.2.jar
> hive-pdk-0.9.0-cdh4.1.2.jar
> hive-serde-0.9.0-cdh4.1.2.jar
> hive-service-0.9.0-cdh4.1.2.jar
> hive-shims-0.9.0-cdh4.1.2.jar

-- 
Nitin Pawar