It is, because the file name is nowhere specified in my code. I think it's a bug in the library: Hive tries to access the file in the default file system instead of in S3 (apparently somewhere in the creation of the Hadoop job for the query).
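A side note on the "default file system" suspicion above — this is only a generic sketch, not a fix confirmed in this thread. On Hadoop 1.x / CDH4, S3 access through the s3:// and s3n:// schemes is configured in core-site.xml with the following documented properties (the key values here are placeholders):

```xml
<!-- core-site.xml: AWS credentials for the s3:// (block-based) and
     s3n:// (native) file system schemes on Hadoop 1.x / CDH4.
     The values below are placeholders, not real credentials. -->
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```

If the credentials are present and the error persists, the stack trace in the quoted mail (DistributedFileSystem.getFileStatus called on a bare /data/... path) points at the input format resolving the table location against the default file system, not at missing credentials.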
From: Panshul Whisper [mailto:ouchwhis...@gmail.com]
Sent: Thursday, April 18, 2013 16:18
To: user@hive.apache.org
Subject: Re: Hive query problem on S3 table

This means it is still not looking in S3...

On Apr 18, 2013 3:44 PM, "Tim Bittersohl" <t...@innoplexia.com> wrote:

Hi,

I just found out that I don't have to change the default file system of Hadoop; only the location in the CREATE TABLE command has to be changed:

CREATE EXTERNAL TABLE testtable(nyseVal STRING, cliVal STRING, dateVal STRING, number1Val STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION "s3://hadoop-bucket/data/"

But when I try to access the table with a command that creates a Hadoop job, I get the following error:

13/04/18 15:29:36 ERROR security.UserGroupInformation: PriviledgedActionException as:tim (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: /data/NYSE_daily.txt
java.io.FileNotFoundException: File does not exist: /data/NYSE_daily.txt
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:807)
	at org.apache.hadoop.mapred.lib.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:462)
	at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:256)
	at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:212)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:411)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:377)
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:387)
	at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1091)
	at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1083)
	at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:993)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:946)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:946)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:920)
	at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
	at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1352)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1138)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
	at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:198)
	at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:644)
	at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:628)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:722)
Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /data/NYSE_daily.txt)'
13/04/18 15:29:36 ERROR exec.Task: Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /data/NYSE_daily.txt)'
java.io.FileNotFoundException: File does not exist: /data/NYSE_daily.txt
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:807)
	at org.apache.hadoop.mapred.lib.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:462)
	at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:256)
	at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:212)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:411)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:377)
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:387)
	at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1091)
	at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1083)
	at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:993)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:946)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:946)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:920)
	at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
	at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1352)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1138)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
	at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:198)
	at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:644)
	at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:628)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:722)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
13/04/18 15:29:36 ERROR ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask

On the internet I found the hint to set this configuration to solve the problem:

hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat

But I just get a RuntimeException when doing so:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.io.HiveInputFormat
	at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:333)
	at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1352)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1138)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
	at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:198)
	at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:644)
	at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:628)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:722)
13/04/18 15:37:14 ERROR exec.ExecDriver: Exception: org.apache.hadoop.hive.ql.io.HiveInputFormat
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
13/04/18 15:37:14 ERROR ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask

I'm using the Cloudera 0.10.0-cdh4.2.0 version of the Hive libraries.

Greetings

Tim Bittersohl
Software Engineer

Innoplexia GmbH
Mannheimer Str. 175
69123 Heidelberg
Tel.: +49 (0) 6221 7198033
Mobile: +49 (0) 160 99186759
Fax: +49 (0) 6221 7198034
Web: www.innoplexia.com
Sitz: 69123 Heidelberg, Mannheimer Str. 175 - Steuernummer 32494/62606 - USt. IdNr.: DE 272 871 728 - Geschäftsführer: Prof. Dr. Herbert Schuster
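Regarding the hive.input.format hint quoted above: it is normally applied either per session with a SET statement in the Hive CLI, or globally in hive-site.xml. A minimal sketch of the hive-site.xml form follows; whether this avoids the RuntimeException reported above for 0.10.0-cdh4.2.0 is not confirmed in this thread:

```xml
<!-- hive-site.xml sketch: switch from the default CombineHiveInputFormat
     to the plain HiveInputFormat, so table locations (including s3:// URIs)
     are not combined and resolved against the default (HDFS) file system.
     Not a confirmed fix for the CDH 4.2.0 RuntimeException in this thread. -->
<property>
  <name>hive.input.format</name>
  <value>org.apache.hadoop.hive.ql.io.HiveInputFormat</value>
</property>
```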