It is, because the file name is nowhere specified in my code. I think it's a bug in the library: Hive tries to access the file in the default file system instead of in S3 (apparently somewhere in the creation of the Hadoop job for the query).
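A side note on the "default file system" suspicion above — this is only a generic sketch, not a fix confirmed in this thread. On Hadoop 1.x / CDH4, S3 access through the s3:// and s3n:// schemes is configured in core-site.xml with the following documented properties (the key values here are placeholders):

```xml
<!-- core-site.xml: AWS credentials for the s3:// (block-based) and
     s3n:// (native) file system schemes on Hadoop 1.x / CDH4.
     The values below are placeholders, not real credentials. -->
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```

If the credentials are present and the error persists, the stack trace in the quoted mail (DistributedFileSystem.getFileStatus called on a bare /data/... path) points at the input format resolving the table location against the default file system, not at missing credentials.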
From: Panshul Whisper [mailto:ouchwhis...@gmail.com]
Sent: Thursday, April 18, 2013 16:18
To: user@hive.apache.org
Subject: Re: Hive query problem on S3 table

This means it is still not looking in S3...

On Apr 18, 2013 3:44 PM, "Tim Bittersohl" <t...@innoplexia.com> wrote:

Hi,

I just found out that I don't have to change the default file system of Hadoop; only the location in the CREATE TABLE command has to be changed:

CREATE EXTERNAL TABLE testtable(nyseVal STRING, cliVal STRING, dateVal STRING, number1Val STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION "s3://hadoop-bucket/data/"

But when I try to access the table with a command that creates a Hadoop job, I get the following error:

13/04/18 15:29:36 ERROR security.UserGroupInformation: PriviledgedActionException as:tim (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: /data/NYSE_daily.txt
java.io.FileNotFoundException: File does not exist: /data/NYSE_daily.txt
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:807)
	at org.apache.hadoop.mapred.lib.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:462)
	at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:256)
	at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:212)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:411)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:377)
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:387)
	at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1091)
	at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1083)
	at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:993)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:946)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:946)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:920)
	at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
	at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1352)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1138)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
	at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:198)
	at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:644)
	at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:628)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:722)
Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /data/NYSE_daily.txt)'
13/04/18 15:29:36 ERROR exec.Task: Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /data/NYSE_daily.txt)'
java.io.FileNotFoundException: File does not exist: /data/NYSE_daily.txt
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:807)
	at org.apache.hadoop.mapred.lib.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:462)
	at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:256)
	at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:212)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:411)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:377)
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:387)
	at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1091)
	at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1083)
	at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:993)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:946)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:946)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:920)
	at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
	at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1352)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1138)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
	at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:198)
	at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:644)
	at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:628)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:722)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
13/04/18 15:29:36 ERROR ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask

On the internet I found the hint to set this configuration to solve the problem:

hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat

But I just get a RuntimeException when doing so:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.io.HiveInputFormat
	at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:333)
	at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1352)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1138)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
	at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:198)
	at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:644)
	at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:628)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:722)
13/04/18 15:37:14 ERROR exec.ExecDriver: Exception: org.apache.hadoop.hive.ql.io.HiveInputFormat
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
13/04/18 15:37:14 ERROR ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask

I'm using the Cloudera 0.10.0-cdh4.2.0 version of the Hive libraries.

Greetings

Tim Bittersohl
Software Engineer

Innoplexia GmbH
Mannheimer Str. 175
69123 Heidelberg
Tel.: +49 (0) 6221 7198033
Mobile: +49 (0) 160 99186759
Fax: +49 (0) 6221 7198034
Web: www.innoplexia.com
Sitz: 69123 Heidelberg, Mannheimer Str. 175 - Steuernummer 32494/62606 - USt. IdNr.: DE 272 871 728 - Geschäftsführer: Prof. Dr. Herbert Schuster
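Regarding the hive.input.format hint quoted above: it is normally applied either per session with a SET statement in the Hive CLI, or globally in hive-site.xml. A minimal sketch of the hive-site.xml form follows; whether this avoids the RuntimeException reported above for 0.10.0-cdh4.2.0 is not confirmed in this thread:

```xml
<!-- hive-site.xml sketch: switch from the default CombineHiveInputFormat
     to the plain HiveInputFormat, so table locations (including s3:// URIs)
     are not combined and resolved against the default (HDFS) file system.
     Not a confirmed fix for the CDH 4.2.0 RuntimeException in this thread. -->
<property>
  <name>hive.input.format</name>
  <value>org.apache.hadoop.hive.ql.io.HiveInputFormat</value>
</property>
```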