Re: Help with a table located on s3n

Mark Grover Thu, 15 Dec 2011 20:09:57 -0800

Hi Ranjan,
A couple of ideas come to mind:

1) Run EXPLAIN (or EXPLAIN EXTENDED) on the query to find out exactly where 
Hive is trying to read or write the file it's complaining about.


2) Look at your job conf file. There is a hyperlink to it from your JobTracker 
web page. See if there is a config option there that is pointing to the 
/hive/ranjan_test directory. If you want, you can share it here for folks to 
see if anything is out of the ordinary.
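
The two checks above can be sketched from the Hive CLI as follows. This is only a rough illustration, assuming the table and query from the original message. The choice of which properties to print is a guess based on the stack trace (it passes through CombineHiveInputFormat and DistributedFileSystem), not a confirmed diagnosis.

```sql
-- Idea 1: show the plan, including the input paths Hive intends to read.
EXPLAIN EXTENDED SELECT count(*) FROM ranjan_test;

-- Idea 2 (complement to the job conf page): print settings from the CLI.
-- "set <name>;" prints the current value of a single property.
set hive.input.format;
set fs.default.name;

-- "set -v;" prints all Hadoop and Hive configuration variables,
-- which can be searched for /hive/ranjan_test.
set -v;
```

If the failed job never shows up on the JobTracker page (the error above is raised at job submission), the CLI output may be the easier thing to share here.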

BTW, are you using Amazon EMR? If so, it might be worthwhile to post on the 
AWS forums.

Mark

----- Original Message -----
From: "Ranjan Bagchi" <ran...@powerreviews.com>
To: user@hive.apache.org
Sent: Thursday, December 15, 2011 8:30:42 PM
Subject: Help with a table located on s3n

Hi,

I'm experiencing the following:  

I've a file on s3 -- s3n://my.bucket/hive/ranjan_test.  It's got fields 
(separated by \001) and records (separated by \n).

I want it to be accessible from Hive. The DDL is:
CREATE EXTERNAL TABLE IF NOT EXISTS ranjan_test (
ip_address string,
num_counted int
)
STORED AS TEXTFILE
LOCATION 's3n://my.bucket/hive/ranjan_test'
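
For reference, the DDL above relies on Hive's default text delimiters, which match the file as described (\001 between fields, \n between records). A sketch that spells them out, assuming the same bucket and columns, might look like this. It is not expected to change behavior, only to make the layout explicit.

```sql
-- Same table as above, with the default delimiters written out.
CREATE EXTERNAL TABLE IF NOT EXISTS ranjan_test (
  ip_address string,
  num_counted int
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3n://my.bucket/hive/ranjan_test';
```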

I'm able to do a simple query:

hive> select * from ranjan_test limit 5;
OK
98.226.198.23   1676
74.76.148.21    1560
76.64.28.25     1529
170.37.227.10   1363
71.202.128.196  1232
Time taken: 4.172 seconds

What I can't do is any select which fires off a mapreduce:

hive> select count(*) from ranjan_test;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
 set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
 set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
 set mapred.reduce.tasks=<number>
java.io.FileNotFoundException: File does not exist: /hive/ranjan_test/part-00000
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:546)
        at org.apache.hadoop.mapred.lib.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:462)
        at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:256)
        at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:212)
        at org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileInputFormatShim.getSplits(Hadoop20SShims.java:347)
        at org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileInputFormatShim.getSplits(Hadoop20SShims.java:313)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:377)
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:971)
        at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:963)
        at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:880)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)
        at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:671)
        at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:123)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:209)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:286)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:513)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /hive/ranjan_test/part-00000)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask


Any help?  The AWS credentials seem good, 'cause otherwise I wouldn't get the 
initial stuff.  Should I be doing something with the other machines in the 
cluster?

Thanks in advance,

Ranjan
