Hi,

I'm running into the following problem:

I have a file on S3 -- s3n://my.bucket/hive/ranjan_test.  Its fields are 
separated by \001 and its records by \n.
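
For reference, the delimiters can be confirmed with something like the 
following (assuming the part-00000 file name that shows up in the trace 
below; cat -v prints \001 as ^A):

hadoop fs -cat s3n://my.bucket/hive/ranjan_test/part-00000 | head -2 | cat -v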

I want it to be accessible in Hive; the DDL is:
CREATE EXTERNAL TABLE IF NOT EXISTS ranjan_test (
ip_address string,
num_counted int
)
STORED AS TEXTFILE
LOCATION 's3n://my.bucket/hive/ranjan_test';
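
Since \001 is Hive's default field delimiter, I left the ROW FORMAT clause 
off; the explicit equivalent, in case the serde is relevant here, would be:

CREATE EXTERNAL TABLE IF NOT EXISTS ranjan_test (
ip_address string,
num_counted int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3n://my.bucket/hive/ranjan_test';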

I'm able to do a simple query:

hive> select * from ranjan_test limit 5;
OK
98.226.198.23   1676
74.76.148.21    1560
76.64.28.25     1529
170.37.227.10   1363
71.202.128.196  1232
Time taken: 4.172 seconds
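
In case the metastore location matters, it can be checked with the 
following; I can post that output if it would help:

hive> DESCRIBE EXTENDED ranjan_test;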

What I can't do is run any SELECT that fires off a MapReduce job:

hive> select count(*) from ranjan_test;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
 set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
 set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
 set mapred.reduce.tasks=<number>
java.io.FileNotFoundException: File does not exist: /hive/ranjan_test/part-00000
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:546)
        at org.apache.hadoop.mapred.lib.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:462)
        at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:256)
        at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:212)
        at org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileInputFormatShim.getSplits(Hadoop20SShims.java:347)
        at org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileInputFormatShim.getSplits(Hadoop20SShims.java:313)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:377)
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:971)
        at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:963)
        at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:880)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)
        at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:671)
        at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:123)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:209)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:286)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:513)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /hive/ranjan_test/part-00000)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask


Any help?  The AWS credentials seem fine, since the simple SELECT above 
reads from S3 without a problem.  What strikes me is that the path in the 
exception has lost its s3n://my.bucket scheme and is being looked up as an 
HDFS path, /hive/ranjan_test/part-00000.  Should I be configuring something 
on the other machines in the cluster?
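
One thing I was planning to try, since the trace goes through 
CombineHiveInputFormat and the exception path has lost its scheme: forcing 
the non-combining input format.  This is just a guess at a workaround, not 
something I've confirmed:

hive> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
hive> select count(*) from ranjan_test;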

Thanks in advance,

Ranjan
