Hi Suman, the job is failing because it looks for a directory /hive/test that does not exist yet. I think you need another directory named test under hive. Create it and copy the data into s3://com.xxxxx/hive/test/.
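Something along these lines should do it with the hadoop fs shell. This is just a sketch: the bucket is taken from your table's LOCATION and the local path from your mail, and it assumes the s3n filesystem is already configured with your AWS credentials:

    # create the test directory under the table's LOCATION
    hadoop fs -mkdir s3n://com.xxxxx.webanalytics/hive/test/

    # copy the local data file into it
    hadoop fs -put /home/ubuntu/data/play/test s3n://com.xxxxx.webanalytics/hive/test/

Once the file is there, the MapReduce job should be able to find its input.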
Thank You,
Manish

On Fri, 2012-08-24 at 20:43 +0000, suman.adda...@sanofipasteur.com wrote:
> Hi,
>
> I have set up a Hadoop cluster on Amazon EC2 with my data stored on S3.
> I would like to use Hive to process the data on S3.
>
> I created an external table in Hive using the following:
>
> CREATE EXTERNAL TABLE mytable1
> (
>   HIT_TIME_GMT string,
>   SERVICE string
> ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> LOCATION 's3n://com.xxxxx.webanalytics/hive/';
>
> I loaded a few records into the table with:
>
> LOAD DATA LOCAL INPATH '/home/ubuntu/data/play/test' INTO TABLE mytable1;
>
> Select * from mytable1; shows me the data in the table.
>
> When I run a query that requires a MapReduce job, for example
> select count(*) from mytable1; an exception is thrown:
>
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> java.io.FileNotFoundException: File does not exist: /hive/test
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:527)
>     at org.apache.hadoop.mapred.lib.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:462)
>     at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:256)
>     at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:212)
>     at org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileInputFormatShim.getSplits(Hadoop20SShims.java:347)
>     at org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileInputFormatShim.getSplits(Hadoop20SShims.java:313)
>     at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:377)
>     at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1026)
>     at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1018)
>     at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:929)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:882)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:882)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:856)
>     at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:671)
>     at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:123)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:131)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:209)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:286)
>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:516)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
> Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /hive/test)'
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
>
> The file does exist and I can see it on S3, and Select * from mytable1;
> returns the data in the table. I am not sure what goes wrong when the
> Hive query launches a MapReduce job. Any pointer as to where I went
> wrong? Appreciate your help.
>
> Thank you
> Suman