Hi Suman, the job is failing because it looks for a directory /hive/test that does not exist yet. I think you need another directory named test under hive. Create it and copy the data into s3://com.xxxxx/hive/test/.
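Something along these lines should do it with the hadoop fs shell. This is just a sketch: the bucket is taken from your table's LOCATION and the local path from your mail, and it assumes the s3n filesystem is already configured with your AWS credentials:

    # create the test directory under the table's LOCATION
    hadoop fs -mkdir s3n://com.xxxxx.webanalytics/hive/test/

    # copy the local data file into it
    hadoop fs -put /home/ubuntu/data/play/test s3n://com.xxxxx.webanalytics/hive/test/

Once the file is there, the MapReduce job should be able to find its input.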
Thank You,
Manish

On Fri, 2012-08-24 at 20:43 +0000, suman.adda...@sanofipasteur.com wrote:
> Hi,
>
> I have set up a Hadoop cluster on Amazon EC2 with my data stored on S3.
> I would like to use Hive to process the data on S3.
>
> I created an external table in Hive using the following:
>
> CREATE EXTERNAL TABLE mytable1
> (
>   HIT_TIME_GMT string,
>   SERVICE string
> ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> LOCATION 's3n://com.xxxxx.webanalytics/hive/';
>
> I loaded a few records into the table with:
>
> LOAD DATA LOCAL INPATH '/home/ubuntu/data/play/test' INTO TABLE mytable1;
>
> Select * from mytable1; shows me the data in the table.
>
> When I run a query that requires a MapReduce job, for example
> select count(*) from mytable1; an exception is thrown:
>
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> java.io.FileNotFoundException: File does not exist: /hive/test
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:527)
>     at org.apache.hadoop.mapred.lib.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:462)
>     at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:256)
>     at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:212)
>     at org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileInputFormatShim.getSplits(Hadoop20SShims.java:347)
>     at org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileInputFormatShim.getSplits(Hadoop20SShims.java:313)
>     at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:377)
>     at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1026)
>     at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1018)
>     at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:929)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:882)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:882)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:856)
>     at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:671)
>     at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:123)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:131)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:209)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:286)
>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:516)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
> Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /hive/test)'
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
>
> The file does exist and I can see it on S3, and Select * from mytable1;
> returns the data in the table. I am not sure what goes wrong when the
> Hive query launches a MapReduce job. Any pointer as to where I went
> wrong? Appreciate your help.
>
> Thank you
> Suman