If you are using Hive on EMR, you can create an external table directly
over the data on S3:

From Hive, you can create tables that use S3 data like this:

create external table from_to (
  from_address string,
  to_address string,
  dt string
)
row format delimited fields terminated by '\t'
stored as textfile
location 's3://rjurney_public_web/from_to_date';

You could then query it:
 select * from from_to;
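
If you would rather end up with the rows in a regular Hive-managed table
(which is what the original "load data" was aiming for), one option is to
copy them out of the external table. This is only a sketch: from_to_local
is a made-up table name, and it assumes the external from_to table above
already exists and is readable from your cluster.

```sql
-- from_to_local is a hypothetical name for a Hive-managed copy of the data
create table from_to_local (from_address string, to_address string, dt string);

-- read the rows from the S3-backed external table and write them
-- into the managed table
insert overwrite table from_to_local
select from_address, to_address, dt from from_to;
```

The external table itself never moves the files; dropping it leaves the
data on S3 untouched.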

Balaji

On Tue, May 29, 2012 at 4:20 PM, Russell Jurney
<russell.jur...@gmail.com> wrote:
> How do I load data from S3 into Hive using Amazon EMR?  I've booted a small
> cluster, and I want to load a 3-column TSV file from Pig into a table like
> this:
>
> create table from_to (from_address string, to_address string, dt string);
>
>
> When I run something like this:
>
> load data inpath 's3n://rjurney_public_web/from_to_date' into table from_to;
>
>
> I get errors:
>
> FAILED: Error in semantic analysis: Line 1:17 Invalid path
> 's3n://rjurney_public_web/from_to_date': only "file" or "hdfs" file systems
> accepted. s3n file system is not supported.
>
>
> There is no distcp on the master node of my EMR cluster, so I can't copy it
> over.  I've read the documentation... and so far after a day of trying, I
> can't load data into HIVE via EMR.
>
> What am I missing?  Thanks!
> --
> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com
