If you are using Hive on EMR, you can create an external table directly over the data on S3:
From Hive, you can create tables that use S3 data like this:

create external table from_to(from_address string, to_address string, dt string)
row format delimited fields terminated by '\t'
stored as textfile
location 's3://rjurney_public_web/from_to_date';

Because the table is external, Hive reads the files in place at that S3 location, so there is no need to LOAD DATA or copy the files to HDFS first. You could then run:

select * from from_to;

Balaji

On Tue, May 29, 2012 at 4:20 PM, Russell Jurney <russell.jur...@gmail.com> wrote:
> How do I load data from S3 into Hive using Amazon EMR? I've booted a small
> cluster, and I want to load a 3-column TSV file from Pig into a table like
> this:
>
> create table from_to (from_address string, to_address string, dt string);
>
> When I run something like this:
>
> load data inpath 's3n://rjurney_public_web/from_to_date' into table from_to;
>
> I get errors:
>
> FAILED: Error in semantic analysis: Line 1:17 Invalid path
> 's3n://rjurney_public_web/from_to_date': only "file" or "hdfs" file systems
> accepted. s3n file system is not supported.
>
> There is no distcp on the master node of my EMR cluster, so I can't copy it
> over. I've read the documentation... and so far after a day of trying, I
> can't load data into HIVE via EMR.
>
> What am I missing? Thanks!
> --
> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com