Try the fs.s3n.aws… properties instead, and load with the matching scheme: data = load 's3n://...'
The "n" stands for native. I believe S3 also supports block device storage (s3://), which allows bigger files to be stored. I don't know how (if at all) the two types interact.

David

On Apr 7, 2013, at 1:11 PM, Panshul Whisper <[email protected]> wrote:

> Hello,
>
> I am trying to run a pig script which is supposed to read input from s3
> and write back to s3. The cluster scenario is as follows:
>
> * Cluster is installed on EC2 using Cloudera Manager 4.5 Automatic
>   Installation
> * Installed version: CDH4
> * Script location: on one of the nodes of the cluster
> * Running as: $ pig countGroups_daily.pig
>
> *The Pig Script:*
>
> set fs.s3.awsAccessKeyId xxxxxxxxxxxxxxxxxx
> set fs.s3.awsSecretAccessKey xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> --load the sample input file
> data = load 's3://steamdata/nysedata/NYSE_daily.txt' as
>     (exchange:chararray, symbol:chararray, date:chararray, open:float,
>      high:float, low:float, close:float, volume:int, adj_close:float);
> --group data by symbol
> symbolgrp = group data by symbol;
> --count the records in every group
> symcount = foreach symbolgrp generate group, COUNT(data);
> --order the counted list by count
> symcountordered = order symcount by $1;
> store symcountordered into 's3://steamdata/nyseoutput/daily';
>
> *Error:*
>
> Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118:
> Input path does not exist: s3://steamdata/nysedata/NYSE_daily.txt
>
> Input(s):
> Failed to read data from "s3://steamdata/nysedata/NYSE_daily.txt"
>
> Please help me; what am I doing wrong? I can assure you that the input
> path/file exists on s3 and that the AWS access key and secret key entered
> are correct.
>
> Thanking you,
>
> --
> Regards,
> Ouch Whisper
> 010101010101
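For what it's worth, here is a minimal sketch of how the script might look with the scheme and the credential property names matched up. This assumes the file was uploaded as a regular S3 object (i.e. native storage, so s3n:// applies); the bucket and path are the ones from the original script, and the xxxxxxxx values are placeholders for the real keys:

```pig
-- Credentials for the native S3 filesystem use the fs.s3n.* property
-- names; they must match the URI scheme used in load/store below.
set fs.s3n.awsAccessKeyId xxxxxxxxxxxxxxxxxx
set fs.s3n.awsSecretAccessKey xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

-- Load via the s3n:// scheme so Hadoop resolves the path with the
-- native S3 filesystem (and the credentials set above).
data = load 's3n://steamdata/nysedata/NYSE_daily.txt' as
    (exchange:chararray, symbol:chararray, date:chararray, open:float,
     high:float, low:float, close:float, volume:int, adj_close:float);

symbolgrp = group data by symbol;
symcount = foreach symbolgrp generate group, COUNT(data);
symcountordered = order symcount by $1;

store symcountordered into 's3n://steamdata/nyseoutput/daily';
```

The key point is consistency: fs.s3.* credentials only apply to s3:// (block storage) paths, and fs.s3n.* only to s3n:// (native) paths, so mixing the two leaves the filesystem without credentials and the path unresolvable.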
