Hello,

I am trying to run a Pig script which is supposed to read its input from S3 and write its output back to S3. The cluster scenario is as follows:

* Cluster is installed on EC2 using the Cloudera Manager 4.5 automatic installation
* Installed version: CDH4
* Script location: on one of the nodes of the cluster
* Running as: $ pig countGroups_daily.pig
*The Pig Script*:

set fs.s3.awsAccessKeyId xxxxxxxxxxxxxxxxxx
set fs.s3.awsSecretAccessKey xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

--load the sample input file
data = load 's3://steamdata/nysedata/NYSE_daily.txt' as (exchange:chararray, symbol:chararray, date:chararray, open:float, high:float, low:float, close:float, volume:int, adj_close:float);

--group data by symbol
symbolgrp = group data by symbol;

--count data in every group
symcount = foreach symbolgrp generate group, COUNT(data);

--order the counted list by count
symcountordered = order symcount by $1;

store symcountordered into 's3://steamdata/nyseoutput/daily';

*Error:*

Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: s3://steamdata/nysedata/NYSE_daily.txt
Input(s):
Failed to read data from "s3://steamdata/nysedata/NYSE_daily.txt"

Please help me figure out what I am doing wrong. I can assure you that the input path/file exists on S3 and that the AWS access key and secret key entered are correct.

Thanking you,

--
Regards,
Ouch Whisper
010101010101
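P.S. For completeness, here is a minimal variant I am considering trying next. This is only a sketch, assuming that CDH4's Hadoop exposes the native S3 filesystem under the s3n:// scheme with the fs.s3n.* credential properties (as opposed to the s3:// block filesystem), and that Pig's SET statements take quoted values terminated with semicolons; the bucket and paths are just the ones from the script above:

set fs.s3n.awsAccessKeyId 'xxxxxxxxxxxxxxxxxx';
set fs.s3n.awsSecretAccessKey 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';

--load the sample input file via the native S3 filesystem (s3n)
data = load 's3n://steamdata/nysedata/NYSE_daily.txt'
    as (exchange:chararray, symbol:chararray, date:chararray,
        open:float, high:float, low:float, close:float,
        volume:int, adj_close:float);

--same grouping, counting and ordering as before
symbolgrp = group data by symbol;
symcount = foreach symbolgrp generate group, COUNT(data);
symcountordered = order symcount by $1;

store symcountordered into 's3n://steamdata/nyseoutput/daily';

Before that, I also plan to check from the same node whether Hadoop itself can list the path (for example: hadoop fs -ls s3n://steamdata/nysedata/ with the fs.s3n.* keys added to core-site.xml), to rule out a credential or classpath problem outside Pig.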
