Hi Sabarish,

We finally got S3 working. I think the real problem was that by default spark-ec2 uses an old version of Hadoop (1.0.4). Once we passed --copy-aws-credentials --hadoop-major-version=2 it started working.
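For reference, the launch command ended up looking roughly like this (the cluster name, key pair, and identity file below are placeholders, not our real values):

    ./spark-ec2 --key-pair=my-key-pair \
                --identity-file=/path/to/my-key.pem \
                --copy-aws-credentials \
                --hadoop-major-version=2 \
                launch my-cluster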
Kind regards

Andy

From: Sabarish Sasidharan <sabarish.sasidha...@manthan.com>
Date: Sunday, February 14, 2016 at 7:05 PM
To: Andrew Davidson <a...@santacruzintegration.com>
Cc: "user @spark" <user@spark.apache.org>
Subject: Re: newbie unable to write to S3 403 forbidden error

> Make sure you are using an s3 bucket in the same region. Also, I would access
> my bucket this way: s3n://bucketname/foldername.
>
> You can test privileges using the s3 command line client.
>
> Also, if you are using instance profiles you don't need to specify access and
> secret keys. No harm in specifying them, though.
>
> Regards
> Sab
>
> On 12-Feb-2016 2:46 am, "Andy Davidson" <a...@santacruzintegration.com> wrote:
>> I am using Spark 1.6.0 in a cluster created using the spark-ec2 script. I am
>> using the standalone cluster manager.
>>
>> My Java streaming app is not able to write to S3. It appears to be some form
>> of permission problem.
>>
>> Any idea what the problem might be?
>>
>> I tried using the IAM simulator to test the policy. Everything seems okay.
>> Any idea how I can debug this problem?
>>
>> Thanks in advance
>>
>> Andy
>>
>> JavaSparkContext jsc = new JavaSparkContext(conf);
>>
>> // I did not include the full keys in my email.
>> // The keys do not contain '\'.
>> // These are the keys used to create the cluster. They belong to the
>> // IAM user andy.
>> jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", "AKIAJREX");
>> jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", "uBh9v1hdUctI23uvq9qR");
>>
>> private static void saveTweets(JavaDStream<String> jsonTweets, String outputURI) {
>>     jsonTweets.foreachRDD(new VoidFunction2<JavaRDD<String>, Time>() {
>>         private static final long serialVersionUID = 1L;
>>
>>         @Override
>>         public void call(JavaRDD<String> rdd, Time time) throws Exception {
>>             if (!rdd.isEmpty()) {
>>                 // bucket name is 'com.pws.twitter'; it has a folder 'json'
>>                 String dirPath = "s3n://s3-us-west-1.amazonaws.com/com.pws.twitter/json"
>>                         + "-" + time.milliseconds();
>>                 rdd.saveAsTextFile(dirPath);
>>             }
>>         }
>>     });
>> }
>>
>> Bucket name: com.pws.twitter
>> Bucket policy (I replaced the account id):
>>
>> {
>>     "Version": "2012-10-17",
>>     "Id": "Policy1455148808376",
>>     "Statement": [
>>         {
>>             "Sid": "Stmt1455148797805",
>>             "Effect": "Allow",
>>             "Principal": {
>>                 "AWS": "arn:aws:iam::123456789012:user/andy"
>>             },
>>             "Action": "s3:*",
>>             "Resource": "arn:aws:s3:::com.pws.twitter/*"
>>         }
>>     ]
>> }
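P.S. For anyone who finds this thread later: Sabarish's suggestion to test privileges with the s3 command line client is a quick way to separate IAM problems from Spark/Hadoop problems. A minimal check, assuming the AWS CLI is installed and configured with the same access keys (s3cmd works equally well):

    # list the bucket; a 403 here means the credentials/policy are wrong
    aws s3 ls s3://com.pws.twitter/

    # try writing a small object, the same operation the streaming job performs
    echo test > /tmp/probe.txt
    aws s3 cp /tmp/probe.txt s3://com.pws.twitter/json-probe.txt

If both succeed but Spark still gets a 403, the problem is in the Hadoop/S3 layer rather than in IAM (as it was for us, with the old Hadoop 1.0.4 client).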