Hi all,
I'm running Spark on AWS EMR and I'm having trouble getting the correct
permissions set on the output files written by
rdd.saveAsTextFile('<file_dir_name>'). In Hive, I would add a line at the
beginning of the script:
set fs.s3.canned.acl=BucketOwnerFullControl
and that would set the correct grantees for the files. For Spark, I tried
adding the permissions as a --conf option:
hadoop jar /mnt/var/lib/hadoop/steps/s-3HIRLHJJXV3SJ/script-runner.jar \
  /home/hadoop/spark/bin/spark-submit --deploy-mode cluster --master yarn-cluster \
  --conf "spark.driver.extraJavaOptions=-Dfs.s3.canned.acl=BucketOwnerFullControl" \
  hdfs:///user/hadoop/spark.py
But the permissions still do not get set properly on the output files. What
is the proper way to pass 'fs.s3.canned.acl=BucketOwnerFullControl' (or any
of the S3 canned ACLs) to a Spark job?
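One variant I'm wondering about but haven't verified: Spark is documented to
forward any --conf key that starts with "spark.hadoop." into the Hadoop
Configuration with that prefix stripped, which might reach the S3 filesystem
the same way Hive's "set" command does. A sketch of what that invocation
would look like (same step jar and script paths as above):

```shell
# Unverified sketch: "spark.hadoop.fs.s3.canned.acl" should be stripped to
# "fs.s3.canned.acl" and placed in the Hadoop Configuration by Spark,
# rather than being passed only as a driver JVM system property.
hadoop jar /mnt/var/lib/hadoop/steps/s-3HIRLHJJXV3SJ/script-runner.jar \
  /home/hadoop/spark/bin/spark-submit --deploy-mode cluster --master yarn-cluster \
  --conf "spark.hadoop.fs.s3.canned.acl=BucketOwnerFullControl" \
  hdfs:///user/hadoop/spark.py
```

Is that the intended mechanism, or is there an EMR-specific place (e.g. a
cluster-wide Hadoop config) where this has to be set instead?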
Thanks in advance