Hi all,
I'm running Spark on AWS EMR and I'm having trouble getting the correct
permissions set on the output files written by
rdd.saveAsTextFile('<file_dir_name>'). In Hive, I would add a line at the
beginning of the script:
set fs.s3.canned.acl=BucketOwnerFullControl
and that would set the correct grantees for the files. For Spark, I tried
adding the permissions as a --conf option:
hadoop jar /mnt/var/lib/hadoop/steps/s-3HIRLHJJXV3SJ/script-runner.jar \
  /home/hadoop/spark/bin/spark-submit --deploy-mode cluster --master yarn-cluster \
  --conf "spark.driver.extraJavaOptions=-Dfs.s3.canned.acl=BucketOwnerFullControl" \
  hdfs:///user/hadoop/spark.py
But the permissions still do not get set properly on the output files. What
is the proper way to pass 'fs.s3.canned.acl=BucketOwnerFullControl' (or any
of the S3 canned ACLs) to a Spark job?
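One variant I'm wondering about but haven't verified: Spark is documented to
forward any --conf key that starts with "spark.hadoop." into the Hadoop
Configuration with that prefix stripped, which might reach the S3 filesystem
the same way Hive's "set" command does. A sketch of what that invocation
would look like (same step jar and script paths as above):

```shell
# Unverified sketch: "spark.hadoop.fs.s3.canned.acl" should be stripped to
# "fs.s3.canned.acl" and placed in the Hadoop Configuration by Spark,
# rather than being passed only as a driver JVM system property.
hadoop jar /mnt/var/lib/hadoop/steps/s-3HIRLHJJXV3SJ/script-runner.jar \
  /home/hadoop/spark/bin/spark-submit --deploy-mode cluster --master yarn-cluster \
  --conf "spark.hadoop.fs.s3.canned.acl=BucketOwnerFullControl" \
  hdfs:///user/hadoop/spark.py
```

Is that the intended mechanism, or is there an EMR-specific place (e.g. a
cluster-wide Hadoop config) where this has to be set instead?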
Thanks in advance