Hi all, I'm running Spark on AWS EMR and I'm having trouble getting the correct permissions set on the output files written with rdd.saveAsTextFile('<file_dir_name>'). In Hive, I would add a line at the beginning of the script:
set fs.s3.canned.acl=BucketOwnerFullControl

and that would set the correct grantees on the files. For Spark, I tried passing the setting as a --conf option:

hadoop jar /mnt/var/lib/hadoop/steps/s-3HIRLHJJXV3SJ/script-runner.jar \
  /home/hadoop/spark/bin/spark-submit --deploy-mode cluster --master yarn-cluster \
  --conf "spark.driver.extraJavaOptions=-Dfs.s3.canned.acl=BucketOwnerFullControl" \
  hdfs:///user/hadoop/spark.py

but the permissions still do not get set properly on the output files. What is the proper way to pass 'fs.s3.canned.acl=BucketOwnerFullControl' (or any of the S3 canned ACLs) to a Spark job? Thanks in advance.
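For what it's worth, here is the variant I was thinking of trying next — a sketch only, based on my understanding that Spark copies any conf entries prefixed with "spark.hadoop." into the job's Hadoop Configuration (I haven't confirmed this actually fixes the ACLs, which is why I'm asking; paths are from my cluster):

```shell
# Untested sketch: forward the Hadoop property via the spark.hadoop.* prefix,
# which Spark is documented to inject into the Hadoop Configuration used by
# the executors, instead of setting it as a driver-only JVM system property.
hadoop jar /mnt/var/lib/hadoop/steps/s-3HIRLHJJXV3SJ/script-runner.jar \
  /home/hadoop/spark/bin/spark-submit \
    --deploy-mode cluster \
    --master yarn-cluster \
    --conf "spark.hadoop.fs.s3.canned.acl=BucketOwnerFullControl" \
    hdfs:///user/hadoop/spark.py
```

Would that be the right mechanism, or is there a dedicated option for canned ACLs on EMR?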