Setting S3 output file grantees for spark output files

2015-06-04 Thread Justin Steigel
Hi all, I'm running Spark on AWS EMR and I'm having some issues getting the correct permissions on the output files written with rdd.saveAsTextFile(''). In Hive, I would add a line at the beginning of the script, set fs.s3.canned.acl=BucketOwnerFullControl, and that would set the correct grantees for the output files. Is there a way to set this property in Spark?
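For illustration, one possible way to do this (a sketch, not confirmed by the thread) is to set the property on the SparkContext's underlying Hadoop configuration before writing. The bucket and path are placeholders, and sc._jsc is PySpark's internal handle to the JavaSparkContext:

    # Sketch only: set the canned ACL on the Hadoop configuration that
    # saveAsTextFile uses. Bucket/path are placeholders; sc._jsc is
    # PySpark's internal bridge to the underlying JavaSparkContext.
    from pyspark import SparkContext

    sc = SparkContext(appName="s3-acl-example")
    sc._jsc.hadoopConfiguration().set("fs.s3.canned.acl",
                                      "BucketOwnerFullControl")

    rdd = sc.parallelize(["record one", "record two"])
    rdd.saveAsTextFile("s3://example-bucket/output/")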

Re: Setting S3 output file grantees for spark output files

2015-06-05 Thread Justin Steigel
... the spark-defaults.conf file.
> And once you run the application you can actually check on the driver UI
> (runs on port 4040), Environment tab, to see if the configuration is set
> properly.
>
> Thanks
> Best Regards
>
> On Thu, Jun 4, 2015 at 8:40 PM, Justin Steigel wrote:
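For reference, the spark-defaults.conf approach mentioned in the quoted reply would look something like the line below: Spark copies any property prefixed with spark.hadoop. into the Hadoop Configuration it uses for I/O, which is how a Hadoop setting like fs.s3.canned.acl can be supplied at submit time.

    # In conf/spark-defaults.conf: properties prefixed with "spark.hadoop."
    # are copied into the Hadoop Configuration used when writing to S3.
    spark.hadoop.fs.s3.canned.acl  BucketOwnerFullControl

Once the application is running, the property should appear on the Environment tab of the driver UI, as the reply suggests.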

Spark Python process

2015-06-24 Thread Justin Steigel
I have a Spark job running on a 10-node cluster, and the Python process on every node is pegged at 100% CPU. I was wondering which parts of a Spark script run in the Python process and which get passed to the Java processes. Is there any documentation on this? Thanks, Justin
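For what it's worth, a rough sketch of the split in the plain RDD API (paths are placeholders): the driver pickles any user-supplied Python function and ships it to Python worker processes on each executor, while reading, shuffle transfer, and writing happen in the JVM.

    # Rough sketch of which side executes what in PySpark. Paths are
    # placeholders. JVM: I/O and shuffle machinery. Python workers: any
    # user function passed to an RDD transformation.
    from pyspark import SparkContext

    sc = SparkContext(appName="python-vs-jvm")

    lines = sc.textFile("s3://example-bucket/input/")      # read: JVM

    # These lambdas are pickled, shipped to executors, and run inside
    # separate Python worker processes -- the ones pegged at 100%.
    pairs = lines.map(lambda l: (l.split(",")[0], 1))

    # Shuffle transfer happens in the JVM, but the add function below
    # still executes in the Python workers.
    counts = pairs.reduceByKey(lambda a, b: a + b)

    counts.saveAsTextFile("s3://example-bucket/output/")   # write: JVM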