Hi all,
I'm running Spark on AWS EMR and I'm having some issues getting the correct
permissions on the output files using
rdd.saveAsTextFile(''). In Hive, I would add a line at the beginning of the
script with
set fs.s3.canned.acl=BucketOwnerFullControl
and that would set the correct grantees on the output files. Is there an
equivalent way to do this in Spark?
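Here is roughly what I assume the Spark equivalent would look like; the
spark.hadoop. prefix (Spark copying such entries into the Hadoop
Configuration), the app name, and the output path are my own placeholders,
and I haven't confirmed that EMR's S3 filesystem honors the property when
set this way:

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("s3-acl-test")
        # spark.hadoop.* entries should end up in the Hadoop Configuration
        .set("spark.hadoop.fs.s3.canned.acl", "BucketOwnerFullControl"))
sc = SparkContext(conf=conf)

# Or set it on the Hadoop configuration directly (this goes through the
# internal _jsc handle to the underlying JavaSparkContext):
sc._jsc.hadoopConfiguration().set("fs.s3.canned.acl", "BucketOwnerFullControl")

# "s3://my-bucket/output" is just a placeholder path for this example.
sc.parallelize(["a", "b", "c"]).saveAsTextFile("s3://my-bucket/output")

Presumably the same key could also go into spark-defaults.conf as a
spark.hadoop.fs.s3.canned.acl entry, but I haven't verified that either.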
> You can try setting it in the spark-defaults.conf file.
> And once you run the application, you can check the Environment tab on the
> driver UI (it runs on port 4040) to see if the configuration is set properly.
>
> Thanks
> Best Regards
>
> On Thu, Jun 4, 2015 at 8:40 PM, Justin Steigel wrote:
>
I have a Spark job running on a 10-node cluster, and the Python process on
all the nodes is pegged at 100%.
I was wondering: which parts of a Spark script run in the Python process, and
which get handed off to the Java processes? Is there any documentation on
this?
Thanks,
Justin
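
As far as I understand it (this is my reading of how PySpark splits the work,
not official documentation): the driver script and any functions you pass to
RDD operations run in Python processes that talk to the JVM over Py4J, while
the JVM side handles scheduling, shuffles, and the actual reading and writing
of data. So if the transformations are Python lambdas, the per-record work
lands on the Python workers, which would explain them sitting at 100%. A toy
sketch with made-up paths, annotated with where I believe each step runs:

from pyspark import SparkContext

sc = SparkContext(appName="where-does-it-run")  # Python driver plus a JVM driver, linked via Py4J

lines = sc.textFile("s3://my-bucket/input")     # the JVM executors read the files

# These lambdas are pickled and shipped to Python worker processes on each
# node; that is where the per-record CPU time goes.
words = lines.flatMap(lambda line: line.split())
pairs = words.map(lambda w: (w, 1))
counts = pairs.reduceByKey(lambda a, b: a + b)  # the shuffle itself happens in the JVM,
                                                # but the add function runs in Python

counts.saveAsTextFile("s3://my-bucket/output")  # the JVM writes the files; the Python
                                                # workers feed it the computed records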