You'll have a lot less hassle using the AWS EMR instances with Spark 1.4.1 for
now, until the spark_ec2.py scripts move to Hadoop 2.7.1. At the moment I'm
pretty sure they're only using Hadoop 2.4.
The EMR setup with Spark lets you use s3:// URIs with IAM roles.
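As a rough sketch of what that looks like from PySpark: the bucket and key names below are made up, and I'm assuming the cluster's instance role already grants read access to the bucket, so no access keys are set anywhere in the job.

```python
def s3_uri(bucket, key, scheme="s3"):
    """Build an S3 URI string.

    "s3" is the scheme used on EMR; "s3a" and "s3n" are the Apache Hadoop
    clients. The bucket and key passed in are hypothetical examples.
    """
    if scheme not in ("s3", "s3a", "s3n"):
        raise ValueError("unsupported scheme: %s" % scheme)
    return "%s://%s/%s" % (scheme, bucket, key.lstrip("/"))


def count_lines(sc, uri):
    """Count the lines of a text file using an existing SparkContext.

    No credentials are passed here: they are expected to come from the
    IAM role attached to the instances (an assumption of this sketch).
    """
    return sc.textFile(uri).count()


def run_on_cluster():
    """Entry point intended for spark-submit on a cluster; not called here."""
    from pyspark import SparkContext  # imported lazily, only needed on the cluster

    sc = SparkContext(appName="s3-iam-sketch")
    try:
        # Hypothetical bucket and key, for illustration only.
        print(count_lines(sc, s3_uri("my-example-bucket", "logs/part-00000")))
    finally:
        sc.stop()
```

Calling run_on_cluster() from a script passed to spark-submit should be enough to try it. If the read fails with an authentication error, the role's permissions are the first thing I'd check.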
Ewan
-Original Message-
From
There's no support for IAM roles in the s3n:// client code in Apache Hadoop
(HADOOP-9384); Amazon's modified EMR distro may have it.
The s3a filesystem adds it; this is ready for production use in Hadoop 2.7.1+
(implicitly HDP 2.3; CDH 5.4 has cherry-picked the relevant patches). I don't
k