I am using a Spark cluster on Amazon (launched using the
spark-1.6-prebuilt-with-hadoop-2.6 spark-ec2 script)
to run a Scala driver application that reads S3 object content in parallel.
I have tried “s3n://bucket” with sc.textFile, and I have also set up an RDD of
the S3 keys and then used the
Java AWS SDK to

I am using spark-1.6.1-prebuilt-with-hadoop-2.6 on a Mac. I am using the
spark-ec2 script to launch a cluster in an
Amazon VPC. The setup.sh script [run first thing on the master after launch]
uses pssh and tries to install it
via 'yum install -y pssh'. This step always fails on the master AMI that the
script us