Hi,
I agree with Steve; just start with vanilla Spark on EMR.
See point #4 here for dynamic allocation of executors:
https://blogs.aws.amazon.com/bigdata/post/Tx6J5RM20WPG5V/Building-a-Recommendation-Engine-with-Spark-ML-on-Amazon-EMR-using-Zeppelin
Note that dynamic allocation of …
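(For context: on EMR release 4.4.0 and later, dynamic allocation is enabled by
default. On a stock Spark 1.6 cluster you can turn it on by hand; a minimal
sketch, with placeholder min/max bounds to tune for your cluster:)

    import org.apache.spark.{SparkConf, SparkContext}

    // Minimal sketch: enable dynamic executor allocation by hand.
    // Dynamic allocation needs the external shuffle service running
    // on each node (on YARN: spark.shuffle.service.enabled).
    val conf = new SparkConf()
      .setAppName("dynamic-allocation-example")
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")
      // Placeholder bounds -- tune for your cluster size.
      .set("spark.dynamicAllocation.minExecutors", "2")
      .set("spark.dynamicAllocation.maxExecutors", "20")
    val sc = new SparkContext(conf)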
Hi, here are several optimizations we made for accessing S3 from Spark:
https://github.com/apache/spark/compare/branch-1.6...zalando:branch-1.6-zalando
such as:
https://github.com/apache/spark/compare/branch-1.6...zalando:branch-1.6-zalando#diff-d579db9a8f27e0bbef37720ab14ec3f6R133
you can deploy our …
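(Independent of the fork: stock Spark can already read and write S3 through the
Hadoop s3a connector, as long as hadoop-aws is on the classpath. A minimal
sketch, with the bucket name and credentials as placeholders:)

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: read a folder of text files from S3 with stock Spark via s3a.
    // The bucket name and credentials below are placeholders.
    val sc = new SparkContext(new SparkConf().setAppName("s3-read-example"))
    sc.hadoopConfiguration.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")
    sc.hadoopConfiguration.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")
    val lines = sc.textFile("s3a://your-bucket/path/to/folder/")
    println(lines.count())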
On 28 Apr 2016, at 22:59, Alexander Pivovarov <apivova...@gmail.com> wrote:
> Spark works well with S3 (read and write). However, it's recommended to set
> spark.speculation to true (it's expected that some tasks will fail when reading a
> large S3 folder, so speculation should help).
I must disagree.
Thanks for the responses.
Fatma
On Apr 28, 2016 3:00 PM, "Renato Perini" wrote:
> I have set up a small development cluster using t2.micro machines and an
> Amazon Linux AMI (CentOS 6.x).
> The whole setup has been done manually, without using the provided
> scripts. The whole setup is composed of a total of 5 instances: the
> first machine has an elastic IP and is used as a bridge …
Fatma, the easiest way to create a Spark cluster on AWS is to create an EMR
cluster and select the Spark application (the latest EMR release includes Spark 1.6.1).
Spark works well with S3 (read and write). However, it's recommended to set
spark.speculation to true (it's expected that some tasks will fail when reading a
large S3 folder, so speculation should help).
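A minimal sketch of that speculation setup (the interval/quantile/multiplier
values shown are Spark's defaults, listed only to make the knobs visible):

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: enable speculative execution for slow or failing S3 read tasks.
    val conf = new SparkConf()
      .setAppName("s3-speculation-example")
      .set("spark.speculation", "true")
      // How often to check for tasks to speculate (default).
      .set("spark.speculation.interval", "100ms")
      // Fraction of tasks that must finish before speculation starts (default).
      .set("spark.speculation.quantile", "0.75")
      // A task is speculatable if it runs this many times slower than the median (default).
      .set("spark.speculation.multiplier", "1.5")
    val sc = new SparkContext(conf)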