Re: Spark on AWS

2016-05-02 Thread Gourav Sengupta
Hi, I agree with Steve: just start with vanilla Spark on EMR. See point #4 here for dynamic allocation of executors: https://blogs.aws.amazon.com/bigdata/post/Tx6J5RM20WPG5V/Building-a-Recommendation-Engine-with-Spark-ML-on-Amazon-EMR-using-Zeppelin . Note that dynamic allocation of
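For readers who want to try dynamic allocation, here is a minimal sketch that builds an EMR configuration payload. It assumes the `spark-defaults` classification accepted by the AWS CLI's `aws emr create-cluster --configurations` option; the function name `emr_spark_configurations` and the executor bounds are hypothetical illustrations, not values taken from the linked post.

```python
import json


def emr_spark_configurations(min_executors=1, max_executors=20):
    """Build an EMR --configurations payload that enables Spark dynamic allocation.

    Hypothetical helper: the default executor bounds are illustrative only.
    """
    return [
        {
            "Classification": "spark-defaults",
            "Properties": {
                # Let Spark add and remove executors as the task backlog changes.
                "spark.dynamicAllocation.enabled": "true",
                # On YARN, dynamic allocation needs the external shuffle service so
                # shuffle files remain available after an executor is removed.
                "spark.shuffle.service.enabled": "true",
                # EMR expects property values as strings.
                "spark.dynamicAllocation.minExecutors": str(min_executors),
                "spark.dynamicAllocation.maxExecutors": str(max_executors),
            },
        }
    ]


if __name__ == "__main__":
    # Write this output to config.json and pass it as
    # --configurations file://config.json when creating the cluster.
    print(json.dumps(emr_spark_configurations(), indent=2))
```

The same properties can be set per job instead, for example with `--conf spark.dynamicAllocation.enabled=true` on `spark-submit`.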

2016-05-01 Thread Teng Qiu
Hi, we made several optimizations for accessing S3 from Spark here: https://github.com/apache/spark/compare/branch-1.6...zalando:branch-1.6-zalando such as: https://github.com/apache/spark/compare/branch-1.6...zalando:branch-1.6-zalando#diff-d579db9a8f27e0bbef37720ab14ec3f6R133 you can deploy our

2016-04-29 Thread Steve Loughran
On 28 Apr 2016, at 22:59, Alexander Pivovarov <apivova...@gmail.com> wrote:
> Spark works well with S3 (read and write). However, it's recommended to set spark.speculation to true (some tasks are expected to fail when you read a large S3 folder, so speculation should help).

I must disagree.

2016-04-28 Thread Fatma Ozcan
Thanks for the responses.
Fatma

On Apr 28, 2016 3:00 PM, "Renato Perini" wrote:
> I have set up a small development cluster using t2.micro machines and an
> Amazon Linux AMI (CentOS 6.x).
> The whole setup has been done manually, without using the provided
> scripts. The whole setup is composed of

2016-04-28 Thread Renato Perini
I have set up a small development cluster using t2.micro machines and an Amazon Linux AMI (CentOS 6.x). The cluster was set up manually, without the provided scripts, and consists of 5 instances in total: the first machine has an elastic IP and is used as a bridge

2016-04-28 Thread Alexander Pivovarov
Fatma, the easiest way to create a Spark cluster on AWS is to create an EMR cluster and select the Spark application (the latest EMR release includes Spark 1.6.1). Spark works well with S3 (read and write). However, it's recommended to set spark.speculation to true (it's expected that some tasks fail if you read larg
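To make the speculation advice concrete (and note that Steve Loughran disagrees with it elsewhere in this thread), here is a simplified model of how a speculation rule decides which tasks to re-launch. It assumes Spark's `spark.speculation.quantile` (default 0.75) and `spark.speculation.multiplier` (default 1.5) semantics: speculation begins once that fraction of a stage's tasks has finished, and a running task becomes a candidate when it exceeds the multiplier times the median finished-task duration. The function `speculatable_tasks` is hypothetical and omits details of Spark's real scheduler.

```python
import math
import statistics


def speculatable_tasks(finished_durations, running_elapsed, total_tasks,
                       quantile=0.75, multiplier=1.5):
    """Return indices of running tasks that this simplified rule would speculate.

    Hypothetical model, not Spark's exact implementation:
    - wait until floor(quantile * total_tasks) tasks have finished;
    - then flag running tasks slower than multiplier * median finished duration.
    """
    min_finished = math.floor(quantile * total_tasks)
    # Too few finished tasks to trust the median yet, so speculate nothing.
    if not finished_durations or len(finished_durations) < min_finished:
        return []
    threshold = multiplier * statistics.median(finished_durations)
    return [i for i, elapsed in enumerate(running_elapsed) if elapsed > threshold]


if __name__ == "__main__":
    # 8 tasks in the stage; 6 finished (median 12 s, so the threshold is 18 s).
    print(speculatable_tasks([10, 11, 12, 12, 13, 14], [15, 40], total_tasks=8))
```

Here the task that has run for 40 s is flagged while the 15 s one is not. In a real job the feature is switched on with `--conf spark.speculation=true` on `spark-submit`. Whether that is safe when writing output to S3 is exactly what the thread disputes.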