Re: Changing number of workers for benchmarking purposes

2016-03-14 Thread Kalpit Shah
I think "SPARK_WORKER_INSTANCES" is deprecated. This should work: "export SPARK_EXECUTOR_INSTANCES=2"
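For YARN deployments, the executor count can also be set per job rather than in the environment. A minimal sketch (the `--num-executors` flag and the env var below are the standard YARN-mode knobs; the job script name is a placeholder):

```shell
# Option 1: environment variable, picked up by spark-submit in YARN mode
export SPARK_EXECUTOR_INSTANCES=2

# Option 2: equivalent command-line flag, convenient for benchmarking sweeps
spark-submit --master yarn --num-executors 2 my_job.py
```

Setting it per invocation is handier for benchmarks, since no environment changes or cluster restarts are needed between runs.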

Re: Changing number of workers for benchmarking purposes

2016-03-14 Thread lisak
Hey, I'm using this setup on a single m4.4xlarge node in order to utilize it: https://github.com/gettyimages/docker-spark/blob/master/docker-compose.yml but setting SPARK_WORKER_INSTANCES: 2 and SPARK_WORKER_CORES: 2 still creates only one worker. One JVM process that utilizes up to
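In a docker-compose setup like the one linked above, each worker typically runs in its own container, so SPARK_WORKER_INSTANCES inside a single container is not what multiplies workers. A hedged sketch of the usual workaround, assuming the worker service is named `worker` in that compose file:

```shell
# Scale the worker service to two containers instead of relying on
# SPARK_WORKER_INSTANCES inside one container (service name is an assumption)
docker-compose up -d
docker-compose scale worker=2   # newer Compose: docker-compose up -d --scale worker=2
```

Each scaled container then registers with the master as a separate worker, which matches the multi-worker behavior being asked about.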

Re: Changing number of workers for benchmarking purposes

2014-04-12 Thread Kalpit Shah
In Spark release 0.7.1, I added support for running multiple worker processes on a single slave machine. I built it for performance-testing multiple workers on a single machine in standalone mode. Set the following in conf/spark-env.sh and bounce your cluster: export SPARK_WORKER_INSTANCES=3 Th
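A minimal spark-env.sh sketch for this. SPARK_WORKER_CORES and SPARK_WORKER_MEMORY are the standard companion settings; the values below are illustrative assumptions for a 16-core box, chosen so three workers do not oversubscribe it:

```shell
# conf/spark-env.sh -- standalone mode, multiple workers per machine
export SPARK_WORKER_INSTANCES=3   # three worker JVMs on this node
export SPARK_WORKER_CORES=4       # cores *per worker*, not total
export SPARK_WORKER_MEMORY=8g     # memory *per worker*, not total
```

Note that cores and memory are per worker instance, so total usage is the instance count times these values; then restart the cluster (stop-all.sh / start-all.sh) to pick the settings up.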

Re: Changing number of workers for benchmarking purposes

2014-03-13 Thread Mayur Rustagi
How about hacking your way around it? Start with max workers & keep killing them off after each run. Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi On Thu, Mar 13, 2014 at 2:00 AM, Pierre Borckmans < pierre.borckm...@realim
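The kill-after-each-run idea could be sketched roughly as below. This is a hedged, single-machine sketch: `run_benchmark.sh` is a hypothetical driver script, the sbin paths assume a recent standalone layout, and `jps` (from the JDK) is used to find local Worker JVMs:

```shell
#!/usr/bin/env sh
# Start with the max number of workers, benchmark, then kill one worker
# JVM before the next run -- no full cluster restart between runs.
"$SPARK_HOME"/sbin/start-all.sh

for n in 4 3 2 1; do
  ./run_benchmark.sh "$n"                       # hypothetical benchmark driver
  # Kill one local standalone Worker JVM (jps lists JVMs with their main class)
  pid=$(jps | awk '/Worker/ {print $1; exit}')
  [ -n "$pid" ] && kill "$pid"
done

"$SPARK_HOME"/sbin/stop-all.sh
```

One caveat with this hack: killing a worker mid-sequence can skew caches and scheduling, so results between runs are not perfectly comparable to a clean start at each worker count.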

Re: Changing number of workers for benchmarking purposes

2014-03-13 Thread Pierre Borckmans
Thanks Patrick. I could try that. But the idea was to be able to write a fully automated benchmark, varying the dataset size, the number of workers, the memory, … without having to stop/start the cluster each time. I was thinking something like SparkConf.set(“spark.max_number_workers”, n) wou

Re: Changing number of workers for benchmarking purposes

2014-03-12 Thread DB Tsai
One related question. Is there any way to automatically determine the optimal # of workers in YARN based on the data size and available resources, without explicitly specifying it when the job is launched? Thanks. Sincerely, DB Tsai Machine Learning Engineer Alpine Data Labs -
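For readers finding this thread later: Spark's dynamic allocation (added after this thread, in Spark 1.2+ on YARN) addresses exactly this by growing and shrinking the executor count based on pending work. A hedged configuration sketch; the conf keys are the documented ones, the bounds and job name are placeholders:

```shell
# Dynamic allocation needs the external shuffle service so executors can be
# released without losing shuffle data.
spark-submit --master yarn \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  my_job.py
```

Spark then requests executors when tasks queue up and releases idle ones, rather than requiring a fixed count at launch.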

Re: Changing number of workers for benchmarking purposes

2014-03-12 Thread Patrick Wendell
Hey Pierre, Currently modifying the "slaves" file is the best way to do this because in general we expect that users will want to launch workers on any slave. I think you could hack something together pretty easily to allow this. For instance if you modify the line in slaves.sh from this: for
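The hack Patrick describes could look roughly like this. Hedged sketch: the exact loop in slaves.sh differs by Spark version, so the line below is an approximation, and NUM_SLAVES is a hypothetical variable introduced here to cap how many hosts from the slaves file actually get a worker:

```shell
# In (s)bin/slaves.sh, the loop iterates over hosts from the slaves file,
# roughly:  for slave in `cat "$HOSTLIST" | sed "s/#.*$//;/^$/d"`; do
# Capping it with head lets a benchmark vary worker count via NUM_SLAVES:
for slave in $(sed 's/#.*$//;/^$/d' "$HOSTLIST" | head -n "${NUM_SLAVES:-9999}"); do
  : # original loop body unchanged -- launches a worker on "$slave"
done
```

With a change like this, a benchmark script could export NUM_SLAVES=1..N and restart workers between runs without editing the slaves file each time.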