Re: Optimal Server Design for Spark

2014-04-03 Thread Debasish Das
@Mayur...I am hitting ulimits on the cluster if I go beyond 4 cores per worker, and I don't think I can change the ulimit due to sudo issues etc... If I have more workers, in ALS, I can go for 20 blocks (right now I am running 10 blocks on 10 nodes with 4 cores each and now I can go up to 20 blocks o

Re: Optimal Server Design for Spark

2014-04-03 Thread Matei Zaharia
To run multiple workers with Spark’s standalone mode, set SPARK_WORKER_INSTANCES and SPARK_WORKER_CORES in conf/spark-env.sh. For example, if you have 16 cores and want 2 workers, you could add

    export SPARK_WORKER_INSTANCES=2
    export SPARK_WORKER_CORES=8

Matei

On Apr 3, 2014, at 12:38 PM, Mayur
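As a rough sketch of the setting described above, the per-worker core count can be derived from the node's total cores instead of being hard-coded. This assumes a 16-core node as in the example; TOTAL_CORES is a hypothetical helper variable introduced here for illustration and is not a Spark setting. Only SPARK_WORKER_INSTANCES and SPARK_WORKER_CORES come from the message.

```shell
#!/bin/sh
# Illustrative fragment for conf/spark-env.sh (Spark standalone mode).
# TOTAL_CORES is a hypothetical helper for this sketch, not read by Spark.
TOTAL_CORES=16

# Number of worker processes to start on this node.
export SPARK_WORKER_INSTANCES=2

# Cores each worker may use: split the node's cores evenly across workers.
export SPARK_WORKER_CORES=$(( TOTAL_CORES / SPARK_WORKER_INSTANCES ))

echo "workers per node: $SPARK_WORKER_INSTANCES, cores per worker: $SPARK_WORKER_CORES"
```

Setting SPARK_WORKER_CORES explicitly matters when running more than one worker per node, since otherwise each worker would try to claim all of the node's cores.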

Re: Optimal Server Design for Spark

2014-04-03 Thread Mayur Rustagi
Are your workers not utilizing all the cores? One worker will utilize multiple cores depending on resource allocation. Regards Mayur Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi On Wed, Apr 2, 2014 at 7:19 PM, Debasish Da

Re: Optimal Server Design for Spark

2014-04-02 Thread Debasish Das
Hi Matei, How can I run multiple Spark workers per node? I am running an 8-core, 10-node cluster, but I do have 8 more cores on each node. So having 2 workers per node will definitely help my use case. Thanks. Deb On Wed, Apr 2, 2014 at 3:58 PM, Matei Zaharia wrote: > Hey Steve, > > This config

Re: Optimal Server Design for Spark

2014-04-02 Thread Mayur Rustagi
I would suggest starting with cloud hosting if you can; depending on your use case, memory requirements may vary a lot. Regards Mayur On Apr 2, 2014 3:59 PM, "Matei Zaharia" wrote: > Hey Steve, > > This configuration sounds pretty good. The one thing I would consider is > having more disks, for tw

Re: Optimal Server Design for Spark

2014-04-02 Thread Matei Zaharia
Hey Steve, This configuration sounds pretty good. The one thing I would consider is having more disks, for two reasons: Spark uses the disks for large shuffles and out-of-core operations, and often it’s better to run HDFS or your storage system on the same nodes. But whether this is valuable w