@Mayur... I am hitting ulimits on the cluster if I go beyond 4 cores per worker, and I don't think I can change the ulimit due to sudo issues etc...
If I have more workers, in ALS, I can go for 20 blocks (right now I am running 10 blocks on 10 nodes with 4 cores each and now I can go up to 20 blocks on 10 nodes with 4 cores each) and per process I can still be within ulimit...

For the ALS stress case, right now with 10 blocks, it seems like I have to persist RDDs to HDFS each iteration, which I want to avoid if possible..

@Matei Thanks, trying those configs out...

On Thu, Apr 3, 2014 at 2:47 PM, Matei Zaharia <[email protected]> wrote:

> To run multiple workers with Spark's standalone mode, set
> SPARK_WORKER_INSTANCES and SPARK_WORKER_CORES in conf/spark-env.sh. For
> example, if you have 16 cores and want 2 workers, you could add
>
> export SPARK_WORKER_INSTANCES=2
> export SPARK_WORKER_CORES=8
>
> Matei
>
> On Apr 3, 2014, at 12:38 PM, Mayur Rustagi <[email protected]> wrote:
>
> > Are your workers not utilizing all the cores?
> > One worker will utilize multiple cores depending on resource allocation.
> > Regards
> > Mayur
> >
> > Mayur Rustagi
> > Ph: +1 (760) 203 3257
> > http://www.sigmoidanalytics.com
> > @mayur_rustagi
> >
> > On Wed, Apr 2, 2014 at 7:19 PM, Debasish Das <[email protected]> wrote:
> > Hi Matei,
> >
> > How can I run multiple Spark workers per node? I am running an 8 core 10
> > node cluster but I do have 8 more cores on each node.... So having 2 workers
> > per node will definitely help my use case.
> >
> > Thanks.
> > Deb
> >
> > On Wed, Apr 2, 2014 at 3:58 PM, Matei Zaharia <[email protected]> wrote:
> > Hey Steve,
> >
> > This configuration sounds pretty good. The one thing I would consider is
> > having more disks, for two reasons -- Spark uses the disks for large
> > shuffles and out-of-core operations, and often it's better to run HDFS or
> > your storage system on the same nodes. But whether this is valuable will
> > depend on whether you plan to do that in your deployment. You should
> > determine that and go from there.
> >
> > The amount of cores and RAM are both good -- actually with a lot more of
> > these you would probably want to run multiple Spark workers per node, which
> > is more work to configure. Your numbers are in line with other deployments.
> >
> > There's a provisioning overview with more details at
> > https://spark.apache.org/docs/latest/hardware-provisioning.html but what
> > you have sounds fine.
> >
> > Matei
> >
> > On Apr 2, 2014, at 2:58 PM, Stephen Watt <[email protected]> wrote:
> >
> > > Hi Folks
> > >
> > > I'm looking to buy some gear to run Spark. I'm quite well versed in
> > > Hadoop Server design but there does not seem to be much Spark related
> > > collateral around infrastructure guidelines (or at least I haven't been
> > > able to find them). My current thinking for server design is something
> > > along these lines.
> > >
> > > - 2 x 10Gbe NICs
> > > - 128 GB RAM
> > > - 6 x 1 TB Small Form Factor Disks (2 x RAID 1 Mirror for O/S and
> > >   Runtimes, 4 x 1TB for Data Drives)
> > > - 1 Disk Controller
> > > - 2 x 2.6 GHz 6 core processors
> > >
> > > If I stick with 1u servers then I lose disk capacity per rack but I
> > > get a lot more memory and CPU capacity per rack. This increases my total
> > > cluster memory footprint and it doesn't seem to make sense to have super
> > > dense storage servers because I can't fit all that data on disk in memory
> > > anyways. So at present, my thinking is to go with 1u servers instead of 2u
> > > Servers. Is 128GB RAM per server normal? Do you guys use more or less than
> > > that?
> > >
> > > Any feedback would be appreciated
> > >
> > > Regards
> > > Steve Watt
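
For the layout described at the top of this thread (two workers per node, each kept at 4 cores so that every worker process stays under the ulimit), the conf/spark-env.sh entries would look roughly like the sketch below. The values 2 and 4 are assumptions read off the numbers in the thread, not a tested configuration, and only the two variables Matei named are used.

```shell
# conf/spark-env.sh (sketch; values are assumptions taken from the thread)
# Two standalone worker processes per node ...
export SPARK_WORKER_INSTANCES=2
# ... each limited to 4 cores, the point beyond which the ulimit was hit.
export SPARK_WORKER_CORES=4
```

With these values each node would offer 2 x 4 = 8 cores to Spark, and the 10 nodes would show up as 20 workers, which matches the 20 ALS blocks mentioned above. The workers most likely need a restart before the change takes effect.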
