In Standalone mode, a Worker JVM starts an Executor. Inside the Exec there are slots for task threads. The slot count is configured by the num_cores setting. Generally over subscribe this. So if you have 10 free CPU cores, set num_cores to 20.
On Monday, February 23, 2015, Deep Pradhan <[email protected]> wrote: > How is task slot different from # of Workers? > > > >> so don't read into any performance metrics you've collected to > extrapolate what may happen at scale. > I did not get you in this. > > Thank You > > On Mon, Feb 23, 2015 at 10:52 PM, Sameer Farooqui <[email protected] > <javascript:_e(%7B%7D,'cvml','[email protected]');>> wrote: > >> In general you should first figure out how many task slots are in the >> cluster and then repartition the RDD to maybe 2x that #. So if you have a >> 100 slots, then maybe RDDs with partition count of 100-300 would be normal. >> >> But also size of each partition can matter. You want a task to operate on >> a partition for at least 200ms, but no longer than around 20 seconds. >> >> Even if you have 100 slots, it could be okay to have a RDD with 10,000 >> partitions if you've read in a large file. >> >> So don't repartition your RDD to match the # of Worker JVMs, but rather >> align it to the total # of task slots in the Executors. >> >> If you're running on a single node, shuffle operations become almost free >> (because there's no network movement), so don't read into any >> performance metrics you've collected to extrapolate what may happen at >> scale. >> >> >> On Monday, February 23, 2015, Deep Pradhan <[email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>> wrote: >> >>> Hi, >>> If I repartition my data by a factor equal to the number of worker >>> instances, will the performance be better or worse? >>> As far as I understand, the performance should be better, but in my case >>> it is becoming worse. >>> I have a single node standalone cluster, is it because of this? >>> Am I guaranteed to have a better performance if I do the same thing in a >>> multi-node cluster? >>> >>> Thank You >>> >> >
