One related question. Is there any way to automatically determine the
optimal number of workers in YARN based on the data size and available
resources, without explicitly specifying it when the job is launched?
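As a side note, the "head -n" trick quoted below can be sanity-checked in
isolation before touching slaves.sh. This is just a sketch with hypothetical
hostnames and a throwaway slaves file:

```shell
# Sanity check of limiting slaves via "head -n" (hypothetical hostnames).
# Build a throwaway slaves file with a comment and a blank line, like the real one.
HOSTLIST=$(mktemp)
printf '%s\n' 'node1' 'node2' '# comment' '' 'node3' > "$HOSTLIST"

NUM_SLAVES=2
selected=""
# Same pipeline as the modified slaves.sh line: take the first N lines,
# then strip comments and blank lines.
for slave in $(cat "$HOSTLIST" | head -n $NUM_SLAVES | sed "s/#.*$//;/^$/d"); do
  selected="$selected $slave"
done
echo "would start workers on:$selected"
rm -f "$HOSTLIST"
```

One caveat: since head runs before sed, any comment or blank lines among the
first NUM_SLAVES lines of the slaves file count against the limit, so you may
end up with fewer workers than requested.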

Thanks.

Sincerely,

DB Tsai
Machine Learning Engineer
Alpine Data Labs
--------------------------------------
Web: http://alpinenow.com/


On Wed, Mar 12, 2014 at 2:50 PM, Patrick Wendell <pwend...@gmail.com> wrote:
> Hey Pierre,
>
> Currently modifying the "slaves" file is the best way to do this
> because in general we expect that users will want to launch workers on
> any slave.
>
> I think you could hack something together pretty easily to allow this.
> For instance if you modify the line in slaves.sh from this:
>
>   for slave in `cat "$HOSTLIST"|sed  "s/#.*$//;/^$/d"`; do
>
> to this
>
>   for slave in `cat "$HOSTLIST"| head -n $NUM_SLAVES | sed
> "s/#.*$//;/^$/d"`; do
>
> Then you could just set NUM_SLAVES before you stop/start. Not sure if
> this helps much but maybe it's a bit faster.
>
> - Patrick
>
> On Wed, Mar 12, 2014 at 10:18 AM, Pierre Borckmans
> <pierre.borckm...@realimpactanalytics.com> wrote:
>> Hi there!
>>
>> I was performing some tests for benchmarking purposes, among other things to
>> observe how performance evolves with the number of workers.
>>
>> In that context, I was wondering: is there an easy way to choose the
>> number of workers used in standalone mode, without having to change
>> the "slaves" file, dispatch it, and restart the cluster?
>>
>>
>> Cheers,
>>
>> Pierre
