Re: Modifying start-cluster scripts to efficiently spawn multiple TMs

Greg Hogan Sun, 10 Jul 2016 17:33:54 -0700

Hi Saliya,

Would you happen to have pdsh (parallel distributed shell) installed? If so,
the TaskManager startup in start-cluster.sh will run in parallel.
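
For illustration only, here is a minimal sketch of how a slaves file with one
host per line could be turned into the comma-separated target list that
`pdsh -w` accepts. It is not taken from the Flink scripts: the function name
`pdsh_targets`, the `TM_CMD` placeholder, and the `conf/slaves` path in the
comment are assumptions, and the final command is only shown in a comment
rather than run.

```shell
# Join the hosts in a slaves file (one host per line) into the
# comma-separated list that `pdsh -w` expects.
# Blank lines and lines starting with '#' are skipped.
# The function body runs in a subshell so its variables stay local.
pdsh_targets() (
    slaves_file="$1"
    list=""
    while IFS= read -r host || [ -n "$host" ]; do
        case "$host" in
            ''|'#'*) continue ;;
        esac
        # Append with a comma separator unless the list is still empty.
        list="${list:+$list,}$host"
    done < "$slaves_file"
    printf '%s\n' "$list"
)

# Placeholder for the remote command (an assumption, not the real invocation).
TM_CMD='bin/taskmanager.sh start'

# Dry-run usage, printing the command instead of executing it:
#   echo "pdsh -w $(pdsh_targets conf/slaves) $TM_CMD"
```

With pdsh the remote command is dispatched to all listed hosts concurrently,
instead of one SSH session after another.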

As for running 24 TaskManagers together, are they spread across multiple
NUMA nodes? I filed FLINK-3163 (
https://issues.apache.org/jira/browse/FLINK-3163) last year because I have
seen that, even with only two NUMA nodes, performance improves when each
TaskManager is bound to a node for both memory and CPU. I think we can also
improve the configuration of task slots to match what we do for memory,
which can be given either as a fixed amount or as a fraction of total memory.
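
As a rough sketch of what such binding could look like (this is not what
FLINK-3163 specifies), the helper below wraps a command in `numactl` so that
TaskManager number i is pinned, CPU and memory, to NUMA node i modulo the
node count. The function name `numa_bound_cmd` and the round-robin
assignment are assumptions made for the example; it only prints the command.

```shell
# Print a numactl-wrapped command that binds TaskManager number $1
# (0-based) to one of $2 NUMA nodes, assigned round-robin.
# Both CPU (--cpunodebind) and memory (--membind) are bound to that node.
# Remaining arguments are the command to wrap.
numa_bound_cmd() (
    tm_index="$1"
    numa_nodes="$2"
    shift 2
    node=$(( tm_index % numa_nodes ))
    printf 'numactl --cpunodebind=%s --membind=%s %s\n' "$node" "$node" "$*"
)

# Example: show the commands for four TaskManagers on a two-node machine.
# The wrapped command is a placeholder, not the real Flink invocation.
i=0
while [ "$i" -lt 4 ]; do
    numa_bound_cmd "$i" 2 bin/taskmanager.sh start
    i=$((i + 1))
done
```

The number of NUMA nodes on a machine can be checked with
`numactl --hardware`.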

Greg

On Sat, Jul 9, 2016 at 3:44 AM, Saliya Ekanayake <esal...@gmail.com> wrote:

> Hi,
>
> The current start/stop scripts SSH into a worker node once for every time
> it appears in the slaves file. When spawning multiple TMs (like 24 per
> node), this is very inefficient.
>
> I've changed the scripts to do one SSH per node and then spawn a given
> number N of TMs. I can make a pull request if this seems usable to
> others. For now, I assume the slaves file will indicate the number of TMs
> per slave in "IP N" format.
>
> Thank you,
> Saliya
>
> --
> Saliya Ekanayake
> Ph.D. Candidate | Research Assistant
> School of Informatics and Computing | Digital Science Center
> Indiana University, Bloomington
>
>
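
The "IP N" idea in the quoted message can be sketched as follows. This is a
dry run under stated assumptions and not the script from the proposed pull
request: the function name `plan_tm_launch` and the `TM_START` placeholder
are hypothetical, and a line without a count is assumed to mean one
TaskManager. It prints one ssh command per host, each of which would start
N TaskManagers in a loop on the remote side.

```shell
# Placeholder for the per-TaskManager start command (an assumption).
TM_START='bin/taskmanager.sh start'

# Read a slaves file in "IP N" format and print one ssh command per host.
# Each printed command loops N times remotely, so only one SSH session
# per node is needed. Nothing is executed here.
plan_tm_launch() (
    slaves_file="$1"
    while read -r host count || [ -n "$host" ]; do
        case "$host" in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        count="${count:-1}"        # assumption: bare host means one TM
        printf "ssh -n %s 'i=0; while [ \$i -lt %s ]; do %s; i=\$((i+1)); done'\n" \
            "$host" "$count" "$TM_START"
    done < "$slaves_file"
)

# Usage (dry run):
#   plan_tm_launch conf/slaves
```

Printing the plan first makes it easy to check the host list and counts
before anything is started on the workers.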
