pdsh is only used for starting taskmanagers, so how did you work around this? Are you able to SSH to the jobmanager without a password?
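For context, a start script that prefers pdsh but falls back to a per-host ssh loop can be sketched roughly as follows. This is an illustrative sketch, not Flink's actual start-cluster.sh; the hostnames and the "start-taskmanager" command are placeholders, and the commands are echoed rather than executed.

```shell
# Sketch: fan a start command out to all slaves with pdsh when it is
# available, otherwise fall back to one ssh connection per host.
launch_all() {
    local use_pdsh=$1 hosts=$2 cmd=$3
    if [ "$use_pdsh" = "yes" ]; then
        # pdsh takes a comma-separated host list and runs in parallel.
        echo "pdsh -w $(echo "$hosts" | tr ' ' ',') $cmd"
    else
        # Sequential fallback: one ssh per host.
        for h in $hosts; do
            echo "ssh $h $cmd"
        done
    fi
}

launch_all yes "j-020 j-021" start-taskmanager
# -> pdsh -w j-020,j-021 start-taskmanager
launch_all no  "j-020 j-021" start-taskmanager
# -> ssh j-020 start-taskmanager
# -> ssh j-021 start-taskmanager
```

The parallel path matters because the sequential ssh loop pays one connection handshake per slaves-file entry.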
The error looks to be from config.sh:318 in rotateLogFile. The way we generate the taskmanager index assumes that taskmanagers are started sequentially (flink-daemon.sh:108).

On Mon, Jul 11, 2016 at 2:59 PM, Saliya Ekanayake <esal...@gmail.com> wrote:

> Looking at what happens with pdsh, there are two things that go wrong.
>
> 1. pdsh is installed on a node other than where the jobmanager would run, so invoking *start-cluster* from there does not spawn a jobmanager. Only if I run start-cluster from the node I specify as the jobmanager's node will one be created.
>
> 2. If the slaves file has the same IP more than once, the following error occurs while moving log files. For example, I had node j-020 specified twice in my slaves file:
>
> j-020: mv: cannot move
> `/N/u/sekanaya/sali/software/flink-1.0.3/log/flink-sekanaya-taskmanager-26-j-020.log'
> to
> `/N/u/sekanaya/sali/software/flink-1.0.3/log/flink-sekanaya-taskmanager-26-j-020.log.1':
> No such file or directory
> j-020: mv: cannot move
> `/N/u/sekanaya/sali/software/flink-1.0.3/log/flink-sekanaya-taskmanager-26-j-020.out'
> to
> `/N/u/sekanaya/sali/software/flink-1.0.3/log/flink-sekanaya-taskmanager-26-j-020.out.1':
> No such file or directory
>
> On Mon, Jul 11, 2016 at 12:19 PM, Saliya Ekanayake <esal...@gmail.com> wrote:
>
>> I meant, I'll check when the current jobs are done and will let you know.
>>
>> On Mon, Jul 11, 2016 at 12:19 PM, Saliya Ekanayake <esal...@gmail.com> wrote:
>>
>>> I am running some jobs now. I'll stop and restart using pdsh to see what the issue was.
>>>
>>> On Mon, Jul 11, 2016 at 12:15 PM, Greg Hogan <c...@greghogan.com> wrote:
>>>
>>>> I'd definitely be interested to hear any insight into what failed when starting the taskmanagers with pdsh. Did the command fail, fall back to standard ssh, or hit a parse error on the slaves file?
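The sequential-startup assumption mentioned above can be sketched as follows: if each new taskmanager's index is derived from a count of already-started instances, two taskmanagers launched concurrently (via pdsh, or via a duplicated slaves entry) can compute the same index, and the second log rotation then fails because the file was already moved. This is an illustrative reconstruction, not Flink's actual config.sh or flink-daemon.sh code:

```shell
# Illustrative only: the next taskmanager index is a count of entries
# already recorded, which is unique only if starts are sequential.
PIDFILE=$(mktemp)

next_index() {
    wc -l < "$PIDFILE" | tr -d ' '
}

for _ in 1 2 3; do
    idx=$(next_index)        # concurrent starters could both read the
    echo pid >> "$PIDFILE"   # same count here and pick the same index
    echo "index=$idx"
done
rm -f "$PIDFILE"

# When two starters share an index they rotate the same log file; the
# loser sees "mv: cannot move ... No such file or directory".
touch tm-0.log
mv tm-0.log tm-0.log.1                      # first rotation succeeds
mv tm-0.log tm-0.log.1 2>/dev/null || echo "second rotation fails"
rm -f tm-0.log.1
```

Run sequentially, the loop prints index=0, index=1, index=2; the duplicated `mv` reproduces the shape of the error reported below for j-020.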
>>>> I'm wondering if we need to escape
>>>> PDSH_SSH_ARGS_APPEND=$FLINK_SSH_OPTS
>>>> as
>>>> PDSH_SSH_ARGS_APPEND="${FLINK_SSH_OPTS}"
>>>>
>>>> On Mon, Jul 11, 2016 at 12:02 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
>>>>
>>>>> pdsh is available on the head node only, but when I tried to run *start-cluster* from the head node (note the jobmanager node is not the head node) it didn't work, which is why I modified the scripts.
>>>>>
>>>>> Yes, exactly, this is what I was trying to do. My research area has been these NUMA-related issues, and binding a process to a socket (CPU) and its threads to individual cores has shown great advantage. I actually have Java code that automatically (and user-configurably) binds processes and threads. For Flink, I've done this manually with a shell script that scans the TMs on a node and pins them appropriately. This approach is OK, but it would be better if the support were integrated into Flink.
>>>>>
>>>>> On Sun, Jul 10, 2016 at 8:33 PM, Greg Hogan <c...@greghogan.com> wrote:
>>>>>
>>>>>> Hi Saliya,
>>>>>>
>>>>>> Would you happen to have pdsh (parallel distributed shell) installed? If so, the TaskManager startup in start-cluster.sh will run in parallel.
>>>>>>
>>>>>> As to running 24 TaskManagers together, are these running across multiple NUMA nodes? I filed FLINK-3163 (https://issues.apache.org/jira/browse/FLINK-3163) last year, as I have seen that even with only two NUMA nodes performance is improved by binding TaskManagers, both memory and CPU. I think we can improve configuration of task slots as we do with memory, where the latter can be a fixed measure or a fraction relative to total memory.
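On the escaping question quoted above: in POSIX shells, an expansion on the right-hand side of a variable assignment (including an environment assignment prefixed to a command) is not subject to word splitting, so the quotes are defensive style rather than strictly required; splitting happens only when the unquoted expansion appears as a command argument. A quick demonstration, with invented variable names:

```shell
# Invented names for demonstration; not Flink's scripts.
OPTS="-o StrictHostKeyChecking=no -p 2222"

# Assignment context: the unquoted expansion is NOT word-split.
COPY=$OPTS

count_args() { echo $#; }

split=$(count_args $OPTS)    # unquoted argument: splits into 4 words
whole=$(count_args "$OPTS")  # quoted argument: stays 1 word
echo "split=$split whole=$whole"
# -> split=4 whole=1
```

So `PDSH_SSH_ARGS_APPEND="${FLINK_SSH_OPTS}"` is harmless but, for the assignment itself, not necessary; how pdsh later splits the appended arguments when invoking ssh is a separate question.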
>>>>>> Greg
>>>>>>
>>>>>> On Sat, Jul 9, 2016 at 3:44 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> The current start/stop scripts SSH into each worker node once per appearance in the slaves file. When spawning multiple TMs (like 24 per node), this is very inefficient.
>>>>>>>
>>>>>>> I've changed the scripts to do one SSH per node and spawn a given number N of TMs afterwards. I can make a pull request if this seems usable to others. For now, I assume the slaves file indicates the number of TMs per slave in "IP N" format.
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Saliya
>>>>>>>
>>>>>>> --
>>>>>>> Saliya Ekanayake
>>>>>>> Ph.D. Candidate | Research Assistant
>>>>>>> School of Informatics and Computing | Digital Science Center
>>>>>>> Indiana University, Bloomington
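The "IP N" slaves format proposed in the original message could be parsed along these lines. This is a sketch under assumptions: the defaulting to one TM, the comment handling, and the "start-taskmanager" command are invented, and the ssh commands are echoed rather than executed.

```shell
# Sketch: read "host [N]" lines and emit one ssh per host that
# starts N taskmanagers in a single remote session.
parse_slaves() {
    while read -r host count _; do
        # Skip blank lines and comments; default the count to 1.
        case "$host" in ""|"#"*) continue ;; esac
        count=${count:-1}
        echo "ssh $host \"for i in \$(seq 1 $count); do start-taskmanager; done\""
    done
}

parse_slaves <<'EOF'
j-020 24
j-021
EOF
# -> ssh j-020 "for i in $(seq 1 24); do start-taskmanager; done"
# -> ssh j-021 "for i in $(seq 1 1); do start-taskmanager; done"
```

Keeping the plain one-host-per-line format valid (count defaulting to 1) would make the change backward compatible with existing slaves files.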