Yes, I have password-less SSH to the job manager node.

On Mon, Jul 11, 2016 at 4:53 PM, Greg Hogan <c...@greghogan.com> wrote:
> pdsh is only used for starting taskmanagers. How did you work around this?
> You are able to passwordless-ssh to the jobmanager?
>
> The error looks to be from config.sh:318 in rotateLogFile. The way we
> generate the taskmanager index assumes that taskmanagers are started
> sequentially (flink-daemon.sh:108).
>
> On Mon, Jul 11, 2016 at 2:59 PM, Saliya Ekanayake <esal...@gmail.com>
> wrote:
>
>> Looking at what happens with pdsh, there are two things that go wrong.
>>
>> 1. pdsh is installed on a node other than the one where the job manager
>> would run, so invoking start-cluster from there does not spawn a job
>> manager. The job manager is created only if I run start-cluster from the
>> node I specify as the job manager's node.
>>
>> 2. If the slaves file has the same IP more than once, the following
>> error occurs while trying to move log files. For example, I had node
>> j-020 specified twice in my slaves file.
>>
>> j-020: mv: cannot move
>> `/N/u/sekanaya/sali/software/flink-1.0.3/log/flink-sekanaya-taskmanager-26-j-020.log'
>> to
>> `/N/u/sekanaya/sali/software/flink-1.0.3/log/flink-sekanaya-taskmanager-26-j-020.log.1':
>> No such file or directory
>> j-020: mv: cannot move
>> `/N/u/sekanaya/sali/software/flink-1.0.3/log/flink-sekanaya-taskmanager-26-j-020.out'
>> to
>> `/N/u/sekanaya/sali/software/flink-1.0.3/log/flink-sekanaya-taskmanager-26-j-020.out.1':
>> No such file or directory
>>
>> On Mon, Jul 11, 2016 at 12:19 PM, Saliya Ekanayake <esal...@gmail.com>
>> wrote:
>>
>>> I meant, I'll check when the current jobs are done and will let you know.
>>>
>>> On Mon, Jul 11, 2016 at 12:19 PM, Saliya Ekanayake <esal...@gmail.com>
>>> wrote:
>>>
>>>> I am running some jobs now. I'll stop and restart using pdsh to see
>>>> what the issue was.
>>>>
>>>> On Mon, Jul 11, 2016 at 12:15 PM, Greg Hogan <c...@greghogan.com>
>>>> wrote:
>>>>
>>>>> I'd definitely be interested to hear any insight into what failed
>>>>> when starting the taskmanagers with pdsh.
>>>>> Did the command fail, fall back to standard ssh, or was there a
>>>>> parse error on the slaves file?
>>>>>
>>>>> I'm wondering if we need to escape
>>>>>   PDSH_SSH_ARGS_APPEND=$FLINK_SSH_OPTS
>>>>> as
>>>>>   PDSH_SSH_ARGS_APPEND="${FLINK_SSH_OPTS}"
>>>>>
>>>>> On Mon, Jul 11, 2016 at 12:02 AM, Saliya Ekanayake <esal...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> pdsh is available on the head node only, but when I tried to run
>>>>>> start-cluster from the head node (note the job manager node is not
>>>>>> the head node) it didn't work, which is why I modified the scripts.
>>>>>>
>>>>>> Yes, exactly, this is what I was trying to do. My research area has
>>>>>> been these NUMA-related issues, and binding a process to a socket
>>>>>> (CPU) and then its threads to individual cores has shown great
>>>>>> advantage. I actually have Java code that automatically (and
>>>>>> user-configurably) binds processes and threads. For Flink, I've done
>>>>>> this manually with a shell script that scans the TMs on a node and
>>>>>> pins them appropriately. This approach is OK, but it would be better
>>>>>> if the support were integrated into Flink.
>>>>>>
>>>>>> On Sun, Jul 10, 2016 at 8:33 PM, Greg Hogan <c...@greghogan.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Saliya,
>>>>>>>
>>>>>>> Would you happen to have pdsh (parallel distributed shell)
>>>>>>> installed? If so, the TaskManager startup in start-cluster.sh will
>>>>>>> run in parallel.
>>>>>>>
>>>>>>> As to running 24 TaskManagers together, are these running across
>>>>>>> multiple NUMA nodes? I had filed FLINK-3163
>>>>>>> (https://issues.apache.org/jira/browse/FLINK-3163) last year, as I
>>>>>>> have seen that even with only two NUMA nodes performance is
>>>>>>> improved by binding TaskManagers, both memory and CPU. I think we
>>>>>>> can improve configuration of task slots as we do with memory, where
>>>>>>> the latter can be a fixed measure or a fraction relative to total
>>>>>>> memory.
>>>>>>>
>>>>>>> Greg
>>>>>>>
>>>>>>> On Sat, Jul 9, 2016 at 3:44 AM, Saliya Ekanayake <esal...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> The current start/stop scripts SSH into a worker node once for
>>>>>>>> each time it appears in the slaves file. When spawning multiple
>>>>>>>> TMs (like 24 per node), this is very inefficient.
>>>>>>>>
>>>>>>>> I've changed the scripts to do one SSH per node and spawn a given
>>>>>>>> number N of TMs afterwards. I can make a pull request if this
>>>>>>>> seems usable to others. For now, I assume the slaves file
>>>>>>>> indicates the number of TMs per slave in "IP N" format.
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Saliya
>>>>>>>>
>>>>>>>> --
>>>>>>>> Saliya Ekanayake
>>>>>>>> Ph.D. Candidate | Research Assistant
>>>>>>>> School of Informatics and Computing | Digital Science Center
>>>>>>>> Indiana University, Bloomington

--
Saliya Ekanayake
Ph.D. Candidate | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington
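The rotateLogFile failure discussed above can be reproduced in isolation. The sketch below is not Flink's config.sh; it only mimics the count-based index idea (a new taskmanager takes index = number of existing logs for the host, which matches the "started sequentially" assumption Greg mentions). The script name and log naming are illustrative; with sequential starts the indices are distinct, but when pdsh starts two taskmanagers for the same hostname concurrently, both can compute the same index and then race on the `mv` in the rotate step.

```shell
#!/bin/sh
# Standalone sketch of count-based taskmanager index assignment (not Flink's
# actual config.sh/flink-daemon.sh code; names are illustrative).
LOG_DIR=$(mktemp -d)
host=j-020

start_tm() {
    # hypothetical index derivation: count of existing logs for this host
    index=$(ls "$LOG_DIR" | grep -c "taskmanager-.*-$host\.log$")
    log="$LOG_DIR/flink-demo-taskmanager-$index-$host.log"
    # rotate: shift any existing log aside before writing the new one;
    # two concurrent starts that pick the same index race right here
    [ -f "$log" ] && mv "$log" "$log.1"
    echo "taskmanager $index started" > "$log"
}

start_tm   # sequential start: index 0
start_tm   # sequential start: index 1
ls "$LOG_DIR"
```

Run sequentially, the two logs get distinct indices; the duplicate-IP failure in the thread corresponds to two of these invocations interleaving on the same host.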
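On the PDSH_SSH_ARGS_APPEND question: in POSIX shell, the right-hand side of a variable assignment is not subject to word splitting, so the unquoted and quoted spellings should assign the same value even when FLINK_SSH_OPTS holds multiple words; quoting remains the defensive choice for readability. A standalone check (the FLINK_SSH_OPTS value is just an example, not a Flink default):

```shell
#!/bin/sh
# Assignment context does not word-split: both forms keep the multi-word
# value intact. Splitting only bites when the variable is later expanded
# unquoted as a command argument.
FLINK_SSH_OPTS="-o StrictHostKeyChecking=no -p 2222"

PDSH_SSH_ARGS_APPEND=$FLINK_SSH_OPTS          # unquoted assignment
unquoted=$PDSH_SSH_ARGS_APPEND

PDSH_SSH_ARGS_APPEND="${FLINK_SSH_OPTS}"      # quoted assignment
quoted=$PDSH_SSH_ARGS_APPEND

[ "$unquoted" = "$quoted" ] && echo "identical"
```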
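The manual per-socket pinning described in the thread (scan the TMs on a node, bind each to a socket) can be sketched roughly as below. The socket count, cores per socket, and the pgrep pattern are assumptions about the machine, not Flink settings, and the actual `taskset` call is left commented out so the sketch is safe to run anywhere.

```shell
#!/bin/sh
# Sketch of round-robin socket pinning for taskmanager JVMs. Hardware layout
# values are assumptions; adjust to the output of lscpu/numactl --hardware.
CORES_PER_SOCKET=12
SOCKETS=2

core_range() {
    # core_range INDEX -> "first-last" core span for the socket this TM gets
    socket=$(( $1 % SOCKETS ))
    first=$(( socket * CORES_PER_SOCKET ))
    echo "$first-$(( first + CORES_PER_SOCKET - 1 ))"
}

i=0
for pid in $(pgrep -f 'taskmanager.TaskManager' 2>/dev/null); do
    echo "would pin pid $pid to cores $(core_range $i)"
    # taskset -cp "$(core_range $i)" "$pid"   # actual pinning (util-linux)
    i=$((i + 1))
done
```

Note that `taskset` only sets CPU affinity; binding memory as well (as FLINK-3163 suggests) would mean launching the JVM under `numactl --cpunodebind --membind` instead of pinning it after the fact.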
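The proposed "IP N" slaves format is straightforward to parse. This standalone sketch (a hypothetical helper, not the patched start-cluster.sh) reads one "host count" pair per line, defaults a missing count to 1, and shows where the single ssh per host would spawn N taskmanagers:

```shell
#!/bin/sh
# Sketch of parsing a slaves file in "IP N" format; host names below are
# examples, and the taskmanager.sh path in the comment is illustrative.
parse_slaves() {
    while read -r host count; do
        # skip blank lines and comments
        case "$host" in ""|\#*) continue ;; esac
        : "${count:=1}"   # default to one taskmanager if no count given
        echo "$host $count"
        # real version: one ssh per host, spawning $count taskmanagers:
        #   ssh "$host" "for i in \$(seq 1 $count); do bin/taskmanager.sh start; done"
    done
}

result=$(parse_slaves <<'EOF'
j-020 24
j-021 24
# comment lines are ignored
j-022
EOF
)
echo "$result"
```

Spawning all N taskmanagers inside one ssh session is what removes the per-TM connection overhead the first message complains about.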