Feel free to contribute documentation to Flink on how to run Flink on SLURM.
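In case it helps as a starting point for that, something along these lines might work for starting the standalone master and workers with srun from inside a SLURM allocation (an untested sketch; the exact jobmanager.sh/taskmanager.sh arguments depend on the Flink version):

#!/usr/bin/env bash
# Untested sketch: start a standalone Flink cluster from inside a SLURM
# allocation, replacing the ssh loop of start-cluster.sh with srun.
# Assumes FLINK_HOME points to the Flink installation and that every node
# of the allocation should run exactly one Flink process.

nodes=($(scontrol show hostnames "$SLURM_JOB_NODELIST"))

# JobManager on the first node of the allocation
srun --nodes=1 --ntasks=1 --nodelist="${nodes[0]}" \
  "$FLINK_HOME/bin/jobmanager.sh" start cluster &

# one TaskManager on each remaining node
for host in "${nodes[@]:1}"; do
  srun --nodes=1 --ntasks=1 --nodelist="$host" \
    "$FLINK_HOME/bin/taskmanager.sh" start &
done
wait
# Note: jobmanager.sh/taskmanager.sh daemonize by default; depending on the
# SLURM configuration it may be necessary to keep the processes in the
# foreground so the step is not cleaned up as soon as the scripts return.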
On Thu, Oct 1, 2015 at 11:45 AM, Robert Schmidtke <ro.schmid...@gmail.com> wrote:

> I see, thanks for the info. I only have access to my cluster via SLURM, and we don't have ssh between our nodes, which is why I haven't really considered the standalone mode. A colleague has set up YARN on SLURM and it was just the easiest to use. I briefly looked into the Flink standalone mode but dropped it because I thought YARN would be possible after all. It seems I'm going to have a deeper look into starting the master and slaves with SLURM's srun instead of ssh (I guess a slight modification of start-cluster.sh should do the job).
>
> On Thu, Oct 1, 2015 at 11:30 AM, Robert Metzger <rmetz...@apache.org> wrote:
>
>> Hi,
>> there is currently no option for forcing certain containers onto specific machines.
>> For running the JM (or any other YARN container) on the RM host, you first need to have a NodeManager running on the host with the RM. Maybe YARN is smart enough to schedule the small JM container onto that machine.
>>
>> I don't know your exact setup, but maybe it would make sense for you to run Flink in the standalone cluster mode instead of with YARN. It seems that you have a very good idea of how and where you want to run the Flink services in your cluster. YARN is designed to be an abstraction between the cluster and the application; that's why it's a bit difficult to schedule the containers to specific machines.
>>
>> Robert
>>
>> On Thu, Oct 1, 2015 at 11:24 AM, Robert Schmidtke <ro.schmid...@gmail.com> wrote:
>>
>>> Hi Robert,
>>>
>>> I had a job failure yesterday with what I believe is the setup I have described above. However, when trying to reproduce it now, the behavior is the same: Flink waiting for resources to become available. So no hard error.
>>>
>>> OK, the looping makes sense then. I hadn't thought about shared setups. I'm still figuring out how all the parameters play together, i.e. -yn, -yjm, -ytm and the memory limits in yarn-site.xml. This will need some testing and I'll come back with a proper description once I think I know what's going on.
>>>
>>> When running Flink on YARN, is it easily possible to place the Flink JM where the YARN Resource Manager sits, and all the TMs with the remaining Node Managers?
>>>
>>> Robert
>>>
>>> On Thu, Oct 1, 2015 at 10:53 AM, Robert Metzger <rmetz...@apache.org> wrote:
>>>
>>>> Hi,
>>>>
>>>>> It is interesting to note that when I set both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb to 56G I get a proper error when requesting 56G and 1M, but when setting yarn.nodemanager.resource.memory-mb to 56G and yarn.scheduler.maximum-allocation-mb to 54G I don't get an error but the aforementioned endless loop.
>>>>
>>>> Is it a "hard error" (failing) you're getting, or just "WARN" log messages? I'm asking because I added some code a while ago to do some checks before deploying Flink on YARN. These checks will print WARN log messages if the requested YARN session/job does not fit onto the cluster.
>>>> This "endless loop" exists because in many production environments Flink can just wait for resources to become available, for example when other containers are finishing.
>>>>
>>>> Robert
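For the parameter bookkeeping discussed above, a rough sketch using the numbers from this thread (-yn counts only the TaskManager containers; the JobManager/ApplicationMaster container is allocated on top of that):

# 7 TaskManagers plus 1 JobManager/ApplicationMaster = 8 YARN containers in total
$FLINK_HOME/bin/flink run -m yarn-cluster -yn 7 -yjm 16384 -ytm 40960 .....
# requested memory: 1 x 16384 MB (JM) + 7 x 40960 MB (TM) = 303104 MB overall,
# and every single container must additionally fit under
# yarn.scheduler.maximum-allocation-mb and the per-node
# yarn.nodemanager.resource.memory-mb budget.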
>>>> On Wed, Sep 30, 2015 at 6:33 PM, Robert Schmidtke <ro.schmid...@gmail.com> wrote:
>>>>
>>>>> Hi Robert,
>>>>>
>>>>> thanks for your reply. It got me digging into my setup, and I discovered that one TM was scheduled next to the JM. When specifying -yn 7, the documentation suggests that this is the number of TMs (of which I wanted 7), and I thought an additional container would be used for the JM (my YARN cluster has 8 nodes). Anyway, with this setup the memory on that node added up to 56G and 1M (40G for the TM plus 16G and 1M for the JM), but I set a hard maximum of 56G in my yarn-site.xml, which is why the request could not be fulfilled. It is interesting to note that when I set both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb to 56G I get a proper error when requesting 56G and 1M, but when setting yarn.nodemanager.resource.memory-mb to 56G and yarn.scheduler.maximum-allocation-mb to 54G I don't get an error but the aforementioned endless loop. Note that I have yarn.nodemanager.vmem-check-enabled set to false. This is probably a YARN issue then / my bad configuration.
>>>>>
>>>>> I'm in a rush now (to get to the Flink meetup) and will check the documentation later to see how to deploy the TMs and the JM on separate machines each, since that is not what's happening at the moment, but it is what I'd like to have. Thanks again and see you in an hour.
>>>>>
>>>>> Cheers
>>>>> Robert
>>>>>
>>>>> On Wed, Sep 30, 2015 at 5:19 PM, Robert Metzger <rmetz...@apache.org> wrote:
>>>>>
>>>>>> Hi Robert,
>>>>>>
>>>>>> the problem here is that YARN's scheduler (there are different schedulers in YARN: FIFO, CapacityScheduler, ...) is not giving Flink's ApplicationMaster/JobManager all the containers it is requesting. By increasing the size of the AM/JM container, there is probably no memory left to fit the last TaskManager container.
>>>>>> I also experienced this issue when I wanted to run a Flink job on YARN and the containers were fitting theoretically, but YARN was not giving me all the containers I requested.
>>>>>> Back then, I asked on the yarn-dev list [1] (there were also some off-list emails), but we could not resolve the issue.
>>>>>>
>>>>>> Can you check the resource manager logs? Maybe there is a log message which explains why the container request of Flink's AM is not fulfilled.
>>>>>>
>>>>>> [1] http://search-hadoop.com/m/AsBtCilK5r1pKLjf1&subj=Re+QUESTION+Allocating+a+full+YARN+cluster
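Spelling out the per-node arithmetic behind this (a sketch that assumes 56G here means 57344 MB and that YARN accounts exactly the requested container sizes):

# yarn.nodemanager.resource.memory-mb = 57344 MB (56G) per node
# node that ends up hosting both the JM/AM container and one TM container:
#   16384 MB (-yjm) + 40960 MB (-ytm) = 57344 MB  -> fits exactly, all 7 TMs start
#   16385 MB (-yjm) + 40960 MB (-ytm) = 57345 MB  -> 1 MB over the node's budget,
#     so the last TM container does not fit there and, since no other node has
#     capacity for it, the AM keeps waiting for it.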
>>>>>> On Wed, Sep 30, 2015 at 5:02 PM, Robert Schmidtke <ro.schmid...@gmail.com> wrote:
>>>>>>
>>>>>>> It's me again. This is a strange issue; I hope I managed to find the right keywords. I have 8 machines: 1 for the JM, and the other 7 are TMs with 64G of memory each.
>>>>>>>
>>>>>>> When running my job like so:
>>>>>>>
>>>>>>> $FLINK_HOME/bin/flink run -m yarn-cluster -yjm 16384 -ytm 40960 -yn 7 .....
>>>>>>>
>>>>>>> the job completes without any problems. When running it like so:
>>>>>>>
>>>>>>> $FLINK_HOME/bin/flink run -m yarn-cluster -yjm 16385 -ytm 40960 -yn 7 .....
>>>>>>>
>>>>>>> (note the one more MB of memory for the JM), the execution stalls, continuously reporting:
>>>>>>>
>>>>>>> .....
>>>>>>> TaskManager status (6/7)
>>>>>>> TaskManager status (6/7)
>>>>>>> TaskManager status (6/7)
>>>>>>> .....
>>>>>>>
>>>>>>> I did some poking around, but I couldn't find any direct correlation with the code.
>>>>>>>
>>>>>>> The JM log says:
>>>>>>>
>>>>>>> .....
>>>>>>> 16:49:01,893 INFO org.apache.flink.yarn.ApplicationMaster$ - JVM Options:
>>>>>>> 16:49:01,893 INFO org.apache.flink.yarn.ApplicationMaster$ - -Xmx12289M
>>>>>>> .....
>>>>>>>
>>>>>>> but then continues to report
>>>>>>>
>>>>>>> .....
>>>>>>> 16:52:59,311 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - The user requested 7 containers, 6 running. 1 containers missing
>>>>>>> 16:52:59,831 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - The user requested 7 containers, 6 running. 1 containers missing
>>>>>>> 16:53:00,351 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - The user requested 7 containers, 6 running. 1 containers missing
>>>>>>> .....
>>>>>>>
>>>>>>> forever until I cancel the job.
>>>>>>>
>>>>>>> If you have any ideas I'm happy to try them out. Thanks in advance for any hints! Cheers.
>>>>>>>
>>>>>>> Robert
>>>>>>> --
>>>>>>> My GPG Key ID: 336E2680
>>>>>
>>>>> --
>>>>> My GPG Key ID: 336E2680
>>>
>>> --
>>> My GPG Key ID: 336E2680
>
> --
> My GPG Key ID: 336E2680
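To track down why the last container is never placed, a few standard YARN CLI checks might help (a sketch; the exact output differs between Hadoop versions, and <node-id> stands for whatever "yarn node -list" prints for the node in question):

# list all NodeManagers and the number of containers they are running
yarn node -list -all
# detailed resource report (memory used vs. capacity) for a single node
yarn node -status <node-id>
# running applications and their current resource usage
yarn application -list
# the ResourceManager log on the RM host (typically yarn-*-resourcemanager-*.log)
# usually records why a container request cannot be fulfilled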