I see, thanks for the info. I only have access to my cluster via SLURM and we don't have ssh between our nodes, which is why I haven't really considered the standalone mode. A colleague has set up YARN on SLURM and it was simply the easiest option to use. I briefly looked into Flink's standalone mode but dropped it because I thought YARN would be possible after all. It seems I'll have to take a deeper look into starting the master and the slaves with SLURM's srun instead of ssh (I guess a slight modification of start-cluster.sh should do the job).
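For anyone attempting the same, here is a rough sketch of what an srun-based start could look like from inside a SLURM allocation. This is untested on my side; the daemon script names and arguments are assumptions based on the Flink binary distribution and may differ between versions:

    #!/bin/bash
    # Sketch: start a standalone Flink cluster inside a SLURM allocation,
    # replacing the ssh loop of start-cluster.sh with srun.
    # Assumes $FLINK_HOME points at the Flink distribution on a shared filesystem.

    HOSTS=($(scontrol show hostnames "$SLURM_JOB_NODELIST"))
    MASTER=${HOSTS[0]}

    # Tell the TaskManagers where the JobManager lives before starting anything.
    sed -i "s/^jobmanager.rpc.address:.*/jobmanager.rpc.address: $MASTER/" \
        "$FLINK_HOME/conf/flink-conf.yaml"

    # JobManager on the first allocated host, one TaskManager on each other host.
    srun --nodes=1 --ntasks=1 --nodelist="$MASTER" \
        "$FLINK_HOME/bin/jobmanager.sh" start cluster &
    for HOST in "${HOSTS[@]:1}"; do
        srun --nodes=1 --ntasks=1 --nodelist="$HOST" \
            "$FLINK_HOME/bin/taskmanager.sh" start &
    done
    wait

One caveat: jobmanager.sh and taskmanager.sh daemonize and return, and depending on how SLURM tracks processes it may reap the daemons once the srun step exits, so the JVMs might have to be kept in the foreground instead.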
On Thu, Oct 1, 2015 at 11:30 AM, Robert Metzger <rmetz...@apache.org> wrote:

> Hi,
>
> there is currently no option for forcing certain containers onto specific machines.
> For running the JM (or any other YARN container) on the AM host, you first need to have a NodeManager running on the host with the RM. Maybe YARN is smart enough to schedule the small JM container onto that machine.
>
> I don't know your exact setup, but maybe it would make sense for you to run Flink in standalone cluster mode instead of on YARN. It seems that you have a very good idea of how and where you want to run the Flink services in your cluster. YARN is designed to be an abstraction between the cluster and the application; that's why it's a bit difficult to schedule the containers onto specific machines.
>
> Robert
>
> On Thu, Oct 1, 2015 at 11:24 AM, Robert Schmidtke <ro.schmid...@gmail.com> wrote:
>
>> Hi Robert,
>>
>> I had a job failure yesterday with what I believe is the setup I described above. However, when trying to reproduce it now, the behavior is the same: Flink waits for resources to become available. So no hard error.
>>
>> OK, the looping makes sense then. I hadn't thought about shared setups. I'm still figuring out how all the parameters play together, i.e. -yn, -yjm, -ytm and the memory limits in yarn-site.xml. This will need some testing, and I'll come back with a proper description once I think I know what's going on.
>>
>> When running Flink on YARN, is it easily possible to place the Flink JM where the YARN ResourceManager sits, and all the TMs with the remaining NodeManagers?
>>
>> Robert
>>
>> On Thu, Oct 1, 2015 at 10:53 AM, Robert Metzger <rmetz...@apache.org> wrote:
>>
>>> Hi,
>>>
>>>> It is interesting to note that when I set both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb to 56G I get a proper error when requesting 56G and 1M, but when setting yarn.nodemanager.resource.memory-mb to 56G and yarn.scheduler.maximum-allocation-mb to 54G I don't get an error but the aforementioned endless loop.
>>>
>>> Is it a "hard error" (failing) you're getting, or just "WARN" log messages? I'm asking because a while ago I added some code that runs a few checks before deploying Flink on YARN. These checks print WARN log messages if the requested YARN session/job does not fit onto the cluster. The "endless loop" exists because in many production environments Flink can simply wait for resources to become available, for example when other containers are finishing.
>>>
>>> Robert
>>>
>>> On Wed, Sep 30, 2015 at 6:33 PM, Robert Schmidtke <ro.schmid...@gmail.com> wrote:
>>>
>>>> Hi Robert,
>>>>
>>>> thanks for your reply. It got me digging into my setup, and I discovered that one TM was scheduled next to the JM. When specifying -yn 7, the documentation suggests that this is the number of TMs (of which I wanted 7), and I thought an additional container would be used for the JM (my YARN cluster has 8 containers). Anyway, with this setup the memory on that shared node added up to 56G and 1M (40G for the TM plus 16G and 1M for the JM), but I set a hard maximum of 56G in my yarn-site.xml, which is why the request could not be fulfilled.
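To spell out the arithmetic from the paragraph above, with the flag values used in this thread (plain container sums, nothing Flink-specific):

    # The shared node hosts the JM container plus one TM container:
    expr 16385 + 40960    # -yjm + -ytm = 57345 MB requested on that node
    expr 56 \* 1024       # node limit: 56G = 57344 MB
    # 57345 MB > 57344 MB, so one of the 7 requested TM containers can
    # never be placed; with -yjm 16384 the sum is exactly 57344 MB and
    # everything fits.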
>>>> It is interesting to note that when I set both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb to 56G, I get a proper error when requesting 56G and 1M, but when I set yarn.nodemanager.resource.memory-mb to 56G and yarn.scheduler.maximum-allocation-mb to 54G, I get no error, just the aforementioned endless loop. Note that I have yarn.nodemanager.vmem-check-enabled set to false. So this is probably a YARN issue / my bad configuration.
>>>>
>>>> I'm in a rush now (to get to the Flink meetup) and will check the documentation later to see how to deploy the TMs and the JM on separate machines each, since that is not what's happening at the moment, but it is what I'd like to have. Thanks again, and see you in an hour.
>>>>
>>>> Cheers
>>>> Robert
>>>>
>>>> On Wed, Sep 30, 2015 at 5:19 PM, Robert Metzger <rmetz...@apache.org> wrote:
>>>>
>>>>> Hi Robert,
>>>>>
>>>>> the problem here is that YARN's scheduler (there are different schedulers in YARN: FIFO, CapacityScheduler, ...) is not giving Flink's ApplicationMaster/JobManager all the containers it is requesting. By increasing the size of the AM/JM container, there is probably no memory left to fit the last TaskManager container.
>>>>> I experienced this issue as well when I wanted to run a Flink job on YARN: the containers fit theoretically, but YARN was not giving me all the containers I requested.
>>>>> Back then I asked on the yarn-dev list [1] (there were also some off-list emails), but we could not resolve the issue.
>>>>>
>>>>> Can you check the ResourceManager logs? Maybe there is a log message which explains why the container request of Flink's AM is not fulfilled.
>>>>>
>>>>> [1] http://search-hadoop.com/m/AsBtCilK5r1pKLjf1&subj=Re+QUESTION+Allocating+a+full+YARN+cluster
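Two quick ways to do exactly this kind of check, for anyone following along (the log path is an assumption and varies by Hadoop distribution; <node-id> is a placeholder):

    # What capacity did each NodeManager register? (expected here: 57344 MB = 56G)
    yarn node -list
    yarn node -status <node-id>    # shows Memory-Used and Memory-Capacity

    # What has the scheduler actually handed out?
    grep "Assigned container" /var/log/hadoop-yarn/*resourcemanager*.log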
>>>>>
>>>>> On Wed, Sep 30, 2015 at 5:02 PM, Robert Schmidtke <ro.schmid...@gmail.com> wrote:
>>>>>
>>>>>> It's me again. This is a strange issue; I hope I managed to find the right keywords. I have 8 machines: 1 for the JM, and the other 7 are TMs with 64G of memory each.
>>>>>>
>>>>>> When running my job like so:
>>>>>>
>>>>>> $FLINK_HOME/bin/flink run -m yarn-cluster -yjm 16384 -ytm 40960 -yn 7 .....
>>>>>>
>>>>>> the job completes without any problems. But when running it like so:
>>>>>>
>>>>>> $FLINK_HOME/bin/flink run -m yarn-cluster -yjm 16385 -ytm 40960 -yn 7 .....
>>>>>>
>>>>>> (note the single extra MB of memory for the JM), the execution stalls, continuously reporting:
>>>>>>
>>>>>> .....
>>>>>> TaskManager status (6/7)
>>>>>> TaskManager status (6/7)
>>>>>> TaskManager status (6/7)
>>>>>> .....
>>>>>>
>>>>>> I did some poking around, but I couldn't find any direct correlation with the code.
>>>>>>
>>>>>> The JM log says:
>>>>>>
>>>>>> .....
>>>>>> 16:49:01,893 INFO org.apache.flink.yarn.ApplicationMaster$ - JVM Options:
>>>>>> 16:49:01,893 INFO org.apache.flink.yarn.ApplicationMaster$ - -Xmx12289M
>>>>>> .....
>>>>>>
>>>>>> but then continues to report
>>>>>>
>>>>>> .....
>>>>>> 16:52:59,311 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - The user requested 7 containers, 6 running. 1 containers missing
>>>>>> 16:52:59,831 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - The user requested 7 containers, 6 running. 1 containers missing
>>>>>> 16:53:00,351 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - The user requested 7 containers, 6 running. 1 containers missing
>>>>>> .....
>>>>>>
>>>>>> forever, until I cancel the job.
>>>>>>
>>>>>> If you have any ideas I'm happy to try them out. Thanks in advance for any hints! Cheers.
>>>>>>
>>>>>> Robert
>>>>>> --
>>>>>> My GPG Key ID: 336E2680
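And for reference, the yarn-site.xml properties juggled throughout this thread, with the 56G setup discussed above expressed in MB (a snippet for illustration, not a recommendation):

    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>57344</value> <!-- 56G usable per NodeManager -->
    </property>
    <property>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>57344</value> <!-- largest single container the scheduler will grant -->
    </property>
    <property>
      <name>yarn.nodemanager.vmem-check-enabled</name>
      <value>false</value> <!-- as noted above, virtual memory checking is off -->
    </property>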