Re: One TaskManager per node or multiple TaskManager per node

Jamie Grier Tue, 15 Jan 2019 10:48:47 -0800

Ethan, it depends on what you mean by easy ;)  It just depends a lot on
what infra tools you already have in place.  On bare metal it's probably
safe to say there is no "easy" way.  You need a lot of automation to make
it easy.


Bastien, IMO, #1 applies to batch jobs as well.

On Tue, Jan 15, 2019 at 6:27 AM bastien dine <bastien.d...@gmail.com> wrote:

> Hello Jamie,
>
> Does #1 apply to batch jobs too ?
>
> Regards,
>
> ------------------
>
> Bastien DINE
> Data Architect / Software Engineer / Sysadmin
> bastiendine.io
>
>
> Le lun. 14 janv. 2019 à 20:39, Jamie Grier <jgr...@lyft.com> a écrit :
>
>> There are a lot of different ways to deploy Flink.  It would be easier to
>> answer your question with a little more context about your use case but in
>> general I would advocate the following:
>>
>> 1) Don't run a "permanent" Flink cluster and then submit jobs to it.
>> Instead what you should do is run an "ephemeral" cluster per job if
>> possible.  This keeps jobs completely isolated from each other which helps
>> a lot with understanding performance, debugging, looking at logs, etc.
>> 2) Given that you can do #1 and you are running on bare metal (as opposed
>> to in containers) then run one TM per physical machine.
>>
>> There are many ways to accomplish the above depending on your deployment
>> infrastructure (YARN, K8S, bare metal, VMs, etc) so it's hard to give
>> detailed input but general you'll have the best luck if you don't run
>> multiple jobs in the same TM/JVM.
>>
>> In terms of the TM memory usage you can set that up by configuring it in
>> the flink-conf.yaml file.  The config key you are looking or is
>> taskmanager.heap.size:
>> https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/config.html#taskmanager-heap-size
>>
>>
>> On Mon, Jan 14, 2019 at 8:05 AM Ethan Li <ethanopensou...@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> I am setting up a standalone flink cluster and I am wondering what’s the
>>> best way to distribute TaskManagers.  Do we usually launch one TaskManager
>>> (with many slots) per node or multiple TaskManagers per node (with smaller
>>> number of slots per tm) ?  Also with one TaskManager per node, I am seeing
>>> that TM launches with only 30GB JVM heap by default while the node has 180
>>> GB. Why is it not launching with more memory since there is a lot
>>> available?
>>>
>>> Thank you very much!
>>>
>>> - Ethan
>>
>>

Re: One TaskManager per node or multiple TaskManager per node

Reply via email to