I would go with one task manager with 48 slots per machine. This reduces the communication overhead between task managers.
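A sketch of how that single-TaskManager-per-machine setup could look in flink-conf.yaml; the concrete values below are illustrative assumptions on my part, not numbers recommended anywhere in this thread:

```yaml
# Illustrative flink-conf.yaml fragment: one TaskManager per machine,
# all 48 cores exposed as slots, a heap well above the 4 GB tried
# earlier in the thread, and more network buffers than the 2048
# default. All three values are assumptions, tune for your workload.
taskmanager.numberOfTaskSlots: 48
taskmanager.heap.mb: 32768
taskmanager.network.numberOfBuffers: 36864
```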
Regarding memory configuration: given that the machines have plenty of memory, I would configure a bigger heap than the 4 GB you had previously. Furthermore, you can also consider adding more network buffers, which should improve job throughput.

– Ufuk

On Tue, Feb 23, 2016 at 11:57 AM, Welly Tambunan <if05...@gmail.com> wrote:
> Hi Ufuk and Fabian,
>
> Is it better to start 48 task managers (one slot each) on one machine
> than to have a single task manager with 48 slots? Any trade-offs that we
> should know about?
>
> Cheers
>
> On Tue, Feb 23, 2016 at 3:03 PM, Welly Tambunan <if05...@gmail.com> wrote:
>>
>> Hi Ufuk,
>>
>> Thanks for the explanation.
>>
>> Yes. Our jobs are all streaming jobs.
>>
>> Cheers
>>
>> On Tue, Feb 23, 2016 at 2:48 PM, Ufuk Celebi <u...@apache.org> wrote:
>>>
>>> The new default is equivalent to the previous "streaming mode". The
>>> community decided to get rid of this distinction, because it was
>>> confusing to users.
>>>
>>> The difference between "streaming mode" and "batch mode" was how
>>> Flink's managed memory was allocated: either lazily when required
>>> ("streaming mode") or eagerly on task manager start-up ("batch mode").
>>> Now it's lazy by default.
>>>
>>> This is not something you need to worry about, but if you are mostly
>>> using the DataSet API, where pre-allocation has benefits, you can get
>>> the "batch mode" behaviour with the following configuration key:
>>>
>>> taskmanager.memory.preallocate: true
>>>
>>> But you are using the DataStream API anyway, right?
>>>
>>> – Ufuk
>>>
>>> On Tue, Feb 23, 2016 at 6:36 AM, Welly Tambunan <if05...@gmail.com> wrote:
>>> > Hi Fabian,
>>> >
>>> > Previously, when using Flink 0.9–0.10, we started the cluster in
>>> > streaming mode or batch mode. I see that this distinction is gone in
>>> > the Flink 1.0 snapshot?
>>> > So is this now taken care of and optimized by the Flink runtime?
>>> >
>>> > On Mon, Feb 22, 2016 at 5:26 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>>> >>
>>> >> Hi Welly,
>>> >>
>>> >> Sorry for the late response.
>>> >>
>>> >> The number of network buffers primarily depends on the maximum
>>> >> parallelism of your job.
>>> >> The given formula assumes a specific cluster configuration (1 task
>>> >> manager per machine, one parallel task per CPU).
>>> >> The formula can be translated to:
>>> >>
>>> >> taskmanager.network.numberOfBuffers: p ^ 2 * t * 4
>>> >>
>>> >> where p is the maximum parallelism of the job and t is the number of
>>> >> task managers.
>>> >> You can process more than one parallel task per TM if you configure
>>> >> more than one processing slot per machine (taskmanager.numberOfTaskSlots).
>>> >> The TM will divide its memory among all its slots. So it would be
>>> >> possible to start one TM for each machine with 100 GB+ memory and 48
>>> >> slots each.
>>> >>
>>> >> We can compute the number of network buffers if you give a few more
>>> >> details about your setup:
>>> >> - How many task managers do you start? I assume more than one TM per
>>> >>   machine, given that you assign only 4 GB of memory out of 128 GB to
>>> >>   each TM.
>>> >> - What is the maximum parallelism of your program?
>>> >> - How many processing slots do you configure for each TM?
>>> >>
>>> >> In general, pipelined shuffles with a high parallelism require a lot
>>> >> of memory.
>>> >> If you configure batch instead of pipelined transfer, the memory
>>> >> requirement goes down
>>> >> (ExecutionConfig.setExecutionMode(ExecutionMode.BATCH)).
>>> >>
>>> >> Eventually, we want to merge the network buffer and the managed memory
>>> >> pools, so the "taskmanager.network.numberOfBuffers" configuration will
>>> >> hopefully disappear at some point in the future.
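As a quick sanity check of the formula in Fabian's reply, this small sketch plugs in the cluster numbers from later in this thread (48 cores, 16 machines); the helper name is mine, not anything from Flink:

```python
# Rule of thumb from this thread:
#   taskmanager.network.numberOfBuffers = p^2 * t * 4
# where p = maximum parallelism of the job and t = number of task
# managers (the docs variant reads: # cores ^ 2 * # machines * 4).

def number_of_buffers(p: int, t: int, buffers_per_channel: int = 4) -> int:
    """Estimated network buffer count under the rule of thumb above."""
    return p ** 2 * t * buffers_per_channel

# 48 cores per machine, 16 machines, one parallel task per CPU:
print(number_of_buffers(48, 16))  # 147456
```

The quadratic term in p is why pipelined shuffles at high parallelism get expensive so quickly.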
>>> >>
>>> >> Best, Fabian
>>> >>
>>> >> 2016-02-19 9:34 GMT+01:00 Welly Tambunan <if05...@gmail.com>:
>>> >>>
>>> >>> Hi All,
>>> >>>
>>> >>> We are trying to run our job on a cluster with this specification:
>>> >>>
>>> >>> 1. # of machines: 16
>>> >>> 2. memory: 128 GB
>>> >>> 3. # of cores: 48
>>> >>>
>>> >>> However, when we try to run it we get an exception:
>>> >>>
>>> >>> "insufficient number of network buffers. 48 required but only 10
>>> >>> available. the total number of network buffers is currently set to
>>> >>> 2048"
>>> >>>
>>> >>> After looking at the documentation we set the configuration based
>>> >>> on the docs:
>>> >>>
>>> >>> taskmanager.network.numberOfBuffers: # cores ^ 2 * # machines * 4
>>> >>>
>>> >>> However, we then hit another error from the JVM:
>>> >>>
>>> >>> java.io.IOException: Cannot allocate network buffer pool: Could not
>>> >>> allocate enough memory segments for NetworkBufferPool (required (Mb):
>>> >>> 2304, allocated (Mb): 698, missing (Mb): 1606). Cause: Java heap space
>>> >>>
>>> >>> We fiddled with taskmanager.heap.mb: 4096, and finally the cluster
>>> >>> is running.
>>> >>>
>>> >>> However, I'm still not sure that this configuration and the fiddling
>>> >>> with the task manager heap are really fine-tuned. So my questions are:
>>> >>>
>>> >>> Am I doing it right for numberOfBuffers?
>>> >>> How much should we allocate for taskmanager.heap.mb, given this
>>> >>> information?
>>> >>> Any suggestion which configuration we need to set to make it optimal
>>> >>> for the cluster?
>>> >>> Is there any chance that this will get resolved automatically by a
>>> >>> memory/network buffer manager?
>>> >>>
>>> >>> Thanks a lot for the help
>>> >>>
>>> >>> Cheers
>>> >>>
>>> >>> --
>>> >>> Welly Tambunan
>>> >>> Triplelands
>>> >>>
>>> >>> http://weltam.wordpress.com
>>> >>> http://www.triplelands.com
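To connect the buffer count to the "Cannot allocate network buffer pool" heap error quoted in this thread: each network buffer is one memory segment, and 32 KiB was the default segment size in the Flink versions discussed here, so the pool's heap footprint is simple arithmetic. A sketch; the buffer count of 73728 below is back-computed from the "required (Mb): 2304" in the error under that 32 KiB assumption, not a number stated in the thread:

```python
# Heap cost of the network buffer pool (sketch, not authoritative).
# Assumes the 32 KiB default segment size of Flink 0.10/1.0.
SEGMENT_SIZE_BYTES = 32 * 1024

def buffer_pool_mib(number_of_buffers: int) -> float:
    """Heap (in MiB) consumed by the network buffer pool."""
    return number_of_buffers * SEGMENT_SIZE_BYTES / (1024 * 1024)

# 73728 buffers x 32 KiB = 2304 MiB -- consistent with the
# "required (Mb): 2304" in the error, and far more than fits in a
# 4 GB heap that also has to hold user code and managed memory.
print(buffer_pool_mib(73728))  # 2304.0
```

This is why raising taskmanager.network.numberOfBuffers without also raising taskmanager.heap.mb triggers the "Java heap space" failure.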