I would go with one task manager with 48 slots per machine. This reduces the communication overhead between task managers.
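A sketch of how that single-TaskManager-per-machine setup could look in flink-conf.yaml; the concrete values below are illustrative assumptions on my part, not numbers recommended anywhere in this thread:

```yaml
# Illustrative flink-conf.yaml fragment: one TaskManager per machine,
# all 48 cores exposed as slots, a heap well above the 4 GB tried
# earlier in the thread, and more network buffers than the 2048
# default. All three values are assumptions, tune for your workload.
taskmanager.numberOfTaskSlots: 48
taskmanager.heap.mb: 32768
taskmanager.network.numberOfBuffers: 36864
```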
Regarding memory configuration: given that the machines have plenty of memory, I would configure a bigger heap than the 4 GB you had previously. Furthermore, you can also consider adding more network buffers, which should improve job throughput.

– Ufuk

On Tue, Feb 23, 2016 at 11:57 AM, Welly Tambunan <if05...@gmail.com> wrote:
> Hi Ufuk and Fabian,
>
> Is it better to start 48 task managers (one slot each) on one machine
> than to have a single task manager with 48 slots? Any trade-offs that we
> should know about?
>
> Cheers
>
> On Tue, Feb 23, 2016 at 3:03 PM, Welly Tambunan <if05...@gmail.com> wrote:
>>
>> Hi Ufuk,
>>
>> Thanks for the explanation.
>>
>> Yes. Our jobs are all streaming jobs.
>>
>> Cheers
>>
>> On Tue, Feb 23, 2016 at 2:48 PM, Ufuk Celebi <u...@apache.org> wrote:
>>>
>>> The new default is equivalent to the previous "streaming mode". The
>>> community decided to get rid of this distinction, because it was
>>> confusing to users.
>>>
>>> The difference between "streaming mode" and "batch mode" was how
>>> Flink's managed memory was allocated: either lazily when required
>>> ("streaming mode") or eagerly on task manager start-up ("batch mode").
>>> Now it's lazy by default.
>>>
>>> This is not something you need to worry about, but if you are mostly
>>> using the DataSet API, where pre-allocation has benefits, you can get
>>> the "batch mode" behaviour with the following configuration key:
>>>
>>> taskmanager.memory.preallocate: true
>>>
>>> But you are using the DataStream API anyway, right?
>>>
>>> – Ufuk
>>>
>>> On Tue, Feb 23, 2016 at 6:36 AM, Welly Tambunan <if05...@gmail.com> wrote:
>>> > Hi Fabian,
>>> >
>>> > Previously, when using Flink 0.9–0.10, we started the cluster in
>>> > streaming mode or batch mode. I see that this distinction is gone in
>>> > the Flink 1.0 snapshot?
>>> > So is this now taken care of and optimized by the Flink runtime?
>>> >
>>> > On Mon, Feb 22, 2016 at 5:26 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>>> >>
>>> >> Hi Welly,
>>> >>
>>> >> Sorry for the late response.
>>> >>
>>> >> The number of network buffers primarily depends on the maximum
>>> >> parallelism of your job.
>>> >> The given formula assumes a specific cluster configuration (1 task
>>> >> manager per machine, one parallel task per CPU).
>>> >> The formula can be translated to:
>>> >>
>>> >> taskmanager.network.numberOfBuffers: p ^ 2 * t * 4
>>> >>
>>> >> where p is the maximum parallelism of the job and t is the number of
>>> >> task managers.
>>> >> You can process more than one parallel task per TM if you configure
>>> >> more than one processing slot per machine (taskmanager.numberOfTaskSlots).
>>> >> The TM will divide its memory among all its slots. So it would be
>>> >> possible to start one TM for each machine with 100 GB+ memory and 48
>>> >> slots each.
>>> >>
>>> >> We can compute the number of network buffers if you give a few more
>>> >> details about your setup:
>>> >> - How many task managers do you start? I assume more than one TM per
>>> >>   machine, given that you assign only 4 GB of memory out of 128 GB to
>>> >>   each TM.
>>> >> - What is the maximum parallelism of your program?
>>> >> - How many processing slots do you configure for each TM?
>>> >>
>>> >> In general, pipelined shuffles with a high parallelism require a lot
>>> >> of memory.
>>> >> If you configure batch instead of pipelined transfer, the memory
>>> >> requirement goes down
>>> >> (ExecutionConfig.setExecutionMode(ExecutionMode.BATCH)).
>>> >>
>>> >> Eventually, we want to merge the network buffer and the managed memory
>>> >> pools, so the "taskmanager.network.numberOfBuffers" configuration will
>>> >> hopefully disappear at some point in the future.
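As a quick sanity check of the formula in Fabian's reply, this small sketch plugs in the cluster numbers from later in this thread (48 cores, 16 machines); the helper name is mine, not anything from Flink:

```python
# Rule of thumb from this thread:
#   taskmanager.network.numberOfBuffers = p^2 * t * 4
# where p = maximum parallelism of the job and t = number of task
# managers (the docs variant reads: # cores ^ 2 * # machines * 4).

def number_of_buffers(p: int, t: int, buffers_per_channel: int = 4) -> int:
    """Estimated network buffer count under the rule of thumb above."""
    return p ** 2 * t * buffers_per_channel

# 48 cores per machine, 16 machines, one parallel task per CPU:
print(number_of_buffers(48, 16))  # 147456
```

The quadratic term in p is why pipelined shuffles at high parallelism get expensive so quickly.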
>>> >>
>>> >> Best, Fabian
>>> >>
>>> >> 2016-02-19 9:34 GMT+01:00 Welly Tambunan <if05...@gmail.com>:
>>> >>>
>>> >>> Hi All,
>>> >>>
>>> >>> We are trying to run our job on a cluster with this specification:
>>> >>>
>>> >>> 1. # of machines: 16
>>> >>> 2. memory: 128 GB
>>> >>> 3. # of cores: 48
>>> >>>
>>> >>> However, when we try to run it we get an exception:
>>> >>>
>>> >>> "insufficient number of network buffers. 48 required but only 10
>>> >>> available. the total number of network buffers is currently set to
>>> >>> 2048"
>>> >>>
>>> >>> After looking at the documentation we set the configuration based
>>> >>> on the docs:
>>> >>>
>>> >>> taskmanager.network.numberOfBuffers: # cores ^ 2 * # machines * 4
>>> >>>
>>> >>> However, we then hit another error from the JVM:
>>> >>>
>>> >>> java.io.IOException: Cannot allocate network buffer pool: Could not
>>> >>> allocate enough memory segments for NetworkBufferPool (required (Mb):
>>> >>> 2304, allocated (Mb): 698, missing (Mb): 1606). Cause: Java heap space
>>> >>>
>>> >>> We fiddled with taskmanager.heap.mb: 4096, and finally the cluster
>>> >>> is running.
>>> >>>
>>> >>> However, I'm still not sure that this configuration and the fiddling
>>> >>> with the task manager heap are really fine-tuned. So my questions are:
>>> >>>
>>> >>> Am I doing it right for numberOfBuffers?
>>> >>> How much should we allocate for taskmanager.heap.mb, given this
>>> >>> information?
>>> >>> Any suggestion which configuration we need to set to make it optimal
>>> >>> for the cluster?
>>> >>> Is there any chance that this will get resolved automatically by a
>>> >>> memory/network buffer manager?
>>> >>>
>>> >>> Thanks a lot for the help
>>> >>>
>>> >>> Cheers
>>> >>>
>>> >>> --
>>> >>> Welly Tambunan
>>> >>> Triplelands
>>> >>>
>>> >>> http://weltam.wordpress.com
>>> >>> http://www.triplelands.com
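To connect the buffer count to the "Cannot allocate network buffer pool" heap error quoted in this thread: each network buffer is one memory segment, and 32 KiB was the default segment size in the Flink versions discussed here, so the pool's heap footprint is simple arithmetic. A sketch; the buffer count of 73728 below is back-computed from the "required (Mb): 2304" in the error under that 32 KiB assumption, not a number stated in the thread:

```python
# Heap cost of the network buffer pool (sketch, not authoritative).
# Assumes the 32 KiB default segment size of Flink 0.10/1.0.
SEGMENT_SIZE_BYTES = 32 * 1024

def buffer_pool_mib(number_of_buffers: int) -> float:
    """Heap (in MiB) consumed by the network buffer pool."""
    return number_of_buffers * SEGMENT_SIZE_BYTES / (1024 * 1024)

# 73728 buffers x 32 KiB = 2304 MiB -- consistent with the
# "required (Mb): 2304" in the error, and far more than fits in a
# 4 GB heap that also has to hold user code and managed memory.
print(buffer_pool_mib(73728))  # 2304.0
```

This is why raising taskmanager.network.numberOfBuffers without also raising taskmanager.heap.mb triggers the "Java heap space" failure.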