Hi Fabian,

Previously, when using Flink 0.9 and 0.10, we started the cluster in either streaming mode or batch mode. I see that this option is gone in the Flink 1.0 snapshot. Does that mean Flink now takes care of this itself and optimizes it at runtime?
On Mon, Feb 22, 2016 at 5:26 PM, Fabian Hueske <fhue...@gmail.com> wrote:

> Hi Welly,
>
> sorry for the late response.
>
> The number of network buffers primarily depends on the maximum parallelism
> of your job.
> The given formula assumes a specific cluster configuration (1 task manager
> per machine, one parallel task per CPU).
> The formula can be translated to:
>
> taskmanager.network.numberOfBuffers: p ^ 2 * t * 4
>
> where p is the maximum parallelism of the job and t is the number of task
> managers.
> You can process more than one parallel task per TM if you configure more
> than one processing slot per machine (taskmanager.numberOfTaskSlots).
> The TM will divide its memory among all its slots. So it would be possible
> to start one TM for each machine with 100GB+ memory and 48 slots each.
>
> We can compute the number of network buffers if you give a few more
> details about your setup:
> - How many task managers do you start? I assume more than one TM per
>   machine, given that you assign only 4GB of memory out of 128GB to each TM.
> - What is the maximum parallelism of your program?
> - How many processing slots do you configure for each TM?
>
> In general, pipelined shuffles with a high parallelism require a lot of
> memory.
> If you configure batch instead of pipelined transfer, the memory
> requirement goes down
> (ExecutionConfig.setExecutionMode(ExecutionMode.BATCH)).
>
> Eventually, we want to merge the network buffer and the managed memory
> pools. So the "taskmanager.network.numberOfBuffers" configuration will
> hopefully disappear at some point in the future.
>
> Best, Fabian
>
> 2016-02-19 9:34 GMT+01:00 Welly Tambunan <if05...@gmail.com>:
>
>> Hi All,
>>
>> We are trying to run our job on a cluster with the following specs:
>>
>> 1. # of machines: 16
>> 2. memory: 128 GB
>> 3. # of cores: 48
>>
>> However, when we try to run it, we get an exception:
>>
>> "insufficient number of network buffers. 48 required but only 10
>> available. the total number of network buffers is currently set to 2048"
>>
>> After looking at the documentation, we set the configuration based on the
>> docs:
>>
>> taskmanager.network.numberOfBuffers: # core ^ 2 * # machine * 4
>>
>> However, we then got another error from the JVM:
>>
>> java.io.IOException: Cannot allocate network buffer pool: Could not
>> allocate enough memory segments for NetworkBufferPool (required (Mb): 2304,
>> allocated (Mb): 698, missing (Mb): 1606). Cause: Java heap space
>>
>> We then increased the task manager heap:
>>
>> taskmanager.heap.mb: 4096
>>
>> Finally, the cluster is running.
>>
>> However, I'm still not sure whether this configuration and the adjustment
>> of the task manager heap are really well tuned. So my questions are:
>>
>> 1. Am I doing it right for numberOfBuffers?
>> 2. How much should we allocate for taskmanager.heap.mb given this
>>    information?
>> 3. Which configuration settings would you suggest to make it optimal for
>>    this cluster?
>> 4. Is there any chance that this will be resolved automatically by the
>>    memory/network buffer manager?
>>
>> Thanks a lot for the help
>>
>> Cheers
>>
>> --
>> Welly Tambunan
>> Triplelands
>>
>> http://weltam.wordpress.com
>> http://www.triplelands.com <http://www.triplelands.com/blog/>
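The sizing rule discussed in this thread is plain arithmetic, so it can be checked with a few lines of code. The sketch below is a hypothetical helper (the class NetworkBufferEstimate and its methods are not part of Flink) that evaluates p ^ 2 * t * 4 and converts the buffer count into memory. The buffer size of 32 KiB used in main is an assumption; check the buffer/segment size actually configured on your cluster before relying on the memory figure.

```java
/**
 * Hypothetical sizing helper (not part of Flink) for the rule of thumb
 * quoted in this thread: taskmanager.network.numberOfBuffers = p^2 * t * 4.
 */
public final class NetworkBufferEstimate {

    // ASSUMPTION: each network buffer is a 32 KiB memory segment.
    // Replace this with the buffer size configured on your cluster.
    static final long ASSUMED_SEGMENT_SIZE_BYTES = 32L * 1024;

    private NetworkBufferEstimate() {}

    /** Buffers suggested by p^2 * t * 4 (p = parallelism, t = task managers). */
    static long numberOfBuffers(int p, int t) {
        if (p <= 0 || t <= 0) {
            throw new IllegalArgumentException("p and t must be positive");
        }
        return (long) p * p * t * 4;
    }

    /** Memory in MiB occupied by the given number of buffers of the given size. */
    static long bufferMemoryMiB(long buffers, long segmentSizeBytes) {
        return buffers * segmentSizeBytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Values from the original question, plugged into the docs formula
        // "# core ^ 2 * # machine * 4": 48 cores per machine, 16 machines.
        int p = 48;
        int t = 16;
        long buffers = numberOfBuffers(p, t);
        System.out.println("taskmanager.network.numberOfBuffers: " + buffers);
        System.out.println("approx. network buffer memory (MiB): "
                + bufferMemoryMiB(buffers, ASSUMED_SEGMENT_SIZE_BYTES));
    }
}
```

With 48 cores and 16 machines it prints 147456 buffers and 4608 MiB at the assumed 32 KiB buffer size. Because the memory figure scales linearly with the buffer size, it is worth comparing it with the "required (Mb)" value reported in the NetworkBufferPool error instead of trusting the assumed size.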