>
> Assuming no -Xmx is set, the doc above says 1/4 of physical memory i.e
> 29GB will be used.
>

This is true.

> So, if I can set env.java.opts: "-Xmx102g" in flink-conf.yaml, I am
> assuming the heap max of 102GB will be used in the N/w mem calculation.
> Is that the right way to set env.java.opts?
>

I cannot be sure. I just checked, and it seems that even on Mesos the "-Xmx"
should be set. So technically, Flink should always have set the "-Xmx". If
you are using a custom shell script for launching task manager processes,
then I cannot tell whether "env.java.opts" works for you.

Thank you~

Xintong Song



On Fri, Jun 12, 2020 at 5:33 PM Vijay Balakrishnan <bvija...@gmail.com>
wrote:

> Hi Xintong,
> Just to be clear, I haven't set any -Xmx. I will check our scripts again.
> Assuming no -Xmx is set, the doc above says 1/4 of physical memory i.e
> 29GB will be used.
>
> So, if I can set env.java.opts: "-Xmx102g" in flink-conf.yaml, I am
> assuming the heap max of 102GB will be used in the N/w mem calculation.
> Is that the right way to set env.java.opts?
> TIA,
> Vijay
>
> On Fri, Jun 12, 2020 at 1:49 AM Xintong Song <tonysong...@gmail.com>
> wrote:
>
>> Flink should have calculated the heap size and set the -Xmx, according to
>> the equations I mentioned. So if you haven't set a customized -Xmx that
>> overwrites this, it should not use the default 1/4 of physical memory.
>>>
>>>
>>>    - Standalone: jvmHeap = total * (1 - networkFraction)
>>>      = 102 GB * (1 - 0.48) = 53 GB
>>>    - On Yarn: jvmHeap = (total - Max(cutoff-min, total * cutoff-ratio))
>>>      * (1 - networkFraction)
>>>      = (102GB - Max(600MB, 102GB * 0.25)) * (1 - 0.48) = 40.6GB
>>>
>>>
>> Are you running Flink on Mesos? I think Flink has not automatically set
>> -Xmx on Mesos.
>>
>>
>> BTW, from your screenshot the physical memory is 123GB, so 1/4 of that is
>> much closer to 29GB if we consider there are some rounding errors and
>> accuracy loss.
>>
>>
>> Thank you~
>>
>> Xintong Song
>>
>>
>>
>> On Fri, Jun 12, 2020 at 4:33 PM Vijay Balakrishnan <bvija...@gmail.com>
>> wrote:
>>
>>> Thx, Xintong for a great answer. Much appreciated.
>>>
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/mem_setup.html#jvm-heap
>>>
>>>
>>> Max heap: if -Xmx is set then it is its value else 1/4 of physical
>>> machine memory estimated by the JVM
>>>
>>> No -Xmx is set. So, 1/4 of 102GB = 25.5GB, but I am not sure about the
>>> 29GB figure.
>>>
>>> On Thu, Jun 11, 2020 at 9:14 PM Xintong Song <tonysong...@gmail.com>
>>> wrote:
>>>
>>>> Hi Vijay,
>>>>
>>>> The memory configurations in Flink 1.9 and previous versions are indeed
>>>> complicated and confusing. That is why we made significant changes to them
>>>> in Flink 1.10. If possible, I would suggest upgrading to Flink 1.10, or the
>>>> upcoming Flink 1.11, which is very likely to be released this month.
>>>>
>>>> Regarding your questions,
>>>>
>>>>    - "Physical Memory" displayed on the web UI stands for the total
>>>>    memory on your machine. This information is retrieved from your OS.
>>>>    It is not related to the network memory calculation. It is displayed
>>>>    mainly for historical reasons.
>>>>    - The error message means that you have about 26.8 GB network
>>>>    memory (877118 * 32768 bytes), and your job is trying to use more.
>>>>    - The "total memory" referred to in the network memory calculation is:
>>>>       - jvm-heap + network, if managed memory is configured on-heap
>>>>       (default)
>>>>          - According to your screenshot, the managed memory
>>>>          on-heap/off-heap configuration is not touched, so this should
>>>>          be your case.
>>>>       - jvm-heap + managed + network, if managed memory is configured
>>>>       off-heap
>>>>    - The network memory size is actually derived in reverse. Flink
>>>>    reads the max heap size from the JVM (and the managed memory size from
>>>>    the configuration if it is configured off-heap), and derives the
>>>>    network memory size with the following equation.
>>>>       - networkMem = Min(networkMax, Max(networkMin, jvmMaxHeap /
>>>>       (1-networkFraction) * networkFraction))
>>>>       - In your case, networkMem = Min(50GB, Max(500MB, 29GB /
>>>>       (1-0.48) * 0.48)) = 26.8GB
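
As a rough check of the equation above, here is a minimal Python sketch. The function name `derive_network_mem_gb` and the use of plain floats in GB are illustrative assumptions, not Flink code; the inputs (29GB max heap, fraction 0.48, 500MB min, 50GB max, 877118 buffers of 32768 bytes) are the values quoted in this thread.

```python
def derive_network_mem_gb(jvm_max_heap_gb, fraction, net_min_gb, net_max_gb):
    """Derive network memory from the JVM max heap (managed memory on-heap)."""
    # total = heap / (1 - fraction); network = total * fraction
    derived = jvm_max_heap_gb / (1 - fraction) * fraction
    # Clamp the derived value between the configured min and max.
    return min(net_max_gb, max(net_min_gb, derived))


# Values quoted in the thread: 29GB heap, fraction 0.48, min 500MB, max 50GB.
network_gb = derive_network_mem_gb(29, 0.48, 500 / 1024, 50)

# Cross-check against the buffer pool reported in the error message.
buffers_gb = 877118 * 32768 / 2**30

print(round(network_gb, 1), round(buffers_gb, 1))
```

Both numbers round to 26.8, which is consistent with the network memory having been derived from a 29GB max heap rather than from the configured total.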
>>>>
>>>> One thing I don't understand is why you only have a 29GB heap size
>>>> when "taskmanager.heap.size" is configured to be "1044221m" (about 102 GB).
>>>> The JVM heap size ("-Xmx" & "-Xms") is calculated as follows. I'll use
>>>> "total" to represent "taskmanager.heap.size" for short. I have also omitted
>>>> the calculations for when managed memory is configured off-heap.
>>>>
>>>>    - Standalone: jvmHeap = total * (1 - networkFraction)
>>>>      = 102 GB * (1 - 0.48) = 53 GB
>>>>    - On Yarn: jvmHeap = (total - Max(cutoff-min, total * cutoff-ratio))
>>>>      * (1 - networkFraction)
>>>>      = (102GB - Max(600MB, 102GB * 0.25)) * (1 - 0.48) = 40.6GB
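
The two heap formulas above can be evaluated with a short Python sketch. The function names are hypothetical and the units are plain floats in GB; the inputs are the rounded values from this thread (total 102GB, network fraction 0.48, cutoff-min 600MB, cutoff-ratio 0.25). With these rounded inputs the Yarn formula evaluates to roughly 39.8GB, a little below the 40.6GB quoted above; the exact inputs behind the quoted figure are not shown in the thread.

```python
def standalone_heap_gb(total_gb, network_fraction):
    """Standalone: heap is whatever remains after the network fraction."""
    return total_gb * (1 - network_fraction)


def yarn_heap_gb(total_gb, network_fraction, cutoff_min_gb, cutoff_ratio):
    """Yarn: subtract the container cutoff first, then the network fraction."""
    cutoff = max(cutoff_min_gb, total_gb * cutoff_ratio)
    return (total_gb - cutoff) * (1 - network_fraction)


standalone = standalone_heap_gb(102, 0.48)
yarn = yarn_heap_gb(102, 0.48, 600 / 1024, 0.25)

print(round(standalone, 2), round(yarn, 2))
```

Either result is well above 29GB, which is why a 29GB max heap suggests the JVM was started without the "-Xmx" that Flink would normally compute.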
>>>>
>>>> Have you specified a custom "-Xmx" parameter?
>>>>
>>>> Thank you~
>>>>
>>>> Xintong Song
>>>>
>>>>
>>>>
>>>> On Fri, Jun 12, 2020 at 7:50 AM Vijay Balakrishnan <bvija...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>> Get this error:
>>>>> java.io.IOException: Insufficient number of network buffers: required
>>>>> 2, but only 0 available. The total number of network buffers is currently
>>>>> set to 877118 of 32768 bytes each. You can increase this number by setting
>>>>> the configuration keys 'taskmanager.network.memory.fraction',
>>>>> 'taskmanager.network.memory.min', and 'taskmanager.network.memory.max'.
>>>>> akka.pattern.AskTimeoutException: Ask timed out on
>>>>> [Actor[akka://flink/user/dispatcher#-1420732632]] after [10000 ms].
>>>>> Message of type
>>>>> [org.apache.flink.runtime.rpc.messages.LocalFencedMessage]. A typical
>>>>> reason for `AskTimeoutException` is that the recipient actor didn't
>>>>> send a reply.
>>>>>
>>>>>
>>>>> Followed docs here:
>>>>>
>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/mem_setup.html
>>>>>
>>>>> network = Min(max, Max(min, fraction x total))  // what does "total" mean?
>>>>> - The max JVM heap is used to derive the total memory for the calculation
>>>>> of network buffers. Can I see it in the Flink Dashboard? Is it 117GB here?
>>>>> = Min(50G, Max(500mb, 0.48 * 117G)) = Min(50G, 56.16G) = 50G
>>>>> 877118 of 32768 bytes each comes to 28.75GB. So, why is it failing?
>>>>> Used this in flink-conf.yaml:
>>>>>     taskmanager.numberOfTaskSlots: 10
>>>>>     rest.server.max-content-length: 314572800
>>>>>     taskmanager.network.memory.fraction: 0.45
>>>>>     taskmanager.network.memory.max: 50gb
>>>>>     taskmanager.network.memory.min: 500mb
>>>>>     akka.ask.timeout: 240s
>>>>>     cluster.evenly-spread-out-slots: true
>>>>>     akka.tcp.timeout: 240s
>>>>>     taskmanager.network.request-backoff.initial: 5000
>>>>>     taskmanager.network.request-backoff.max: 30000
>>>>>     web.timeout: 1000000
>>>>>     web.refresh-interval: 6000
>>>>>
>>>>> Saw some old calc about buffers
>>>>> (slots/TM * slots/TM) * #TMs * 4
>>>>> = 10 * 10 * 47 * 4 = 18,800 buffers.
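
The rule of thumb quoted above can be written as a small Python sketch for comparison with the configured pool. The function name `rule_of_thumb_buffers` is hypothetical, and whether this older formula still applies to Flink 1.9 is not established in this thread; the inputs (10 slots per TM, 47 TMs, 877118 buffers of 32768 bytes) are the values from the message.

```python
def rule_of_thumb_buffers(slots_per_tm, num_tms, factor=4):
    """Older estimate quoted in the thread: slots^2 * #TMs * 4."""
    return slots_per_tm * slots_per_tm * num_tms * factor


needed = rule_of_thumb_buffers(10, 47)
configured = 877118  # buffer count reported in the error message

# Size of the estimated requirement at 32768 bytes per buffer, in MiB.
needed_mib = needed * 32768 / 2**20

print(needed, needed_mib, configured // needed)
```

By this estimate the job would need 18,800 buffers (about 587.5 MiB), far fewer than the 877,118 configured, so the estimate alone does not explain the error.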
>>>>>
>>>>> What am I missing in the network buffer calc ??
>>>>>
>>>>> TIA,
>>>>>
>>>>>
>>>>>
