>>>>>>>>>>>- Probably the most straightforward way is to try increasing
>>>>>>>>>>>the timeout to see if that helps. You can leverage the
>>>>>>>>>>the TM resources, JVM parameters, timeout, etc.). Maybe the easiest
>>>>>>>>>>way is to share the beginning part of your JM/TM logs, including the JVM
>>>>>>>>>>parameters and all the loa
>>>>>>>>> to see the most recent metrics due to the process not responding to
>>>>>>>>>the metric querying services.
>>>>>>>>>- You may also look into the status
>>>>>>> I see it only now that I run with G1GC, but with the previous GC it
>>>>>>> wasn't the case.
>>>>>>>
>>>>>>> Does anyone know what can cause high GC time and how to mitigate
>>>>>>>
>>>>>>>- Probably the most straightforward way is to try increasing the
>>>>>>>timeout to see if that helps. You can leverage the configuration option
>>>>>>>`heartbeat.timeout`[1]. The default is 50s
>>>>>>- It might be helpful to share your configuration setups (e.g.,
>>>>>>the TM resources, JVM parameters, timeout, etc.). Maybe the easiest way is
>>>>>>to share the beginning part of your JM/TM logs, including the JVM
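>>>>>>parameters and all the loa
>>>> to see the
>>>>most recent metrics due to the process not responding to the metric
>>>>querying services.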
>>>>- You may also look into the status of the JM process. If JM is
>>>>under significant GC pressure, it could also happen that the heartbeat
>>>> message from TM is not timely handled before the timeout check.
>>>>
>>>- Are there any metrics monitoring the network condition between the
>>>JM and the timed-out TM? Possibly any jitter?
>>>
>>>
>>> Thank you~
>>>
>>> Xintong Song
>>>
>>>
>>> [1]
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/config.html#heartbeat-timeout
>>
>>
>> Thank you~
>>
>> Xintong Song
>>
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/config.html#heartbeat-timeout
>>
>> On Thu, Jun 25, 2020 at 11:15 PM Ori Popowski wrote:
>>
>
> On Thu, Jun 25, 2020 at 11:15 PM Ori Popowski wrote:
>
>> Hello,
>>
>> I'm running Flink 1.10 on EMR and reading from Kafka with 189 partitions
>> and I have parallelism of 189.
>>
>> Currently running with RocksDB, with checkpointing disabled. My state size
>> is approx. 500 GB.
>>
> I'm getting sporadic "Heartbeat of TaskManager timed out" errors with no
> apparent reason.
>
> I check the container that gets the timeout for GC pauses, heap memory,
> direct memory, mapped memory, offheap memory, CPU load, network load, total
> out-records, total in-r
Hello,

I'm running Flink 1.10 on EMR and reading from Kafka with 189 partitions
and I have parallelism of 189.

Currently running with RocksDB, with checkpointing disabled. My state size
is approx. 500 GB.

I'm getting sporadic "Heartbeat of TaskManager timed out" errors with no
apparent reason.
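
For reference, the `heartbeat.timeout` suggestion quoted in the thread could be
tried with a flink-conf.yaml fragment along these lines. This is only a sketch:
the 120000 ms value, the GC logging flags (Java 8 syntax) and the log path are
illustrative assumptions, not settings taken from the thread.

```yaml
# flink-conf.yaml (sketch; values below are assumptions, not recommendations)

# Heartbeat timeout in milliseconds. The thread cites a default of 50s (50000 ms);
# 120000 is an arbitrary larger value chosen for illustration.
heartbeat.timeout: 120000

# Assumed TaskManager JVM options: G1 with GC logging enabled (Java 8 flag syntax),
# so that long pauses can be compared against the heartbeat timeout.
# The log path is hypothetical.
env.java.opts.taskmanager: -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/tmp/taskmanager-gc.log
```

Raising the timeout only hides long pauses; the GC log is what would show whether
pauses on the timed-out TM (or on the JM) actually approach the heartbeat timeout.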