Hi Jai,

On Tue, Feb 22, 2022 at 9:19 PM Jai Patel <jai.pa...@cloudkitchens.com>
wrote:

> It seems like the errors are similar to those discussed here:
> - https://issues.apache.org/jira/browse/FLINK-14316
> - https://cdmana.com/2020/11/20201116104527255b.html
>

I couldn't find any other existing issue apart from the one you already
linked. Just to be sure: Which Flink version are you using? Is it one where
the reported issue is fixed?

As for the issue itself, it looks like the connection between JobManager
and TaskManager was lost, though I can't tell why. Do you have full logs
from JobManager and TaskManager surrounding such an incident?


> When looking at the memory structure it looks like all memory is below
> 100% except for managed memory.  We have 9.1GB of managed memory for each
> of our 6 task managers and I estimate that our total Flink State is 600GB.
> Is it okay to run with that little memory for that much State?
>

Are you using RocksDB or HashMap state backend [1]? I assume it's RocksDB,
since with HashMapStateBackend, state size is limited by memory size (and
you are way above that). Did you check out the memory configuration
recommendations in the docs [2, 3]?
In principle (assuming RocksDB is used), I don't think the amount of memory
should be an issue (at least it shouldn't cause crashes). The logs would
help to understand what's happening.
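
For a rough sense of the ratio, here is a back-of-the-envelope sketch in Python. It uses only the figures from your mail (9.1GB managed memory per TaskManager, 6 TaskManagers, roughly 600GB of state); the function name is made up for illustration. With RocksDB the state itself lives on local disk and managed memory is used for its block cache and write buffers, so a small ratio usually shows up as slower state access rather than as a failure, but that is the general expectation and not a diagnosis of your job.

```python
def managed_memory_coverage(per_tm_gb, num_tms, state_gb):
    """Return (total managed memory in GB, fraction of the state it could hold)."""
    total_gb = per_tm_gb * num_tms
    return total_gb, total_gb / state_gb


# Figures taken from the mail above; the 600 GB is the poster's own estimate.
total_gb, fraction = managed_memory_coverage(9.1, 6, 600.0)

print(f"total managed memory: {total_gb:.1f} GB")         # total managed memory: 54.6 GB
print(f"share of state that fits in memory: {fraction:.1%}")  # share of state that fits in memory: 9.1%
```

So about a tenth of the state could be cached at any one time, and the rest would be read from disk on demand.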

Best,
Nico

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/state_backends/
[2]
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/memory/mem_tuning/#configure-memory-for-state-backends
[3]
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/large_state_tuning/#tuning-rocksdb-memory
