- I will increase the jvm-overhead (see the config sketch below)
- There are no failovers or restarts until it starts happening
- If it happens again even with the changes, I'll post the NMT output
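For concreteness, here is a sketch of what those two changes could look like in flink-conf.yaml (the 6g value is an illustration, not a recommendation from this thread; enabling NMT requires a JVM restart and adds some runtime overhead):

    # flink-conf.yaml
    taskmanager.memory.jvm-overhead.max: 6g
    # With process.size at 112g, the default jvm-overhead.fraction of 0.1
    # allows up to 11.2g, so the max above is the binding limit.

    # Enable JVM Native Memory Tracking so NMT output can be collected:
    env.java.opts: "-XX:NativeMemoryTracking=summary"

Once the TaskManager is running with NMT enabled, the output can be collected on its host with:

    jcmd <taskmanager-pid> VM.native_memory summary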
On Fri, Oct 30, 2020 at 3:54 AM Xintong Song wrote:
Hi Ori,
I'm not sure where the problem comes from. There are several things
that might be worth a try:
- Further increasing the `jvm-overhead`. Your `ps` result suggests that
the Flink process uses 120+GB, while `process.size` is configured to 112GB. So
I think 2GB of `jvm-overhead` might not be enough (a quick back-of-the-envelope
check follows below).
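As a sanity check on those numbers (ps reports RSS in KiB):

    RSS from ps:   126672256 KiB / 1024 / 1024 ≈ 120.8 GiB
    process.size:  112 GiB (Flink's "g" units are binary)
    gap:           ≈ 8.8 GiB

So the process is roughly 8.8 GiB over its entire configured budget, which a 2GB jvm-overhead allowance clearly cannot absorb.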
Hi Xintong,
Unfortunately I cannot upgrade to 1.10.2, because EMR has either 1.10.0 or
1.11.0.
About the overhead - turns out I already configured
taskmanager.memory.jvm-overhead.max to 2 gb instead of the default 1 gb.
Should I increase it further?
state.backend.rocksdb.memory.managed is already set to true (the default).
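For reference, the settings described in this thread would look roughly like this in flink-conf.yaml (reconstructed from the messages, not an actual file from the thread):

    taskmanager.memory.process.size: 112g        # as discussed in this thread
    taskmanager.memory.jvm-overhead.max: 2g      # raised from the 1g default
    state.backend.rocksdb.memory.managed: true   # the default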
Hi Ori,
RocksDB also uses managed memory. If the memory overuse indeed comes from
RocksDB, then increasing managed memory fraction will not help. RocksDB
will try to use as much memory as the configured managed memory size.
Therefore, increasing the managed memory fraction also makes RocksDB try to
use more memory.
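To make the trade-off concrete, here is a sketch of the two ways to bound RocksDB's memory in flink-conf.yaml (the 2g value is illustrative, not from this thread):

    # Option A: keep RocksDB on managed memory; its budget follows the fraction
    taskmanager.memory.managed.fraction: 0.4     # the default
    # Option B: give RocksDB a fixed per-slot budget instead
    # (this option overrides the managed-memory coupling)
    state.backend.rocksdb.memory.fixed-per-slot: 2g

With Option A, raising the fraction raises RocksDB's memory target as well, which is exactly the effect described above.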
Hi,
PID 20331 is indeed the Flink process, specifically the TaskManager process.
- Workload is a streaming workload reading from Kafka and writing to S3
using a custom Sink
- RocksDB state backend is used with default settings (see the config sketch
after this list)
- My external dependencies are:
-- logback
-- jackson
-- flatbuffers
--
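The thread doesn't show how the state backend is enabled; a minimal sketch with defaults in flink-conf.yaml would be (the S3 path is a hypothetical placeholder):

    state.backend: rocksdb
    state.checkpoints.dir: s3://<bucket>/checkpoints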
Hi Ori,
It looks like Flink indeed uses more memory than expected. I assume the
first item with PID 20331 is the Flink process, right?
It would be helpful if you could briefly introduce your workload.
- What kind of workload are you running? Streaming or batch?
- Do you use RocksDB state backend?
-
Hi Xintong,
See here:
# Top memory users
ps auxwww --sort -rss | head -10
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
yarn 20331 35.8 97.0 128600192 126672256 ? Sl Oct15 5975:47
/etc/alternatives/jre/bin/java -Xmx54760833024 -Xms54760833024 -XX:Max
root 524
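Since ps reports RSS in KiB, a one-liner like this (PID as cited in the thread) converts it to GiB for easier comparison with the configured process.size:

    ps -o rss= -p 20331 | awk '{printf "%.1f GiB\n", $1/1024/1024}'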
Hi Ori,
The error message suggests that there's not enough physical memory on the
machine to satisfy the allocation. This does not necessarily mean a managed
memory leak. Managed memory leak is only one of the possibilities. There
are other potential reasons, e.g., another process/container on the machine
using more memory than expected.
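Generic checks like these (not specific commands from the thread) can help rule out other consumers of physical memory on the machine:

    free -h                          # total vs. available physical memory
    ps auxwww --sort -rss | head     # processes holding the most resident memory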