Re: Native memory allocation (mmap) failed to map 1006567424 bytes

2020-10-29 Thread Ori Popowski
- I will increase the jvm-overhead - I don't have any failovers or restarts until it starts happening - If it happens again even with the changes, I'll post the NMT output On Fri, Oct 30, 2020 at 3:54 AM Xintong Song wrote: > Hi Ori, > > I'm not sure about where the problem comes from. There are

Re: Native memory allocation (mmap) failed to map 1006567424 bytes

2020-10-29 Thread Xintong Song
Hi Ori, I'm not sure where the problem comes from. There are several things that might be worth a try. - Further increasing the `jvm-overhead`. Your `ps` result suggests that the Flink process uses 120+GB, while `process.size` is configured as 112GB. So I think 2GB `jvm-overhead` might not be enou
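The gap Xintong describes can be checked with a little arithmetic. The following is a minimal sketch, not part of the original exchange: it takes the RSS figure from the `ps` output quoted elsewhere in this thread, relies on `ps` reporting RSS in KiB, and assumes that the configured 112GB means 112 GiB. The helper name `rss_excess_gib` is hypothetical.

```python
KIB_PER_GIB = 1024 * 1024  # ps reports RSS in KiB


def rss_excess_gib(rss_kib: int, process_size_gib: float) -> float:
    """Return how far the resident set size exceeds the configured process size, in GiB."""
    return rss_kib / KIB_PER_GIB - process_size_gib


# RSS of the TaskManager JVM as quoted from `ps` in this thread (KiB).
rss_kib = 126672256
# Configured process.size from the thread; reading "112GB" as 112 GiB is an assumption.
process_size_gib = 112

excess = rss_excess_gib(rss_kib, process_size_gib)
print(f"RSS: {rss_kib / KIB_PER_GIB:.1f} GiB, over budget by {excess:.1f} GiB")
```

Under these assumptions the process sits roughly 8.8 GiB above its configured budget, which is consistent with the suggestion that a 2GB `jvm-overhead` may be too small.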

Re: Native memory allocation (mmap) failed to map 1006567424 bytes

2020-10-29 Thread Ori Popowski
Hi Xintong, Unfortunately I cannot upgrade to 1.10.2, because EMR has either 1.10.0 or 1.11.0. About the overhead - it turns out I already configured taskmanager.memory.jvm-overhead.max to 2 GB instead of the default 1 GB. Should I increase it further? state.backend.rocksdb.memory.managed is alread

Re: Native memory allocation (mmap) failed to map 1006567424 bytes

2020-10-29 Thread Xintong Song
Hi Ori, RocksDB also uses managed memory. If the memory overuse indeed comes from RocksDB, then increasing the managed memory fraction will not help. RocksDB will try to use as much memory as the configured managed memory size. Therefore increasing the managed memory fraction also makes RocksDB try to use

Re: Native memory allocation (mmap) failed to map 1006567424 bytes

2020-10-29 Thread Ori Popowski
Hi, PID 20331 is indeed the Flink process, specifically the TaskManager process. - Workload is a streaming workload reading from Kafka and writing to S3 using a custom Sink - RocksDB state backend is used with default settings - My external dependencies are: -- logback -- jackson -- flatbuffers --

Re: Native memory allocation (mmap) failed to map 1006567424 bytes

2020-10-29 Thread Xintong Song
Hi Ori, It looks like Flink indeed uses more memory than expected. I assume the first item with PID 20331 is the Flink process, right? It would be helpful if you could briefly describe your workload. - What kind of workload are you running? Streaming or batch? - Do you use RocksDB state backend? -

Re: Native memory allocation (mmap) failed to map 1006567424 bytes

2020-10-28 Thread Ori Popowski
Hi Xintong, See here:
# Top memory users
ps auxwww --sort -rss | head -10
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
yarn 20339 35.8 97.0 128600192 126672256 ? Sl Oct15 5975:47 /etc/alternatives/jre/bin/java -Xmx54760833024 -Xms54760833024 -XX:Max
root 524

Re: Native memory allocation (mmap) failed to map 1006567424 bytes

2020-10-28 Thread Xintong Song
Hi Ori, The error message suggests that there's not enough physical memory on the machine to satisfy the allocation. This does not necessarily mean a managed memory leak. A managed memory leak is only one of the possibilities. There are other potential reasons, e.g., another process/container on the