Based on my (streaming mode) experiments I see that it's not simply on-heap and
off-heap memory. There are actually 3 divisions of memory:
1- On heap (-Xmx)
2- Off heap (DirectByteBuffer allocations: network buffers, Netty, JVM metadata, optionally the TM managed memory)
3- The container "cut-off" part (0.3 in my example)
The cut-off ratio controls what is left over for 1 & 2. Thereafter the other
off-heap reservations dictate what is left over for on-heap.
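To make that concrete, here is a back-of-the-envelope split for the 10 GB container
from my original mail below. This assumes the cut-off is simply container size *
cut-off ratio and that the off-heap reservations are then subtracted from the
remainder to arrive at -Xmx; that is my reading, not a confirmed formula, and the
1 GB off-heap figure is purely illustrative:

  container size                      = 10 GB
  3- cut-off          = 10 GB * 0.3   =  3 GB
  left for 1 & 2      = 10 GB - 3 GB  =  7 GB
  2- off-heap (network buffers + Netty + JVM metadata, say) ~= 1 GB
  1- on-heap (-Xmx)   =  7 GB - 1 GB  ~= 6 GB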
Obviously RocksDB memory is not on-heap. My intuition is that RocksDB memory might
fall into the "cut-off" section + off-heap. However, that depends on whether or
not Flink+Netty fully pre-allocate whatever is reserved for the off-heap memory
before RocksDB spins up. If they do pre-allocate, then RocksDB native
allocations will fall into 3 only.
If the cut-off is not used by anything, I can't think of a good reason for having
such a high reservation (default 25%) sit totally unused in every container.
I don't see any easy way to:
a- Confirm where RocksDB memory lands (i.e. in 2, in 3, or in 2 & 3)
b- Roughly estimate the amount of memory RocksDB needs for a given MB or GB of data that I need to host in it
c- Determine how to tune 1, 2 & 3 to ensure RocksDB gets enough memory without randomly crashing the job
Unfortunately, coverage of this memory division is given only briefly in some of
the unofficial presentations on YouTube ... and even that appears to be inaccurate.
-roshan
On Friday, February 22, 2019, 10:44:02 PM PST, Yun Tang <[email protected]>
wrote:
Hi Roshan
From our experience, RocksDB memory allocation actually cannot be controlled
well from Flink's side.
The option containerized.heap-cutoff-ratio is mainly used to calculate the JVM heap
size, and the remaining part is treated as off-heap size. In a perfect situation,
RocksDB's memory would stay within the off-heap side. However, Flink just starts
RocksDB and leaves the memory allocation to RocksDB itself. If YARN is enabled to
check total memory usage, and the total memory usage exceeds the limit because
RocksDB's memory has grown, the container will be killed.
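If containers keep getting killed because of RocksDB's native allocations, one blunt
workaround is to enlarge the cut-off so that more of the container is kept outside
the JVM heap. A flink-conf.yaml sketch (the keys containerized.heap-cutoff-ratio and
containerized.heap-cutoff-min exist in the 1.7/1.8 line; the values below are only
examples, and as far as I remember the min is given in MB):

  containerized.heap-cutoff-ratio: 0.4
  containerized.heap-cutoff-min: 1024

This shrinks the computed JVM heap and leaves the difference as headroom that native
allocations such as RocksDB can grow into without exceeding the YARN limit.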
To control RocksDB memory, I recommend configuring an acceptable write buffer and
block cache size, and setting 'cacheIndexAndFilterBlocks', 'optimizeFiltersForHits'
and 'pinL0FilterAndIndexBlocksInCache' to true (the first one is for memory control
and the latter two are for performance when index & filter blocks are cached; refer
to [1] for more information). Last but not least, avoid using many states within one
operator, because that makes RocksDB create many column families, and each column
family consumes its own write buffer(s).
[1]
https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB#indexes-and-filter-blocks
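As a rough sketch of how those options could be wired up through the RocksDB state
backend's OptionsFactory (class and method names below are from the 1.7/1.8 line and
may differ in your version; the sizes are placeholders, not recommendations):

import org.apache.flink.contrib.streaming.state.OptionsFactory;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

// Sketch only: caps the write buffers and block cache and enables the
// options mentioned above. Tune the sizes to your own workload.
public class BoundedRocksDBOptions implements OptionsFactory {

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions) {
        // DB-level options are left as Flink provides them.
        return currentOptions;
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions) {
        BlockBasedTableConfig tableConfig = new BlockBasedTableConfig()
                .setBlockCacheSize(64 * 1024 * 1024)           // block cache size
                .setCacheIndexAndFilterBlocks(true)            // count index/filter blocks against the cache
                .setPinL0FilterAndIndexBlocksInCache(true);    // keep L0 index/filter blocks pinned

        return currentOptions
                .setWriteBufferSize(32 * 1024 * 1024)          // memtable size per column family
                .setMaxWriteBufferNumber(2)                    // limit the number of memtables
                .setOptimizeFiltersForHits(true)
                .setTableFormatConfig(tableConfig);
    }
}

// Wiring it up (the checkpoint path is just an example):
//   RocksDBStateBackend backend = new RocksDBStateBackend("hdfs:///flink/checkpoints");
//   backend.setOptions(new BoundedRocksDBOptions());
//   env.setStateBackend(backend);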
Best
Yun Tang
________________________________
From: Roshan Naik <[email protected]>
Sent: Saturday, February 23, 2019 10:09
To: [email protected]
Subject: Where does RocksDB's mem allocation occur
For YARN deployments, let's say you have:
container size = 10 GB
containerized.heap-cutoff-ratio = 0.3 (= 3 GB)
That means 7 GB is available for Flink's various subsystems, which include the JVM
heap, all the DirectByteBuffer allocations (Netty + network buffers + ...), and Java
metadata.
I am wondering whether RocksDB memory allocations (which are C++ native memory
allocations) are drawn from the 3 GB "cut-off" space, or whether they come out of
whatever is left of the remaining 7 GB (i.e. left after reserving for the
above-mentioned pieces).
-roshan