Based on my (streaming mode) experiments I see that it's not simply on-heap and
off-heap memory. There are actually 3 divisions of memory:
1- On heap (-Xmx)
2- Off heap (DirectByteBuffer allocations: network buffers, Netty, JVM metadata, optionally the TM managed memory)
3- The container "cut-off" part (0.3 in my example)
The cut-off ratio controls what is left over for 1 & 2. Thereafter the other
off-heap reservations dictate what is left over for on-heap.
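To make that concrete, here is a back-of-the-envelope split for the 10 GB container
from my original mail below. This assumes the cut-off is simply container size *
cut-off ratio and that the off-heap reservations are then subtracted from the
remainder to arrive at -Xmx; that is my reading, not a confirmed formula, and the
1 GB off-heap figure is purely illustrative:

  container size                      = 10 GB
  3- cut-off          = 10 GB * 0.3   =  3 GB
  left for 1 & 2      = 10 GB - 3 GB  =  7 GB
  2- off-heap (network buffers + Netty + JVM metadata, say) ~= 1 GB
  1- on-heap (-Xmx)   =  7 GB - 1 GB  ~= 6 GB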
Obviously RocksDB memory is not on-heap. My intuition is that RocksDB memory might
fall into the "cut-off" section + off-heap. However, that depends on whether or
not Flink+Netty fully pre-allocate whatever is reserved for the off-heap memory
before RocksDB spins up. If they do pre-allocate, then RocksDB native
allocations will fall into 3 only.
If the cut-off is not used by anything, I can't think of a good reason for having
such a high reservation (default 25%) sit totally unused in every container.
I don't see any easy way to:
a- Confirm where RocksDB memory lands (i.e. in 2, in 3, or in 2 & 3)
b- Roughly estimate the amount of memory RocksDB needs for a given MB or GB of data that I need to host in it
c- Determine how to tune 1, 2 & 3 to ensure RocksDB gets enough memory without randomly crashing the job
Unfortunately, coverage of this memory division is given only briefly in some of
the unofficial presentations on YouTube ... and even that appears to be inaccurate.
-roshan
On Friday, February 22, 2019, 10:44:02 PM PST, Yun Tang <[email protected]>
wrote:
Hi Roshan
From our experience, RocksDB memory allocation actually cannot be controlled
well from Flink's side.
The option containerized.heap-cutoff-ratio is mainly used to calculate the JVM heap
size, and the remaining part is treated as off-heap size. In a perfect situation,
RocksDB's memory would stay within the off-heap side. However, Flink just starts
RocksDB and leaves the memory allocation to RocksDB itself. If YARN is enabled to
check total memory usage, and the total memory usage exceeds the limit because
RocksDB's memory has grown, the container will be killed.
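If containers keep getting killed because of RocksDB's native allocations, one blunt
workaround is to enlarge the cut-off so that more of the container is kept outside
the JVM heap. A flink-conf.yaml sketch (the keys containerized.heap-cutoff-ratio and
containerized.heap-cutoff-min exist in the 1.7/1.8 line; the values below are only
examples, and as far as I remember the min is given in MB):

  containerized.heap-cutoff-ratio: 0.4
  containerized.heap-cutoff-min: 1024

This shrinks the computed JVM heap and leaves the difference as headroom that native
allocations such as RocksDB can grow into without exceeding the YARN limit.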
To control RocksDB memory, I recommend configuring an acceptable write buffer and
block cache size, and setting 'cacheIndexAndFilterBlocks', 'optimizeFiltersForHits'
and 'pinL0FilterAndIndexBlocksInCache' to true (the first one is for memory control
and the latter two are for performance when index & filter blocks are cached; refer
to [1] for more information). Last but not least, avoid using many states within one
operator, because that makes RocksDB create many column families, and each column
family consumes its own write buffer(s).
[1]
https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB#indexes-and-filter-blocks
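As a rough sketch of how those options could be wired up through the RocksDB state
backend's OptionsFactory (class and method names below are from the 1.7/1.8 line and
may differ in your version; the sizes are placeholders, not recommendations):

import org.apache.flink.contrib.streaming.state.OptionsFactory;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

// Sketch only: caps the write buffers and block cache and enables the
// options mentioned above. Tune the sizes to your own workload.
public class BoundedRocksDBOptions implements OptionsFactory {

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions) {
        // DB-level options are left as Flink provides them.
        return currentOptions;
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions) {
        BlockBasedTableConfig tableConfig = new BlockBasedTableConfig()
                .setBlockCacheSize(64 * 1024 * 1024)           // block cache size
                .setCacheIndexAndFilterBlocks(true)            // count index/filter blocks against the cache
                .setPinL0FilterAndIndexBlocksInCache(true);    // keep L0 index/filter blocks pinned

        return currentOptions
                .setWriteBufferSize(32 * 1024 * 1024)          // memtable size per column family
                .setMaxWriteBufferNumber(2)                    // limit the number of memtables
                .setOptimizeFiltersForHits(true)
                .setTableFormatConfig(tableConfig);
    }
}

// Wiring it up (the checkpoint path is just an example):
//   RocksDBStateBackend backend = new RocksDBStateBackend("hdfs:///flink/checkpoints");
//   backend.setOptions(new BoundedRocksDBOptions());
//   env.setStateBackend(backend);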
Best
Yun Tang
________________________________
From: Roshan Naik <[email protected]>
Sent: Saturday, February 23, 2019 10:09
To: [email protected]
Subject: Where does RocksDB's mem allocation occur
For YARN deployments, let's say you have:
container size = 10 GB
containerized.heap-cutoff-ratio = 0.3 (= 3 GB)
That means 7 GB is available for Flink's various subsystems, which include the JVM
heap, all the DirectByteBuffer allocations (Netty + network buffers + ...), and Java
metadata.
I am wondering whether RocksDB memory allocations (which are C++ native memory
allocations) are drawn from the 3 GB "cut-off" space, or whether they come out of
whatever is left of the remaining 7 GB (i.e. left after reserving for the
above-mentioned pieces).
-roshan