[ https://issues.apache.org/jira/browse/FLINK-7289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16106133#comment-16106133 ]
Vinay edited comment on FLINK-7289 at 7/29/17 4:18 PM:
-------------------------------------------------------

Hi Stefan,

I have mainly used RocksDB on EMR, backed by SSDs and 122 GB of memory. Although the FLASH_SSD_OPTIMIZED predefined options are a good starting point, they do not provide control over the amount of memory used, so I tuned the following parameters:

{code:java}
DBOptions (in addition to the FLASH_SSD_OPTIMIZED settings):
  maxBackgroundCompactions(4)

ColumnFamilyOptions:
  max_buffer_size : 512 MB
  block_cache_size : 128 MB
  max_write_buffer_number : 5
  minimum_buffer_number_to_merge : 2
  cacheIndexAndFilterBlocks : true
  optimizeFilterForHits : true
{code}

According to the documentation, when {code:java}optimizeFilterForHits: true{code} is set, RocksDB does not build bloom filters for the last level, which contains about 90% of the DB, so the memory used for bloom filters is roughly 10x lower.

Because RocksDB uses a lot of memory, that memory is not reclaimed if we cancel the job. For example, assume the job has been running for an hour and is using 50 GB of memory; when we cancel the job from the UI, the memory is not reclaimed. I observed this when running the job on YARN. In order to reclaim the memory, I had to run the following commands manually on each node of the EMR cluster:

{code}
sync; echo 3 > /proc/sys/vm/drop_caches
sync; echo 2 > /proc/sys/vm/drop_caches
sync; echo 1 > /proc/sys/vm/drop_caches
{code}
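For reference, here is a minimal sketch of how settings like the above could be wired into Flink's RocksDB backend through the {{OptionsFactory}} hook available in the 1.2-1.4 era. It interprets {{max_buffer_size}} as the RocksDB write buffer size and {{minimum_buffer_number_to_merge}} as {{min_write_buffer_number_to_merge}}; all values are illustrative assumptions, not recommendations:

{code:java}
import org.apache.flink.contrib.streaming.state.OptionsFactory;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

public class TunedRocksDBOptions implements OptionsFactory {

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions) {
        // On top of the predefined SSD options, allow more parallel compactions.
        return currentOptions.setMaxBackgroundCompactions(4);
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions) {
        // Bounded block cache; index and filter blocks are kept in the cache so they
        // count against the same budget.
        BlockBasedTableConfig tableConfig = new BlockBasedTableConfig()
                .setBlockCacheSize(128 * 1024 * 1024L)
                .setCacheIndexAndFilterBlocks(true);
        return currentOptions
                .setWriteBufferSize(512 * 1024 * 1024L)   // memtable size
                .setMaxWriteBufferNumber(5)
                .setMinWriteBufferNumberToMerge(2)
                .setOptimizeFiltersForHits(true)          // no bloom filters on the last level
                .setTableFormatConfig(tableConfig);
    }
}
{code}

Usage would then look roughly like this (the checkpoint URI is hypothetical):

{code:java}
RocksDBStateBackend backend = new RocksDBStateBackend("hdfs:///flink/checkpoints");
backend.setPredefinedOptions(PredefinedOptions.FLASH_SSD_OPTIMIZED);
backend.setOptions(new TunedRocksDBOptions());
env.setStateBackend(backend);
{code}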
> Memory allocation of RocksDB can be problematic in container environments
> -------------------------------------------------------------------------
>
>                 Key: FLINK-7289
>                 URL: https://issues.apache.org/jira/browse/FLINK-7289
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.2.0, 1.3.0, 1.4.0
>            Reporter: Stefan Richter
>
> Flink's RocksDB based state backend allocates native memory. The amount of
> allocated memory by RocksDB is not under the control of Flink or the JVM and
> can (theoretically) grow without limits.
> In container environments, this can be problematic because the process can
> exceed the memory budget of the container, and the process will get killed.
> Currently, there is no other option than trusting RocksDB to be well behaved
> and to follow its memory configurations. However, limiting RocksDB's memory
> usage is not as easy as setting a single limit parameter. The memory limit is
> determined by an interplay of several configuration parameters, which is
> almost impossible to get right for users. Even worse, multiple RocksDB
> instances can run inside the same process and make reasoning about the
> configuration also dependent on the Flink job.
> Some information about the memory management in RocksDB can be found here:
> https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB
> https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide
> We should try to figure out ways to help users in one or more of the
> following ways:
> - Some way to autotune or calculate the RocksDB configuration.
> - Conservative default values.
> - Additional documentation.
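To make the "interplay of several configuration parameters" a bit more concrete, a rough per-instance estimate can be sketched from the main contributors named in the memory-usage wiki (memtables, block cache, index/filter blocks). The helper below is a hypothetical back-of-the-envelope illustration, not Flink or RocksDB code, and the index/filter figure in particular would have to be measured per workload:

{code:java}
public final class RocksDBMemoryEstimate {

    /** Rough upper bound per RocksDB instance: memtables + block cache + index/filter blocks. */
    static long estimateBytes(long writeBufferSize,
                              int maxWriteBufferNumber,
                              int numColumnFamilies,
                              long blockCacheSize,
                              long indexFilterBytes) {
        long memtables = writeBufferSize * maxWriteBufferNumber * numColumnFamilies;
        return memtables + blockCacheSize + indexFilterBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Values from the comment above: 512 MB write buffer, 5 buffers, 128 MB block cache;
        // one column family and 256 MB of index/filter blocks are assumed for illustration.
        long perInstance = estimateBytes(512 * mb, 5, 1, 128 * mb, 256 * mb);
        // Multiple RocksDB instances can run in one process (roughly one per stateful
        // operator subtask), so the process-level footprint scales with the instance count.
        int instances = 4; // assumption: four stateful operator subtasks in one task manager
        System.out.printf("per instance: %d MB, process total: %d MB%n",
                perInstance / mb, (perInstance * instances) / mb);
    }
}
{code}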