Hi Zakelly, thanks for the information, that's interesting. Would you say that reading a subset from RocksDB is fast enough to be pretty much negligible, or could it be a bottleneck if the state of each key is "large"? Again assuming the number of distinct partition keys is large.
Regards,
Alexis.

On Sun, 18 Feb 2024, 05:02 Zakelly Lan, <zakelly....@gmail.com> wrote:

> Hi Alexis,
>
> Flink does need some heap memory to bridge requests to RocksDB and gather
> the results. In most cases, the memory is discarded immediately (eventually
> collected by GC). In the case of timers, Flink does cache a limited subset of
> key-values in heap to improve performance.
>
> In general you don't need to consider its heap consumption since it is
> minor.
>
> Best,
> Zakelly
>
> On Fri, Feb 16, 2024 at 4:43 AM Asimansu Bera <asimansu.b...@gmail.com>
> wrote:
>
>> Hello Alexis,
>>
>> I don't think data in RocksDB resides in the JVM even with function calls.
>>
>> For more details, check the link below:
>>
>> https://github.com/facebook/rocksdb/wiki/RocksDB-Overview#3-high-level-architecture
>>
>> RocksDB has three main components: memtable, SST file and WAL (not used in
>> Flink, as Flink uses checkpointing). When a TM starts with RocksDB as the
>> state backend, the TM has its own RocksDB instance and the state is managed
>> as a column family by that TM. Any changes of state go into memtable -->
>> sst --> persistent store. When read, data goes to the buffers and cache of
>> RocksDB.
>>
>> In the case of RocksDB as state backend, the JVM still holds thread stacks,
>> since with a high degree of parallelism there are many stacks maintaining
>> separate thread information.
>>
>> Hope this helps!!
>>
>> On Thu, Feb 15, 2024 at 11:21 AM Alexis Sarda-Espinosa <
>> sarda.espin...@gmail.com> wrote:
>>
>>> Hi Asimansu
>>>
>>> The memory RocksDB manages is outside the JVM, yes, but the mentioned
>>> subsets must be bridged to the JVM somehow so that the data can be exposed
>>> to the functions running inside Flink, no?
>>>
>>> Regards,
>>> Alexis.
>>>
>>> On Thu, 15 Feb 2024, 14:06 Asimansu Bera, <asimansu.b...@gmail.com>
>>> wrote:
>>>
>>>> Hello Alexis,
>>>>
>>>> RocksDB resides off-heap and outside of the JVM. The small subset of data
>>>> ends up off-heap in memory.
>>>>
>>>> For more details, check the following link:
>>>>
>>>> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/memory/mem_setup_tm/#managed-memory
>>>>
>>>> I hope this addresses your inquiry.
>>>>
>>>> On Thu, Feb 15, 2024 at 12:52 AM Alexis Sarda-Espinosa <
>>>> sarda.espin...@gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Most info regarding RocksDB memory for Flink focuses on what's needed
>>>>> independently of the JVM (although the Flink process configures its limits
>>>>> and so on). I'm wondering if there are additional special considerations
>>>>> with regards to the JVM heap in the following scenario.
>>>>>
>>>>> Assuming a key used to partition a Flink stream and its state has a
>>>>> high cardinality, but that the state of each key is small, when Flink
>>>>> prepares the state to expose to a user function during a call (with a
>>>>> given partition key), I guess it loads only the required subset from
>>>>> RocksDB, but does this small subset end up (temporarily) on the JVM heap?
>>>>> And if it does, does it stay "cached" in the JVM for some time or is it
>>>>> immediately discarded after the user function completes?
>>>>>
>>>>> Maybe this isn't even under Flink's control, but I'm curious.
>>>>>
>>>>> Regards,
>>>>> Alexis.
>>>>>
>>>>