Question about time-based operators with RocksDB backend

Gabriele Mencagli Mon, 04 Mar 2024 03:39:28 -0800

Dear Flink Community,

I am using Flink with the DataStream API and operators implemented using RichFunctions. I know that Flink provides a set of window-based operators with time-based semantics and tumbling/sliding windows.

From the Flink documentation, I understand that it is possible to change the state backend used to store the in-flight state of the operators, for example by using RocksDB to cope with larger-than-memory state. If I am not wrong, to change the backend transparently (e.g., from in-memory to RocksDB) we have to access the state through the proper API, for example the Keyed State API with its different abstractions such as ValueState<T>, ListState<T>, etc., as reported here <https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/datastream/fault-tolerance/state/>.

My question is about using time-based window operators with the RocksDB backend. Suppose, for example, very large temporal windows with a huge number of keys in the stream. I am wondering whether it is possible to use the built-in window operators of Flink (e.g., with an AggregateFunction or a more generic ProcessWindowFunction, as here <https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/datastream/operators/windows/>) transparently with RocksDB as the state backend, or whether I have to develop the window operator in a raw manner using the Keyed State API (e.g., ListState, AggregatingState), implementing the underlying window logic manually in the code of the operator's RichFunction (e.g., a FlatMap).


Thanks for your support,

--
Gabriele Mencagli
