Dear Flink Community,
I am using Flink with the DataStream API and operators implemented using
RichedFunctions. I know that Flink provides a set of window-based
operators with time-based semantics and tumbling/sliding windows.
By reading the Flink documentation, I understand that there is the
possibility to change the memory backend utilized for storing the
in-flight state of the operators. For example, using RocksDB for this
purpose to cope with a larger-than-memory state. If I am not wrong, to
transparently change the backend (e.g., from in-memory to RocksDB) we
have to use a proper API to access the state. For example, the Keyed
State API with different abstractions such as ValueState<T>,
ListState<T>, etc... as reported here
<https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/datastream/fault-tolerance/state/>.
My question is related to the utilization of time-based window operators
with the RocksDB backend. Suppose for example very large temporal
windows with a huge number of keys in the stream. I am wondering if
there is a possibility to use the built-in window operators of Flink
(e.g., with an AggregateFunction or a more generic ProcessWindowFunction
as here
<https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/datastream/operators/windows/>)
transparently with RocksDB support as a state back-end, or if I have to
develop the window operator in a raw manner using the Keyed State API
(e.g., ListState, AggregateState) for this purpose by implementing the
underlying window logic manually in the code of RichedFunction of the
operator (e.g., a FlatMap).
Thanks for your support,
--
Gabriele Mencagli