Greetings, Flink developers. I would like to open up a discussion of a proposal [1] to add a new kind of state to Flink.
The goal here is to optimize a fairly common pattern, which is using MapState<Long, List<Event>> to store lists of events associated with timestamps. This pattern is used internally in quite a few operators that implement sorting and joins, and it also shows up in user code, for example, when implementing custom windowing in a KeyedProcessFunction.

Nico Kruber, Seth Wiesman, and I have implemented a POC that achieves a more than 2x improvement in throughput when performing these operations on RocksDB by better leveraging the capabilities of the RocksDB state backend. See FLIP-220 [1] for details.

Best,
David

[1] https://cwiki.apache.org/confluence/x/Xo_FD