Hey All, The last couple of days I have been playing around with the idea of building a streaming key-value store abstraction using stateful streaming operators that can be used within Flink Streaming programs seamlessly.
Operations executed on this KV store would be fault tolerant as it integrates with the checkpointing mechanism, and if we add timestamps to each put/get/... operation we can use the watermarks to create fully deterministic results. This functionality is very useful for many applications, and is very hard to implement properly with some dedicates kv store. The KVStore abstraction could look as follows: KVStore<K,V> store = new KVStore<>; Operations: store.put(DataStream<Tuple2<K,V>>) store.get(DataStream<K>) -> DataStream<KV<K,V>> store.remove(DataStream<K>) -> DataStream<KV<K,V>> store.multiGet(DataStream<K[]>) -> DataStream<KV<K,V>[]> store.getWithKeySelector(DataStream<X>, KeySelector<X,K>) -> DataStream<KV<X,V>[]> For the resulting streams I used a special KV abstraction which let's us return null values. The implementation uses a simple streaming operator for executing most of the operations (for multi get there is an additional merge operator) with either local or partitioned states for storing the kev-value pairs (my current prototype uses local states). And it can either execute operations eagerly (which would not provide deterministic results), or buffer up operations and execute them in order upon watermarks. As for use cases you can probably come up with many I will save that for now :D I have a prototype implementation here that can execute the operations described above (does not handle watermarks and time yet): https://github.com/gyfora/flink/tree/KVStore And also an example job: https://github.com/gyfora/flink/blob/KVStore/flink-staging/flink-streaming/flink-streaming-core/src/test/java/org/apache/flink/streaming/api/KVStreamExample.java What do you think? If you like it I will work on writing tests and it still needs a lot of tweaking and refactoring. This might be something we want to include with the standard streaming libraries at one point. Cheers, Gyula