Hello Federico,

Yes, you are correct: the type of the storage engine is indeed configured via flink-conf.yaml. The state backends you mentioned are the ones that the community has actively maintained and configured for a long time.
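For reference, a minimal sketch of the relevant flink-conf.yaml entries. The first two keys are the ones quoted from the playground deployment later in this thread. The commented-out line is an assumption for illustration: Flink's `state.backend` option also accepts the class name of a state backend factory, and `com.example.MyStateBackendFactory` is a hypothetical placeholder, not an existing class.

```yaml
# flink-conf.yaml (sketch; these two keys are the ones quoted in this thread)
state.backend: rocksdb
state.backend.rocksdb.timer-service.factory: ROCKSDB

# Assumption: a custom backend would be selected by the class name of its
# factory instead of a shortcut name. The class below is a hypothetical
# placeholder and does not exist.
# state.backend: com.example.MyStateBackendFactory
```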
I'm not sure, however, what the requirements are for implementing your own state backend. I'd suggest posting a separate question to the mailing list asking how to develop your own state backend for Flink independently of StateFun, as other folks might provide additional context :-)

All the best!
Igal.

On Fri, Mar 11, 2022 at 12:07 AM Federico D'Ambrosio <fedex...@gmail.com> wrote:

> Hi Igal,
>
> thank you so much for your response.
>
> As for [2], I was mainly interested in how the state is stored physically.
> Looking at the deployment files, I see the following file
>
> https://github.com/apache/flink-statefun-playground/blob/main/deployments/k8s/04-statefun/01-statefun-runtime.yaml
>
> where the state seems to be defined by the keys in the flink-conf.yaml:
>
> state.backend: rocksdb
> state.backend.rocksdb.timer-service.factory: ROCKSDB
>
> As far as I can tell from the docs, the built-in backends are FS, HashMap
> and RocksDB, but I can technically implement my own backend by implementing
> this abstract class
> <https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/state/AbstractStateBackend.java>
> (and a related factory), is that correct?
>
> Thank you again,
> Federico
>
> On Thu, Feb 24, 2022 at 3:11 PM Igal Shilman <i...@apache.org> wrote:
>
>> Hello,
>>
>> For (1) I welcome you to visit our documentation, and the many talks online,
>> to understand more about the motivation and the value of StateFun. In a
>> nutshell, StateFun provides a few building blocks that make building
>> distributed stateful applications easier.
>>
>> For (2) check out our playground repository to see how storage is
>> configured. It is completely defined by the SDK and is not configured by
>> Flink cluster configuration.
>>
>> I think that the use case you are describing is a good fit for StateFun.
>> If you check out the latest Flink Forward videos, there were a few that
>> described how to use StateFun for exactly that [3].
>>
>> Good luck!
>> Igal
>>
>> [1] https://nightlies.apache.org/flink/flink-statefun-docs-stable/
>> [2] https://github.com/apache/flink-statefun-playground
>> [3] https://www.youtube.com/channel/UCY8_lgiZLZErZPF47a2hXMA/videos
>>
>> On Sun, Feb 20, 2022 at 1:54 PM Federico D'Ambrosio <fedex...@gmail.com> wrote:
>>
>>> Hello everyone,
>>>
>>> It's been quite a while since I wrote to the Flink ML, because the need
>>> for a stateful stream processing system never actually arose in my
>>> current job, until now.
>>>
>>> Since the last version I actually tried was Flink 1.9, well before
>>> Stateful Functions, I have a few questions about some of the latest features.
>>>
>>> 1. What are the use cases Flink Stateful Functions were designed for? As
>>> far as I understand from the documentation, they are basically processors
>>> that can be separated from (and integrated with) a "main" Flink streaming
>>> job, but I fail to grasp how they differ from a REST endpoint implemented
>>> using any other framework.
>>> 2. How is the storage for these functions configured? I see that the
>>> storage for the state is accessed via a Context object, so I assume it is
>>> configured by the Flink cluster configuration?
>>>
>>> I would like, then, to elaborate on my use case: we have some 20 CDC
>>> topics (1 topic per table) on Kafka. From the data streamed on these
>>> topics, we need to compute many features to be used by an ML model. Many of
>>> these features need to be computed by joining multiple topics and/or need
>>> the whole history of the field. So, I was wondering if Stateful Functions
>>> could be a good approach to this problem, where a feature could be
>>> "packaged" in a single stateful function to be "triggered" by the arrival
>>> of any new message on the topic configured as its ingress.
>>>
>>> So, basically, I'm wondering if they could fit the use case, or if we're
>>> better off with a custom Flink job.
>>>
>>> Thank you for your time,
>>> --
>>> Federico D'Ambrosio
>>>
>>
>
> --
> Federico D'Ambrosio