Hello,

For (1), I invite you to visit our documentation[1] and the many talks online to learn more about the motivation for and value of StateFun. In a nutshell, StateFun provides a few building blocks that make building distributed stateful applications easier.
For (2), check out our playground repository[2] to see how storage is configured. Function state is defined entirely through the SDK. It is not configured via the Flink cluster configuration.

I think the use case you are describing is a good fit for StateFun. Several talks in the latest Flink Forward videos describe how to use StateFun for exactly that[3].

Good luck!
Igal

[1] https://nightlies.apache.org/flink/flink-statefun-docs-stable/
[2] https://github.com/apache/flink-statefun-playground
[3] https://www.youtube.com/channel/UCY8_lgiZLZErZPF47a2hXMA/videos

On Sun, Feb 20, 2022 at 1:54 PM Federico D'Ambrosio <fedex...@gmail.com> wrote:
> Hello everyone,
>
> It's been quite a while since I wrote to the Flink mailing list, because in
> my current job the need for a stateful stream processing system never
> actually arose, until now.
>
> Since the last version I actually tried was Flink 1.9, well before
> Stateful Functions, I have a few questions about some of the latest features.
>
> 1. What use cases were Flink Stateful Functions designed for? As
> far as I understand from the documentation, they are basically processors
> that can be separated from (and integrated with) a "main" Flink streaming
> job, but I fail to grasp how they differ from a REST endpoint implemented
> using any other framework.
> 2. How is the storage for these functions configured? I see that the
> state storage is accessed via a Context object, so I assume it is
> configured through the Flink cluster configuration?
>
> I would like, then, to elaborate on my use case: we have some 20 CDC
> topics (1 topic per table) on Kafka. From the data streamed on these
> topics, we need to compute many features to be used by an ML model. Many of
> these features need to be computed by joining multiple topics and/or need
> the whole history of the field. So, I was wondering if Stateful Functions
> could be a good approach to this problem, where a feature could be
> "packaged" in a single stateful function "triggered" by the arrival
> of any new message on the topic configured as its ingress.
>
> So, basically, I'm wondering whether they could fit the use case, or we're
> better off with a custom Flink job.
>
> Thank you for your time,
> --
> Federico D'Ambrosio
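A minimal sketch of the "one feature per stateful function" idea from the use case above. It assumes each feature can be keyed by a single entity id, hypothetically a customer id shared by an "orders" topic and a "customers" topic. The code is plain Python that only mimics the StateFun programming model. It does not use the StateFun SDK, and every name in it (`ChangeEvent`, `FeatureState`, `FeatureFunction`, `avg_order_amount_with_segment`) is hypothetical. In StateFun itself, the function's state would instead be declared through the SDK (for example with `ValueSpec`) and persisted by the runtime, and the Kafka ingress would deliver each message to the function instance for its key.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical CDC change event read from one of the Kafka topics."""
    topic: str           # source table/topic, e.g. "orders"
    key: str             # routing key, e.g. a customer id
    row: Dict[str, Any]  # the changed row


@dataclass
class FeatureState:
    """Per-key state. In StateFun this would be declared through the SDK and
    persisted by the runtime. Here it is only an in-memory dict."""
    history: Dict[str, List[Dict[str, Any]]] = field(
        default_factory=lambda: defaultdict(list)
    )


class FeatureFunction:
    """Hypothetical keyed function: one logical instance per key, invoked once
    per incoming event, with state isolated per key."""

    def __init__(self, compute: Callable[[FeatureState], Any]) -> None:
        self._compute = compute
        self._state: Dict[str, FeatureState] = {}

    def invoke(self, event: ChangeEvent) -> Tuple[str, Any]:
        # Look up (or create) the state for this key only.
        state = self._state.setdefault(event.key, FeatureState())
        # Keep the full per-topic history. A real job would bound this
        # or keep running aggregates instead.
        state.history[event.topic].append(event.row)
        # Recompute the feature from this key's state on every message.
        return event.key, self._compute(state)


def avg_order_amount_with_segment(state: FeatureState) -> Dict[str, Any]:
    """Example feature joining two hypothetical topics for the same customer."""
    orders = state.history.get("orders", [])
    customers = state.history.get("customers", [])
    avg = sum(o["amount"] for o in orders) / len(orders) if orders else None
    segment = customers[-1]["segment"] if customers else None
    return {"avg_order_amount": avg, "segment": segment}


fn = FeatureFunction(avg_order_amount_with_segment)
events = [
    ChangeEvent("customers", "c1", {"segment": "retail"}),
    ChangeEvent("orders", "c1", {"amount": 10.0}),
    ChangeEvent("orders", "c2", {"amount": 99.0}),
    ChangeEvent("orders", "c1", {"amount": 30.0}),
]
for e in events:
    print(fn.invoke(e))
```

Each key's state is independent. The event for c2 does not affect c1, whose average becomes 20.0 after its second order while still carrying the segment joined from the "customers" topic.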