Re: Flink Statefun and Feature computation

Igal Shilman Fri, 11 Mar 2022 00:15:29 -0800

Hello Federico,

Yes, you are correct: the type of the storage engine is indeed configured
via flink-conf.yaml. The state backends you mentioned are the ones
that have been actively maintained by the community for a long time.
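
As a rough sketch only, the relevant fragment of flink-conf.yaml might look
like the following. The first two keys are the ones from the playground
deployment quoted below; the hashmap alternative and the checkpoint
directory are assumptions for illustration, and the bucket path is a
hypothetical placeholder.

```yaml
# flink-conf.yaml (fragment)

# Taken from the playground deployment quoted below: keep state and timers
# in RocksDB.
state.backend: rocksdb
state.backend.rocksdb.timer-service.factory: ROCKSDB

# Assumption: to keep state on the JVM heap instead, replace the first key with
# state.backend: hashmap

# Assumption: hypothetical checkpoint location, adjust to your own storage.
state.checkpoints.dir: s3://my-bucket/checkpoints
```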


I'm not sure, however, what the requirements are for implementing your own
state backend. I'd suggest posting a separate question to the mailing list
asking how to develop your own state backend for Flink independently of
StateFun, as other folks might provide additional context :-)

All the best!
Igal.



On Fri, Mar 11, 2022 at 12:07 AM Federico D'Ambrosio <fedex...@gmail.com>
wrote:

> Hi Igal,
>
> thank you so much for your response.
>
> As for [2], I was mainly interested in how the state is stored physically.
> Looking at the deployment files, I see the following file
>
> https://github.com/apache/flink-statefun-playground/blob/main/deployments/k8s/04-statefun/01-statefun-runtime.yaml
>
>
> where the state seems to be defined by the keys in the flink-conf.yaml:
>
> state.backend: rocksdb
> state.backend.rocksdb.timer-service.factory: ROCKSDB
>
> As far as I can tell from the docs, the built-in backends are FS, HashMap
> and RocksDB, but I can technically implement my own backend by implementing
> this abstract class
> <https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/state/AbstractStateBackend.java>
> (and a related factory), is that correct?
>
> Thank you again,
> Federico
>
> Il giorno gio 24 feb 2022 alle ore 15:11 Igal Shilman <i...@apache.org>
> ha scritto:
>
>> Hello,
>>
>> For (1), I welcome you to visit our documentation [1] and the many talks
>> online to understand more about the motivation and the value of StateFun.
>> In a nutshell, StateFun provides a few building blocks that make
>> building distributed stateful applications easier.
>>
>> For (2), check out our playground repository [2] to see how storage is
>> configured. It is completely defined by the SDK and is not configured by
>> Flink cluster configuration.
>>
>> I think that the use case you are describing is a good fit for StateFun.
>> If you check out the latest Flink Forward videos, there were a few that
>> described how to use StateFun for exactly that [3].
>>
>> Good luck!
>> Igal
>>
>> [1] https://nightlies.apache.org/flink/flink-statefun-docs-stable/
>> [2] https://github.com/apache/flink-statefun-playground
>> [3] https://www.youtube.com/channel/UCY8_lgiZLZErZPF47a2hXMA/videos
>>
>> On Sun, Feb 20, 2022 at 1:54 PM Federico D'Ambrosio <fedex...@gmail.com>
>> wrote:
>>
>>> Hello everyone,
>>>
>>> It's been quite a while since I last wrote to the Flink ML, because the
>>> need for a stateful stream processing system never actually arose in my
>>> current job, until now.
>>>
>>> Since the last version I actually tried was Flink 1.9, well before
>>> Stateful Functions, I had a few questions about some of the latest features.
>>>
>>> 1. What use cases were Flink Stateful Functions designed for? As
>>> far as I understand from the documentation, they are basically processors
>>> that can be separated from (and integrated with) a "main" Flink streaming
>>> job, but I fail to grasp how they differ from a REST
>>> endpoint implemented using any other framework.
>>> 2. How is the storage for these functions configured? I see that the
>>> storage for the state is accessed via a Context object, so I think it is
>>> configured by a Flink cluster configuration?
>>>
>>> I would like, then, to elaborate on my use case: we have some 20 CDC
>>> topics (1 topic per table) on Kafka. From the data streamed on these
>>> topics, we need to compute many features to be used by an ML model. Many of
>>> these features need to be computed by joining multiple topics and/or need
>>> the whole history of the field. So, I was wondering if Stateful Functions
>>> could be a good approach to this problem, where a feature could be
>>> "packaged" in a single stateful function to be "triggered" by the arrival
>>> of any new message on the topic configured as its ingress.
>>>
>>> So, basically, I'm wondering if they could fit the use case, or we're
>>> better off with a custom Flink job.
>>>
>>> Thank you for your time,
>>> --
>>> Federico D'Ambrosio
>>>
>>
>
> --
> Federico D'Ambrosio
>
