Re: Flink Statefun and Feature computation

Federico D'Ambrosio Thu, 10 Mar 2022 15:07:45 -0800

Hi Igal,

thank you so much for your response.


As for [2], I was mainly interested in how the state is stored physically.
Looking at the deployment files, I see the following file

https://github.com/apache/flink-statefun-playground/blob/main/deployments/k8s/04-statefun/01-statefun-runtime.yaml


where the state backend seems to be configured by these keys in flink-conf.yaml:

state.backend: rocksdb
state.backend.rocksdb.timer-service.factory: ROCKSDB

As far as I can tell from the docs, the built-in backends are FS, HashMap
and RocksDB, but I can technically implement my own backend by extending
this abstract class
<https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/state/AbstractStateBackend.java>
(and a related factory), is that correct?
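
A rough sketch of the "backend plus factory, selected by a config key" pattern
the question refers to, written in plain Python rather than against Flink's
Java API. Every name here (StateBackend, HashMapBackend, BACKEND_FACTORIES,
parse_conf, create_backend) is hypothetical and does not correspond to a Flink
class or method; it only illustrates the shape of the idea.

```python
from abc import ABC, abstractmethod


class StateBackend(ABC):
    """Hypothetical storage interface; NOT Flink's AbstractStateBackend."""

    @abstractmethod
    def get(self, key, name):
        """Return the value stored under (key, name), or None."""

    @abstractmethod
    def put(self, key, name, value):
        """Store value under (key, name)."""


class HashMapBackend(StateBackend):
    """Illustrative backend that keeps all state in an in-memory dict."""

    def __init__(self):
        self._store = {}

    def get(self, key, name):
        return self._store.get((key, name))

    def put(self, key, name, value):
        self._store[(key, name)] = value


# Hypothetical registry: config value -> callable that builds a backend.
BACKEND_FACTORIES = {"hashmap": HashMapBackend}


def parse_conf(text):
    """Parse flat 'key: value' lines like the flink-conf.yaml excerpt above."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition(":")
        conf[key.strip()] = value.strip()
    return conf


def create_backend(conf):
    """Look up the factory named by 'state.backend' and build a backend."""
    name = conf.get("state.backend", "hashmap")
    if name not in BACKEND_FACTORIES:
        raise ValueError(f"no factory registered for state.backend={name!r}")
    return BACKEND_FACTORIES[name]()
```

In this sketch a custom backend would be plugged in by subclassing
StateBackend and adding an entry such as
BACKEND_FACTORIES["my-backend"] = MyBackend before calling create_backend.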

Thank you again,
Federico

On Thu, Feb 24, 2022 at 3:11 PM Igal Shilman <i...@apache.org> wrote:

> Hello,
>
> For (1) I welcome you to visit our documentation, and the many talks online,
> to understand more about the motivation and the value of StateFun. I can say
> in a nutshell that StateFun provides a few building blocks that make
> building distributed stateful applications easier.
>
> For (2) check out our playground repository to see how storage is
> configured. It is completely defined by the SDK and is not configured by
> Flink cluster configuration.
>
> I think that the use case you are describing is a good fit for StateFun.
> If you check out the latest Flink Forward videos, there were a few that
> described how to use StateFun for exactly that [3].
>
> Good luck!
> Igal
>
> [1] https://nightlies.apache.org/flink/flink-statefun-docs-stable/
> [2] https://github.com/apache/flink-statefun-playground
> [3] https://www.youtube.com/channel/UCY8_lgiZLZErZPF47a2hXMA/videos
>
> On Sun, Feb 20, 2022 at 1:54 PM Federico D'Ambrosio <fedex...@gmail.com>
> wrote:
>
>> Hello everyone,
>>
>> It's been quite a while since I wrote to the Flink ML, because the need
>> for a stateful stream processing system never actually arose in my current
>> job, until now.
>>
>> Since the last version I actually tried was Flink 1.9, well before
>> Stateful Functions, I had a few questions about some of the latest features.
>>
>> 1. What use cases were Flink Stateful Functions designed for? As
>> far as I understand from the documentation, they are basically processors
>> that can be separated from a "main" Flink streaming job (and can be
>> integrated with it), but I fail to grasp how they should differ from a REST
>> endpoint implemented using any other framework.
>> 2. How is the storage for these functions configured? I see that the
>> storage for the state is accessed via a Context object, so I think it is
>> configured by a Flink cluster configuration?
>>
>> I would like, then, to elaborate on my use case: we have some 20 CDC
>> topics (1 topic per table) on Kafka. From the data streamed on these
>> topics, we need to compute many features to be used by an ML model. Many of
>> these features need to be computed by joining multiple topics and/or need
>> the whole history of the field. So, I was wondering if Stateful Functions
>> could be a good approach to this problem, where a feature could be
>> "packaged" in a single stateful function to be "triggered" by the arrival
>> of any new message on the topic configured as its ingress.
>>
>> So, basically, I'm wondering if they could fit the use case, or we're
>> better off with a custom Flink job.
>>
>> Thank you for your time,
>> --
>> Federico D'Ambrosio
>>
>

-- 
Federico D'Ambrosio
