Hello everyone,

It's been quite a while since I last wrote to the Flink ML, because in my
current job the need for a stateful stream processing system never
actually arose, until now.

Since the last version I actually tried was Flink 1.9, well before Stateful
Functions, I had a few questions about some of the latest features.

1. What are the use cases Flink Stateful Functions were designed for? As
far as I understand from the documentation, they are essentially
processors that can be decoupled from a "main" Flink streaming job (and
integrated with it), but I fail to grasp how they differ from a REST
endpoint implemented with any other framework.
2. How is the storage for these functions configured? I see that the
state is accessed via a Context object, so I take it the underlying
storage is configured at the Flink cluster level?

I would like, then, to elaborate on my use case: we have some 20 CDC
topics (one topic per table) on Kafka. On the data streamed over these
topics we need to compute many features for a ML model. Many of these
features require joining multiple topics and/or the whole history of a
field. So I was wondering whether Stateful Functions could be a good
approach to this problem, with each feature "packaged" as a single
stateful function that is "triggered" by the arrival of any new message
on the topic configured as its ingress.
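To make the pattern I have in mind concrete, here is a toy sketch in
plain Python (deliberately NOT the actual StateFun SDK; the Context,
Runtime, and RunningSumFeature names are all hypothetical): one
"stateful function" per feature, addressed by an entity key, invoked on
every new CDC message, with per-key state that the real runtime would
persist for us.

```python
# Toy model of the pattern: per-key state + message-triggered functions.
# This only simulates what Flink/StateFun would manage; all names here
# (Context, Runtime, RunningSumFeature) are illustrative, not SDK API.
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Context:
    """Stands in for the SDK's Context: storage scoped to one (function, key)."""
    storage: dict


class RunningSumFeature:
    """Hypothetical feature: running sum of an 'amount' column, updated
    whenever a CDC event for that row's key arrives."""

    def __call__(self, context: Context, message: dict) -> float:
        total = context.storage.get("total", 0.0) + message["amount"]
        context.storage["total"] = total
        return total


class Runtime:
    """Routes each message to the function instance addressed by its key,
    giving each key its own isolated state."""

    def __init__(self, fn):
        self.fn = fn
        self.state = defaultdict(dict)  # key -> that key's private storage

    def on_message(self, key: str, message: dict):
        return self.fn(Context(storage=self.state[key]), message)


rt = Runtime(RunningSumFeature())
rt.on_message("row-1", {"amount": 10.0})
rt.on_message("row-2", {"amount": 5.0})
print(rt.on_message("row-1", {"amount": 2.5}))  # row-1's state is isolated: 12.5
```

The question, essentially, is whether this per-feature, per-key model is
what Stateful Functions are meant for, with Kafka as the ingress and
Flink handling the state instead of the toy dictionaries above.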

So, basically, I'm wondering whether they fit this use case, or whether
we're better off with a custom Flink job.

Thank you for your time,
-- 
Federico D'Ambrosio
