Hi,

We use the so-called "control stream" pattern to deliver settings to the
Flink job via Apache Kafka topics. The settings are in fact an unbounded
stream of events originating from the master DBMS, which acts as the
single source of truth for the rules list.
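
Roughly, the wiring on the Flink side follows the broadcast state pattern;
a simplified sketch (Rule, RuleDeserializationSchema, ApplyRulesFunction
and "rules-topic" are placeholders, not our real code):

  MapStateDescriptor<String, Rule> rulesDescriptor = new MapStateDescriptor<>(
          "rules", Types.STRING, TypeInformation.of(Rule.class));

  // Control stream: the rule events published to Kafka by the Akka Streams service.
  DataStream<Rule> rules = env.addSource(
          new FlinkKafkaConsumer<>("rules-topic", new RuleDeserializationSchema(), kafkaProps));

  // Broadcast the rules to every parallel instance of the main processing function.
  BroadcastStream<Rule> controlStream = rules.broadcast(rulesDescriptor);

  events.connect(controlStream)
        .process(new ApplyRulesFunction(rulesDescriptor));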

It may seem odd, since Flink provides "exactly once" processing
semantics, the service that publishes the rules to Kafka is written with
Akka Streams and guarantees "at least once" delivery, and the rule
handling inside the Flink job is implemented in an idempotent manner.
Still, there are cases where we need to run a reconciliation between the
current rules stored in the master DBMS and the existing Flink state.
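
The idempotent handling is essentially an upsert keyed by rule id, so a
redelivered rule event simply overwrites the same entry. A simplified
sketch, inside the BroadcastProcessFunction that applies the rules (Rule
and Event are placeholders):

  @Override
  public void processBroadcastElement(Rule rule, Context ctx, Collector<Event> out) throws Exception {
      // "At least once" delivery from the publisher is harmless here:
      // re-processing the same rule just writes the same key/value again.
      ctx.getBroadcastState(rulesDescriptor).put(rule.getId(), rule);
  }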

We've looked at Flink's tooling and found that the State Processor API
could solve our problem: we would implement a periodic process that
unloads the state to an external file (e.g. a .csv) and then compares
that set against the information held by the master system.
Basically this looks like the lambda architecture approach, whereas Flink
is supposed to implement the kappa architecture, so in that light our
reconciliation approach feels a bit far-fetched.
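
Concretely, the periodic dump we have in mind would be something along
these lines with the State Processor API (a sketch only, assuming the
rules sit in a broadcast state; the operator uid "rules-applier", the
state name "rules" and the paths are placeholders):

  ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

  ExistingSavepoint savepoint =
          Savepoint.load(env, "hdfs:///path/to/savepoint", new MemoryStateBackend());

  DataSet<Tuple2<String, Rule>> rulesInState = savepoint.readBroadcastState(
          "rules-applier",                  // uid of the operator holding the rules
          "rules",                          // name of the MapStateDescriptor
          Types.STRING,
          TypeInformation.of(Rule.class));

  // Dump to CSV, then diff the file against the rules table in the master DBMS.
  rulesInState.writeAsCsv("file:///tmp/rules-state.csv", FileSystem.WriteMode.OVERWRITE);

  env.execute("dump-rules-state");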

Are there any best practices or patterns addressing such scenarios in
Flink?

Many thanks for any assistance and ideas.

-----
Alex Sergeenko
