We have a flink job containing almost 20 process functions (map, flatMap, 
process, filter, etc.) The state dependencies among those process functions are 
very complex:

  *   Shared states are several key-value maps.
  *   Different functions share different states.
  *   Functions may query and modify states.
  *   We want those states to be single source of truth.
We have tried two ways to solve the problem, both were not good enough:

  *   We tried to redesign stream graph -- using CoFlatMapFunction to store the 
states and union different types of messages. That made our stream much more 
complex and hard to ensure the correctness.
  *   We tried to use outside storage, like redis, zookeeper (poor performance) 
and mysql. In this case, our code was much much more simpler and it did run 
correctly. However, when we gave functions a high parallelism, we had bad 
performance issue -- Since each parallel sub task not shared connections, it 
wasted many network and hardware resources.
Are there other better ways to deal with shared state problem in flink? If not, 
which way I mentioned is better? Could any flink master give us some best 
practices.
Thanks.
Edit: We migrated from akka actor system and akka stream. In akka, sharing a 
singleton state is not that hard -- just add a state actor.

StackOverflow link: 
https://stackoverflow.com/questions/47388411/single-source-of-truth-for-states-among-multiple-process-functions

Reply via email to