Hey folks,

After digging into this a bit, it does seem like Broadcast State would fit the 
bill for this scenario and would keep the downstream operators up to date as 
messages arrive on my Kafka topic.

My question is: is there a pattern for pre-populating the state initially? In 
my case, I need to have loaded my entire “lookup” topic into state before 
processing any records in the other stream.

My initial thought is to do something like the following, if it’s possible:

- Create a KafkaConsumer on startup to read the lookup topic in its entirety 
into some collection like a hashmap, prior to executing the Flink pipeline, so 
the load completes before the job starts (see the first sketch below)
- Use that collection to initialize the state of my broadcast stream, if that’s 
possible (see the second sketch further down)
- From there, the stream would be broadcasting any new records coming in, so I 
“should” stay up to date.
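To make the first step concrete, this is roughly what I had in mind: a rough 
sketch that uses the plain Kafka client directly, before env.execute(), and 
polls until every partition of the lookup topic has been read to the end. The 
bootstrap servers, topic name, and String key/value types are placeholders for 
my actual setup.

// imports: org.apache.kafka.clients.consumer.*, org.apache.kafka.common.TopicPartition,
// org.apache.kafka.common.serialization.StringDeserializer,
// java.time.Duration, java.util.*, java.util.stream.Collectors
Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
props.setProperty("group.id", "lookup-preload");
props.setProperty("key.deserializer", StringDeserializer.class.getName());
props.setProperty("value.deserializer", StringDeserializer.class.getName());

Map<String, String> lookup = new HashMap<>();
try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    List<TopicPartition> partitions = consumer.partitionsFor("lookup-topic").stream()
        .map(p -> new TopicPartition(p.topic(), p.partition()))
        .collect(Collectors.toList());
    consumer.assign(partitions);
    consumer.seekToBeginning(partitions);

    // Capture the end offsets once, then poll until every partition reaches
    // them, so the pipeline is only built after the whole topic has been read.
    Map<TopicPartition, Long> endOffsets = consumer.endOffsets(partitions);
    while (partitions.stream().anyMatch(tp -> consumer.position(tp) < endOffsets.get(tp))) {
        for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
            lookup.put(record.key(), record.value());
        }
    }
}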

Is this an oversimplification, or is there an obviously better / well-known 
approach to handling this?
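
For what it's worth, here's roughly how I picture wiring that pre-read map into 
the broadcast side. Since, as far as I can tell, broadcast state can only be 
written from processBroadcastElement, this sketch just carries the preloaded 
map as a field and falls back to it until the broadcast state has caught up. 
The Event and ConfigUpdate types, the getKey()/getValue() accessors, the 
descriptor name, and the matches() logic are all placeholders, not real code 
from my job.

// imports: org.apache.flink.api.common.state.MapStateDescriptor,
// org.apache.flink.api.common.typeinfo.Types,
// org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction,
// org.apache.flink.util.Collector, java.util.Map
public class FilterByConfig extends BroadcastProcessFunction<Event, ConfigUpdate, Event> {

    public static final MapStateDescriptor<String, String> CONFIG_DESCRIPTOR =
        new MapStateDescriptor<>("config", Types.STRING, Types.STRING);

    // Snapshot produced by the plain KafkaConsumer before env.execute(); it is
    // serialized with the function, so it needs to stay reasonably small.
    private final Map<String, String> preloaded;

    public FilterByConfig(Map<String, String> preloaded) {
        this.preloaded = preloaded;
    }

    @Override
    public void processElement(Event event, ReadOnlyContext ctx, Collector<Event> out) throws Exception {
        String config = ctx.getBroadcastState(CONFIG_DESCRIPTOR).get(event.getKey());
        if (config == null) {
            // Broadcast state hasn't seen this key yet; fall back to the snapshot.
            config = preloaded.get(event.getKey());
        }
        if (config != null && matches(event, config)) {
            out.collect(event);
        }
    }

    @Override
    public void processBroadcastElement(ConfigUpdate update, Context ctx, Collector<Event> out) throws Exception {
        // New configuration records land in broadcast state, which is consulted
        // first above, so they effectively supersede the preloaded entries.
        ctx.getBroadcastState(CONFIG_DESCRIPTOR).put(update.getKey(), update.getValue());
    }

    private boolean matches(Event event, String config) {
        return true; // placeholder for the actual filtering logic
    }
}

The idea would then be to wire it up with something like 
events.connect(configs.broadcast(FilterByConfig.CONFIG_DESCRIPTOR))
    .process(new FilterByConfig(lookup)).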

Thanks,

Rion

> On May 14, 2021, at 9:51 AM, Rion Williams <rionmons...@gmail.com> wrote:
> 
> 
> Hi all,
> 
> I've encountered a challenge within a Flink job that I'm currently working 
> on. The gist of it is that I have a job that listens to a series of events 
> from a Kafka topic and eventually sinks those down into Postgres via the 
> JDBCSink.
> 
> A requirement recently came up for the need to filter these events based on 
> some configurations that are currently being stored within another Kafka 
> topic. I'm wondering what the best approach might be to handle this type of 
> problem.
> 
> My initial naive approach was:
> 
> - When Flink starts up, use a regular Kafka Consumer and read all of the 
> configuration data from that topic in its entirety.
> - Store the messages from that topic in some type of thread-safe collection 
> statically accessible by the operators downstream.
> - Expose the thread-safe collection within the operators to actually perform 
> the filtering.
> This doesn't seem right, though. I was reading about BroadcastState, which 
> seems like it might fit the bill (i.e. keep those mappings in broadcast state 
> so that all of the downstream operators would have access to them, which I'd 
> imagine would handle keeping things up to date).
> 
> Does Flink have a good pattern / construct to handle this? Basically, I have 
> a series of mappings that I want to keep relatively up to date in a Kafka 
> topic, and I'm consuming from another Kafka topic that will need those 
> mappings to filter against.
> 
> I'd be happy to share some of the approaches I currently have or elaborate a 
> bit more if that isn't super clear.
> 
> Thanks much,
> 
> Rion
> 
