Hi Yun,

That's very helpful, and it's good to know that this problem/use-case has been thought about. Since my need is fairly short-term, I'll likely need to explore a workaround in the meantime.
Do you know of an approach that might not require the use of checkpointing and restarting? I was looking into exploring initializeState within my broadcast-side stream to get it current and then simply listening to the Kafka topic as records come in (a rough sketch of that pre-loading idea is appended at the bottom of this thread). I'd imagine this would work, but that may be a bit of a naive approach.

Thanks!

Rion

> On May 17, 2021, at 1:36 AM, Yun Gao <yungao...@aliyun.com> wrote:
>
> Hi Rion,
>
> I think FLIP-150 [1] should be able to solve this scenario.
>
> Since FLIP-150 is still under discussion, a temporary method that comes to mind for now might be:
> 1. Write a first job that reads the Kafka topic and updates the broadcast state of some operator. The job would keep the source alive after all the data have been emitted (e.g. sleep forever), and once all the data have been processed, stop the job with a savepoint.
> 2. Use the savepoint to start the original job. The operator that requires the broadcast state could set the same uid and the same state name as the corresponding operator in the first job, so that it acquires the state content on startup.
>
> Best,
> Yun
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-150%3A+Introduce+Hybrid+Source
>
> ------------------ Original Mail ------------------
> Sender: Rion Williams <rionmons...@gmail.com>
> Send Date: Mon May 17 07:00:03 2021
> Recipients: user <user@flink.apache.org>
> Subject: Re: Handling "Global" Updating State
>
>> Hey folks,
>>
>> After digging into this a bit, it does seem like Broadcast State would fit the bill for this scenario and keep the downstream operators up-to-date as messages arrive in my Kafka topic.
>>
>> My question is - is there a pattern for pre-populating the state initially? In my case, I need to have loaded all of my "lookup" topic into state before processing any records in the other stream.
>>
>> My initial thought is to do something like this, if it's possible:
>>
>> - Create a KafkaConsumer on startup to read the lookup topic in its entirety into some collection like a hashmap (prior to executing the Flink pipeline, to ensure synchronicity)
>> - Use this to initialize the state of my broadcast stream (if possible)
>> - At that point the stream would be broadcasting any new records coming in, so I "should" stay up to date from there.
>>
>> Is this an oversimplification, or is there an obviously better / well-known approach to handling this?
>>
>> Thanks,
>>
>> Rion
>>
>> On May 14, 2021, at 9:51 AM, Rion Williams <rionmons...@gmail.com> wrote:
>>
>> Hi all,
>>
>> I've encountered a challenge within a Flink job that I'm currently working on. The gist of it is that I have a job that listens to a series of events from a Kafka topic and eventually sinks those down into Postgres via the JDBCSink.
>>
>> A requirement recently came up for the need to filter these events based on some configurations that are currently being stored within another Kafka topic. I'm wondering what the best approach might be to handle this type of problem.
>>
>> My initial naive approach was:
>> - When Flink starts up, use a regular Kafka Consumer and read all of the configuration data from that topic in its entirety.
>> - Store the messages from that topic in some type of thread-safe collection statically accessible by the operators downstream.
>> - Expose the thread-safe collection within the operators to actually perform the filtering.
>>
>> This doesn't seem right, though. I was reading about BroadcastState, which seems like it might fit the bill (e.g. keep those mappings in Broadcast state so that all of the downstream operations would have access to them, which I'd imagine would handle keeping things up to date).
>>
>> Does Flink have a good pattern / construct to handle this? Basically, I have a series of mappings that I want to keep relatively up to date in a Kafka topic, and I'm consuming from another Kafka topic that will need those mappings to filter against.
>>
>> I'd be happy to share some of the approaches I currently have or elaborate a bit more if that isn't super clear.
>>
>> Thanks much,
>>
>> Rion
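
As a rough illustration of the pre-loading idea discussed in this thread, a minimal Java sketch might look like the following: a plain KafkaConsumer drains the lookup topic to its current end offsets before the pipeline executes, the resulting snapshot is handed to a BroadcastProcessFunction as a constructor argument to fall back on, and the broadcast stream keeps the state current from then on (seeding the state in initializeState instead would be a variation on the same idea). The class and helper names here (ConfigBootstrap, FilterByConfig, extractKey) and the String key/value types are made up for illustration, so treat this as an untested sketch of the pattern rather than a vetted solution:

import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

import java.time.Duration;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class ConfigBootstrap {

    // The same descriptor must be used when calling DataStream#broadcast(...) on the
    // config stream and when reading the state inside the process function below.
    public static final MapStateDescriptor<String, String> CONFIG_DESCRIPTOR =
            new MapStateDescriptor<>("configs",
                    BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);

    // Drains the lookup topic from the beginning up to its current end offsets with a
    // plain Kafka consumer, before env.execute() is ever called. `props` needs the
    // bootstrap servers and String deserializers configured.
    public static Map<String, String> readLookupTopic(String topic, Properties props) {
        Map<String, String> snapshot = new HashMap<>();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            List<TopicPartition> partitions = new ArrayList<>();
            consumer.partitionsFor(topic).forEach(p ->
                    partitions.add(new TopicPartition(p.topic(), p.partition())));
            consumer.assign(partitions);
            consumer.seekToBeginning(partitions);

            // Snapshot the end offsets once; anything published afterwards is picked up
            // by the broadcast stream instead.
            Map<TopicPartition, Long> endOffsets = consumer.endOffsets(partitions);
            while (!endOffsets.entrySet().stream()
                    .allMatch(e -> consumer.position(e.getKey()) >= e.getValue())) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    snapshot.put(record.key(), record.value());
                }
            }
        }
        return snapshot;
    }

    // Filters events against the broadcast state, falling back to the bootstrapped
    // snapshot until the broadcast side has caught up.
    public static class FilterByConfig extends BroadcastProcessFunction<String, String, String> {

        private final Map<String, String> bootstrap;

        public FilterByConfig(Map<String, String> bootstrap) {
            this.bootstrap = bootstrap;
        }

        @Override
        public void processElement(String event, ReadOnlyContext ctx, Collector<String> out) throws Exception {
            String key = extractKey(event); // hypothetical: however an event maps to a config key
            String config = ctx.getBroadcastState(CONFIG_DESCRIPTOR).get(key);
            if (config == null) {
                config = bootstrap.get(key); // not yet broadcast: use the pre-loaded snapshot
            }
            if (config != null) {
                out.collect(event); // only forward events that have a matching configuration
            }
        }

        @Override
        public void processBroadcastElement(String config, Context ctx, Collector<String> out) throws Exception {
            // Keep the broadcast state current as new configuration records arrive.
            ctx.getBroadcastState(CONFIG_DESCRIPTOR).put(extractKey(config), config);
        }

        private String extractKey(String record) {
            return record.split(",", 2)[0]; // placeholder key extraction
        }
    }
}

The event stream would then be connected to the config stream via broadcast(CONFIG_DESCRIPTOR) and processed with FilterByConfig. By contrast, the savepoint-based approach Yun describes avoids the extra consumer entirely at the cost of a two-step deployment, so which fits better probably depends on how often the lookup topic changes and how the job is operated.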