Hi Yun,

That's very helpful, and it's good to know that this problem/use-case has been thought about. Since my need is fairly short-term, I'll likely need to explore a workaround in the meantime.
Do you know of an approach that might not require the use of checkpointing and restarting? I was looking into exploring initializeState within my broadcast-side stream to get it current and then simply listening to the Kafka topic as records come in (a rough sketch of that pre-loading idea is appended at the bottom of this thread). I'd imagine this would work, but that may be a bit of a naive approach.

Thanks!

Rion

> On May 17, 2021, at 1:36 AM, Yun Gao <yungao...@aliyun.com> wrote:
>
> Hi Rion,
>
> I think FLIP-150 [1] should be able to solve this scenario.
>
> Since FLIP-150 is still under discussion, a temporary method that comes to mind for now might be:
> 1. Write a first job that reads the Kafka topic and updates the broadcast state of some operator. The job would keep the source alive after all the data have been emitted (e.g. sleep forever), and once all the data have been processed, stop the job with a savepoint.
> 2. Use the savepoint to start the original job. The operator that requires the broadcast state could set the same uid and the same state name as the corresponding operator in the first job, so that it acquires the state content on startup.
>
> Best,
> Yun
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-150%3A+Introduce+Hybrid+Source
>
> ------------------ Original Mail ------------------
> Sender: Rion Williams <rionmons...@gmail.com>
> Send Date: Mon May 17 07:00:03 2021
> Recipients: user <user@flink.apache.org>
> Subject: Re: Handling "Global" Updating State
>
>> Hey folks,
>>
>> After digging into this a bit, it does seem like Broadcast State would fit the bill for this scenario and keep the downstream operators up-to-date as messages arrive in my Kafka topic.
>>
>> My question is - is there a pattern for pre-populating the state initially? In my case, I need to have loaded all of my "lookup" topic into state before processing any records in the other stream.
>>
>> My initial thought is to do something like this, if it's possible:
>>
>> - Create a KafkaConsumer on startup to read the lookup topic in its entirety into some collection like a hashmap (prior to executing the Flink pipeline, to ensure synchronicity)
>> - Use this to initialize the state of my broadcast stream (if possible)
>> - At that point the stream would be broadcasting any new records coming in, so I "should" stay up to date from there.
>>
>> Is this an oversimplification, or is there an obviously better / well-known approach to handling this?
>>
>> Thanks,
>>
>> Rion
>>
>> On May 14, 2021, at 9:51 AM, Rion Williams <rionmons...@gmail.com> wrote:
>>
>> Hi all,
>>
>> I've encountered a challenge within a Flink job that I'm currently working on. The gist of it is that I have a job that listens to a series of events from a Kafka topic and eventually sinks those down into Postgres via the JDBCSink.
>>
>> A requirement recently came up for the need to filter these events based on some configurations that are currently being stored within another Kafka topic. I'm wondering what the best approach might be to handle this type of problem.
>>
>> My initial naive approach was:
>> - When Flink starts up, use a regular Kafka Consumer and read all of the configuration data from that topic in its entirety.
>> - Store the messages from that topic in some type of thread-safe collection statically accessible by the operators downstream.
>> - Expose the thread-safe collection within the operators to actually perform the filtering.
>>
>> This doesn't seem right, though. I was reading about BroadcastState, which seems like it might fit the bill (e.g. keep those mappings in Broadcast state so that all of the downstream operations would have access to them, which I'd imagine would handle keeping things up to date).
>>
>> Does Flink have a good pattern / construct to handle this? Basically, I have a series of mappings that I want to keep relatively up to date in a Kafka topic, and I'm consuming from another Kafka topic that will need those mappings to filter against.
>>
>> I'd be happy to share some of the approaches I currently have or elaborate a bit more if that isn't super clear.
>>
>> Thanks much,
>>
>> Rion
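
As a rough illustration of the pre-loading idea discussed in this thread, a minimal Java sketch might look like the following: a plain KafkaConsumer drains the lookup topic to its current end offsets before the pipeline executes, the resulting snapshot is handed to a BroadcastProcessFunction as a constructor argument to fall back on, and the broadcast stream keeps the state current from then on (seeding the state in initializeState instead would be a variation on the same idea). The class and helper names here (ConfigBootstrap, FilterByConfig, extractKey) and the String key/value types are made up for illustration, so treat this as an untested sketch of the pattern rather than a vetted solution:

import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

import java.time.Duration;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class ConfigBootstrap {

    // The same descriptor must be used when calling DataStream#broadcast(...) on the
    // config stream and when reading the state inside the process function below.
    public static final MapStateDescriptor<String, String> CONFIG_DESCRIPTOR =
            new MapStateDescriptor<>("configs",
                    BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);

    // Drains the lookup topic from the beginning up to its current end offsets with a
    // plain Kafka consumer, before env.execute() is ever called. `props` needs the
    // bootstrap servers and String deserializers configured.
    public static Map<String, String> readLookupTopic(String topic, Properties props) {
        Map<String, String> snapshot = new HashMap<>();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            List<TopicPartition> partitions = new ArrayList<>();
            consumer.partitionsFor(topic).forEach(p ->
                    partitions.add(new TopicPartition(p.topic(), p.partition())));
            consumer.assign(partitions);
            consumer.seekToBeginning(partitions);

            // Snapshot the end offsets once; anything published afterwards is picked up
            // by the broadcast stream instead.
            Map<TopicPartition, Long> endOffsets = consumer.endOffsets(partitions);
            while (!endOffsets.entrySet().stream()
                    .allMatch(e -> consumer.position(e.getKey()) >= e.getValue())) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    snapshot.put(record.key(), record.value());
                }
            }
        }
        return snapshot;
    }

    // Filters events against the broadcast state, falling back to the bootstrapped
    // snapshot until the broadcast side has caught up.
    public static class FilterByConfig extends BroadcastProcessFunction<String, String, String> {

        private final Map<String, String> bootstrap;

        public FilterByConfig(Map<String, String> bootstrap) {
            this.bootstrap = bootstrap;
        }

        @Override
        public void processElement(String event, ReadOnlyContext ctx, Collector<String> out) throws Exception {
            String key = extractKey(event); // hypothetical: however an event maps to a config key
            String config = ctx.getBroadcastState(CONFIG_DESCRIPTOR).get(key);
            if (config == null) {
                config = bootstrap.get(key); // not yet broadcast: use the pre-loaded snapshot
            }
            if (config != null) {
                out.collect(event); // only forward events that have a matching configuration
            }
        }

        @Override
        public void processBroadcastElement(String config, Context ctx, Collector<String> out) throws Exception {
            // Keep the broadcast state current as new configuration records arrive.
            ctx.getBroadcastState(CONFIG_DESCRIPTOR).put(extractKey(config), config);
        }

        private String extractKey(String record) {
            return record.split(",", 2)[0]; // placeholder key extraction
        }
    }
}

The event stream would then be connected to the config stream via broadcast(CONFIG_DESCRIPTOR) and processed with FilterByConfig. By contrast, the savepoint-based approach Yun describes avoids the extra consumer entirely at the cost of a two-step deployment, so which fits better probably depends on how often the lookup topic changes and how the job is operated.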