Oh, the parallelism problem didn't bother us, because we used to set the
parallelism of the rules source to one :o). A more elegant way might be to gate
the rule emission on RuntimeContext#getIndexOfThisSubtask, so that only one
subtask emits it.
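
For example, something like this (just a sketch; RulesEvent and its
requestAllRules() factory are placeholders for your actual types):

import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;

// Sketch: only subtask 0 emits the startup "send all rules" request,
// so the source can keep a parallelism greater than one without
// sending duplicate requests.
public class RulesSource extends RichParallelSourceFunction<RulesEvent> {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<RulesEvent> ctx) throws Exception {
        if (getRuntimeContext().getIndexOfThisSubtask() == 0) {
            ctx.collect(RulesEvent.requestAllRules()); // hypothetical factory
        }
        while (running) {
            // ... normal rule-update consumption goes here ...
            Thread.sleep(100);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}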


Best,
Jiayi Liao


 Original Message 
Sender: Gaël Renoux <gael.ren...@datadome.co>
Recipient: bupt_ljy <bupt_...@163.com>
Cc: user <user@flink.apache.org>
Date: Monday, Nov 4, 2019 23:41
Subject: Re: Finite source without blocking save-points


Hi Jiayi,


This would allow me to call the Kafka producer without risking a race
condition, but it comes with its own problem: unless the source has a
parallelism of 1, it will trigger multiple times. I could create a specific
source that doesn't produce anything, has a parallelism of 1, and calls the
producer from its open() method: it's a bit ugly, but it would get rid of the
race condition.
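
Something like this is what I have in mind (rough sketch; the topic name and
broker address are placeholders):

import java.util.Properties;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Sketch: a source that never emits anything. Its open() sends the
// startup message; run() just blocks until the job is cancelled, so
// the source never reaches FINISHED and save-points keep working.
public class StartupMessageSource extends RichSourceFunction<String> {

    private final Object lock = new Object();
    private volatile boolean running = true;

    @Override
    public void open(Configuration parameters) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092"); // placeholder
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Block until the broker acknowledges the startup message.
            producer.send(new ProducerRecord<>("rules-requests", "send-all-rules")).get();
        }
    }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        synchronized (lock) {
            while (running) {
                lock.wait();
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
        synchronized (lock) {
            lock.notifyAll();
        }
    }
}

It would be wired in with env.addSource(new StartupMessageSource()).setParallelism(1).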



On Mon, Nov 4, 2019 at 3:59 PM bupt_ljy <bupt_...@163.com> wrote:

Hi Gaël,
I had a similar situation before. Actually, you don't need to accomplish this
in such a complicated way. I guess you already have a rules source; if it
inherits from RichParallelSourceFunction, you can send the rules in its open()
method on startup.
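
For example (sketch only; Rule and requestAllRules() are placeholders for
however you trigger your emitter):

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;

// Sketch: ask the emitter for a full dump of the rules whenever the
// source (re)starts.
public class RulesSource extends RichParallelSourceFunction<Rule> {

    @Override
    public void open(Configuration parameters) throws Exception {
        requestAllRules();
    }

    @Override
    public void run(SourceContext<Rule> ctx) throws Exception {
        // ... consume rule updates as before ...
    }

    @Override
    public void cancel() {
    }

    private void requestAllRules() {
        // Placeholder: e.g. send a request message to the emitter.
    }
}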


Best,
Jiayi Liao


 Original Message 
Sender: Gaël Renoux <gael.ren...@datadome.co>
Recipient: user <user@flink.apache.org>
Date: Monday, Nov 4, 2019 22:50
Subject: Finite source without blocking save-points


Hello everyone,


I have a job which runs continuously, but it also needs to send a single
specific Kafka message on startup. I tried the obvious approach of using
StreamExecutionEnvironment.fromElements and adding a Kafka sink, but that
doesn't work: once the source is finished, it becomes impossible to stop the
job with a save-point later.
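
For reference, this is roughly what I tried (simplified; the topic name and
broker address are placeholders):

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

public class StartupMessageJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        Properties kafkaProps = new Properties();
        kafkaProps.put("bootstrap.servers", "kafka:9092"); // placeholder

        // The finite source reaches FINISHED as soon as its single
        // element is emitted, and "cancel with save-point" then fails
        // for the whole job.
        env.fromElements("send-all-rules")
           .addSink(new FlinkKafkaProducer<>(
                   "rules-requests", new SimpleStringSchema(), kafkaProps));

        env.execute("startup-message");
    }
}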



The best solution I've found is to create a basic Kafka producer to send the
message and run it in the job's startup code (before calling
StreamExecutionEnvironment.execute()). However, there's a race condition: the
message could be sent and trigger downstream processing before the job is
ready to receive messages. In addition, it forces me to have a separate Kafka
producer, while Flink already comes with Kafka sinks. And finally, it's pretty
specific to my use case (sending a Kafka message), when it looks like there
should be a generic solution here.
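
Concretely, the workaround looks like this (a fragment, simplified; env is the
job's StreamExecutionEnvironment, the producer classes are the plain Kafka
client's, and the topic name is a placeholder):

// In the startup code, before the job itself is running. The message
// may be consumed and acted upon before the job's operators are ready,
// which is the race condition.
Properties props = new Properties();
props.put("bootstrap.servers", "kafka:9092"); // placeholder
props.put("key.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");

try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    producer.send(new ProducerRecord<>("rules-requests", "send-all-rules")).get();
}

env.execute("my-job");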



Do you guys know of any better way to do this? Is there any way to set up a 
finite source that will not block save-points?



Just in case, the global use case is nothing special: my job maintains a set
of rules as broadcast state in its operators and handles input according to
those rules. On startup, I need to request that all rules be sent at once (the
emitter normally sends only updated rules), in case the rule state has been
lost (which happens when we evolve the rule model, for instance), and that
request is made through a Kafka message.
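
The wiring is the standard broadcast-state pattern, roughly (a fragment; Rule
and ApplyRulesFunction are simplified placeholders for the real types):

MapStateDescriptor<String, Rule> rulesDesc =
        new MapStateDescriptor<>("rules", String.class, Rule.class);

BroadcastStream<Rule> broadcastRules = rulesStream.broadcast(rulesDesc);

inputStream
        .connect(broadcastRules)
        .process(new ApplyRulesFunction()); // extends BroadcastProcessFunction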


Thanks in advance!



Gaël Renoux



-- 

Gaël Renoux
 Senior R&D Engineer, DataDome
M +33 6 76 89 16 52 
E gael.ren...@datadome.co 
W www.datadome.co
