Hi,

Option 2 is to implement your own Source/Sink. Currently, we have the old,
now-discouraged interfaces alongside the new ones.

For the source side, you want to read [1]. There is a KafkaSource already in
1.12 that we consider beta; you can replace it with the 1.13 version after
the 1.13 release (they should be compatible). If that is too new for you, or
you are using 1.11 or earlier, you may also implement your own SourceFunction
[2], quite possibly by just copying the existing Kafka SourceFunction
implementation and adjusting it.
Inside your source function, you would read from your main/control topic and
start reading the additional topics as you go. Your emitted values would then
probably also contain the source topic along with your payload. How you
model your payload data types depends on your use case; in general, dynamic
types are harder to use, as you will experience lots of runtime errors
(misspelled field names, mismatched field types, etc.). I can outline the
options once you provide more details.
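
To make that concrete, here is a minimal sketch of such a SourceFunction.
Everything in it is an assumption for illustration: the class name, String
payloads, and a control-message format where the control topic carries plain
topic names. It is also not fault-tolerant (offsets are not checkpointed),
which is exactly the part you would get by copying the existing Kafka source
instead:

import java.time.Duration;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Emits (source topic, payload) pairs so downstream operators know the origin.
// kafkaProps must contain bootstrap servers, group id, and String deserializers.
public class DynamicTopicSource extends RichSourceFunction<Tuple2<String, String>> {
    private final Properties kafkaProps;
    private final String controlTopic;
    private volatile boolean running = true;

    public DynamicTopicSource(Properties kafkaProps, String controlTopic) {
        this.kafkaProps = kafkaProps;
        this.controlTopic = controlTopic;
    }

    @Override
    public void run(SourceContext<Tuple2<String, String>> ctx) throws Exception {
        Set<String> topics = new HashSet<>();
        topics.add(controlTopic);
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(kafkaProps)) {
            consumer.subscribe(topics);
            while (running) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    if (record.topic().equals(controlTopic)) {
                        // a control message names an additional topic to read from
                        if (topics.add(record.value())) {
                            consumer.subscribe(topics);
                        }
                    } else {
                        // hold the checkpoint lock while emitting
                        synchronized (ctx.getCheckpointLock()) {
                            ctx.collect(Tuple2.of(record.topic(), record.value()));
                        }
                    }
                }
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}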

The sink side is similar, although we are still missing good documentation
for the new interface. I'd recommend having a look at the FLIP [3], which is
quite non-technical. Again, if it doesn't work for you, you can also go with
the old SinkFunction. The implementation idea is the same: you start with an
existing sink (by copying or delegating) and write to additional topics as
you go. Again, dynamic types (different types for output topic1 and topic2?)
will cause you quite a few runtime bugs.
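
A corresponding sketch for the sink side, assuming each record already
carries its target topic as emitted by the source sketch above. The class
name is again made up, and this gives you no exactly-once guarantees; for
those you would delegate to the existing Kafka producer sink instead:

import java.util.Properties;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Routes each (target topic, payload) record via one shared producer;
// the set of output topics does not need to be known upfront.
public class DynamicTopicSink extends RichSinkFunction<Tuple2<String, String>> {
    private final Properties kafkaProps;
    private transient KafkaProducer<String, String> producer;

    public DynamicTopicSink(Properties kafkaProps) {
        this.kafkaProps = kafkaProps;
    }

    @Override
    public void open(Configuration parameters) {
        producer = new KafkaProducer<>(kafkaProps);
    }

    @Override
    public void invoke(Tuple2<String, String> value, Context context) {
        // f0 = target topic, f1 = payload
        producer.send(new ProducerRecord<>(value.f0, value.f1));
    }

    @Override
    public void close() {
        if (producer != null) {
            producer.close();
        }
    }
}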

If all types are the same, there are better options than going with custom
sources and sinks; see the side-output sketch below. Implementing custom
sources and sinks is an advanced topic, so I usually discourage it unless
you already have lots of experience. I can help in any case, but then I also
need to understand your use case better. (In general, if you ask a technical
question, you get a technical answer. If you ask on a conceptual level, we
may also offer architectural advice.)
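
For example, if the set of output topics is known upfront and all records
share one type, the side-output routing I suggested in my earlier mail
(quoted below) is much simpler. A rough fragment; the tags, the routing
condition, and the sink variables are placeholders:

import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

// one tag per known output topic (anonymous subclasses capture the type)
final OutputTag<String> topicA = new OutputTag<String>("topic-a") {};
final OutputTag<String> topicB = new OutputTag<String>("topic-b") {};

SingleOutputStreamOperator<String> routed = outputStream
        .process(new ProcessFunction<String, String>() {
            @Override
            public void processElement(String value, Context ctx, Collector<String> out) {
                // placeholder routing condition
                if (value.startsWith("a")) {
                    ctx.output(topicA, value);
                } else {
                    ctx.output(topicB, value);
                }
            }
        });

routed.getSideOutput(topicA).addSink(sinkForTopicA);
routed.getSideOutput(topicB).addSink(sinkForTopicB);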

[1]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/sources.html
[2]
https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/SourceFunction.java
[3]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-143%3A+Unified+Sink+API

On Tue, Mar 23, 2021 at 8:10 AM Jessy Ping <tech.user.str...@gmail.com>
wrote:

> Hi Arvid,
>
> Thanks for the reply.
>
> I am currently exploring the Flink features, and we have certain use cases
> where new producers will be added to the system dynamically, and we don't
> want to restart the application frequently.
>
> It would be helpful if you could explain option 2 in detail.
>
> Thanks & Regards
> Jessy
>
> On Mon, 22 Mar 2021 at 19:27, Arvid Heise <ar...@apache.org> wrote:
>
>> Hi Jessy,
>>
>>> Can I add a new sink into the execution graph at runtime, for example: a
>>> new Kafka producer, without restarting the current application or using
>>> option 1?
>>>
>>
>> No, there is no way to add a sink without restart currently. Could you
>> elaborate why a restart is not an option for you?
>>
>> You can use option 2, which means that you implement one source and one
>> sink that dynamically read from or write to different topics, possibly by
>> wrapping the existing source and sink. This is a rather complex task that
>> I would not recommend to a new Flink user.
>>
>> If you have a known set of possible sink topics, another option would be
>> to add all sinks from the get-go and only route messages dynamically with
>> side outputs. However, I'm not aware of such a pattern for sources,
>> although with the new source interface it should be possible.
>>
>> On Wed, Mar 17, 2021 at 7:12 AM Jessy Ping <tech.user.str...@gmail.com>
>> wrote:
>>
>>> Hi Team,
>>>
>>> Could you provide your thoughts on this? It would be helpful.
>>>
>>> Thanks
>>> Jessy
>>>
>>> On Tue, 16 Mar 2021 at 21:29, Jessy Ping <tech.user.str...@gmail.com>
>>> wrote:
>>>
>>>> Hi Timo/Team,
>>>> Thanks for the reply.
>>>>
>>>> Just take the example from the following pseudo code.
>>>> Suppose this is the current application logic:
>>>>
>>>> firstInputStream = addSource(...)   // Kafka consumer C1
>>>> secondInputStream = addSource(...)  // Kafka consumer C2
>>>>
>>>> outputStream = firstInputStream.keyBy(a -> a.key)
>>>> .connect(secondInputStream.keyBy(b -> b.key))
>>>> .coProcessFunction(....)
>>>> // logic determines whether a new sink should be added to the
>>>> // application or not. If not: the event is produced to the
>>>> // existing sink(s). If a new sink is required: produce the events to
>>>> // the existing sinks + the new one
>>>> sink1 = addSink(outputStream)  // Kafka producer P1
>>>> .
>>>> .
>>>> .
>>>> sinkN = addSink(outputStream)  // Kafka producer PN
>>>>
>>>> *Questions*
>>>> --> Can I add a new sink into the execution graph at runtime, for
>>>> example: a new Kafka producer, without restarting the current
>>>> application or using option 1?
>>>>
>>>> --> (Option 2) What do you mean by adding a custom sink at the
>>>> coProcessFunction, and how will it change the execution graph?
>>>>
>>>> Thanks
>>>> Jessy
>>>>
>>>>
>>>>
>>>> On Tue, 16 Mar 2021 at 17:45, Timo Walther <twal...@apache.org> wrote:
>>>>
>>>>> Hi Jessy,
>>>>>
>>>>> to be precise, the JobGraph is not used at runtime. It is translated
>>>>> into an ExecutionGraph.
>>>>>
>>>>> Nevertheless, such patterns are possible, but they require a bit of
>>>>> manual implementation.
>>>>>
>>>>> Option 1) You stop the job with a savepoint and restart the
>>>>> application with slightly different parameters. If the pipeline has
>>>>> not changed much, the old state can be remapped to the slightly
>>>>> modified job graph. This is the easiest solution, but with the
>>>>> downside of maybe a couple of seconds of downtime.
>>>>>
>>>>> Option 2) You introduce a dedicated control stream (i.e. by using the
>>>>> connect() DataStream API [1]). Either you implement a custom sink in
>>>>> the main stream of the CoProcessFunction, or you enrich every record
>>>>> in the main stream with sink parameters that are read by your custom
>>>>> sink implementation.
>>>>>
>>>>> I hope this helps.
>>>>>
>>>>> Regards,
>>>>> Timo
>>>>>
>>>>>
>>>>> [1]
>>>>>
>>>>> https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/datastream/operators/overview/#connect
>>>>>
>>>>> On 16.03.21 12:37, Jessy Ping wrote:
>>>>> > Hi Team,
>>>>> > Is it possible to edit the job graph at runtime? Suppose I want to
>>>>> > add a new sink to the Flink application at runtime that depends upon
>>>>> > specific parameters in the incoming events. Can I edit the JobGraph
>>>>> > of a running Flink application?
>>>>> >
>>>>> > Thanks
>>>>> > Jessy
>>>>>
>>>>>
