HI,

I am writing a parallel source function that ideally needs to receive some
messages as control information (specifically, a state message on where to
start reading from a kinesis stream). As far as I can tell, there isn't a
way to make a sourceFunction receive input (which makes sense) so I am
thinking it makes sense to use a processFunction that will occasionally
receive control messages and mostly just output a lot of messages.

This works from an API perspective, with a few different options, I could
either:

A) have the processElement function block on calling the loop that will
produce messages or
B) have the processEelement function return (by pass the collector and
starting the reading on a different thread), but continue to produce
messages downstream

This obviously does raise some concerns though:

1. Does this break any assumptions flink has of message lifecycle? Option A
of blocking on processElement for very long periods seems straight forward
but less than ideal, not to mention not being able to handle any other
control messages.

However, I am not sure if a processFunction sending messages after the
processElement function has returned would break some expectations flink
has of operator lifeycles. Messages are also emitted by timers, etc, but
this would be completely outside any of those function calls as it is
started on another thread. This is obviously how most SourceFunctions work,
but it isn't clear if the same technique holds for ProcessFunctions

2. Would this have a negative impact on backpressure downstream? Since I am
still going to be using the same collector instance, it seems like it
should ultimately work, but I wonder if there are other details I am not
aware of.

3. Is this just a terrible idea in general? It seems like I could maybe do
this by implementing directly on top of an Operator, but I am not as
familiar with that API

Thanks in advance for any thoughts!

Addison

Reply via email to