I just realized I did not mention a key concept in my idea:
All watermarks are sent into all Kafka partitions, so every consumer gets
all watermarks from all upstream producing tasks.
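A minimal sketch of what such a broadcast could look like with the plain
Kafka producer API (the topic name, the "watermark" header, and the
taskId:timestamp payload here are assumptions for illustration, not part
of the proposal itself):

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.PartitionInfo;
    import java.nio.charset.StandardCharsets;
    import java.util.List;

    public class WatermarkBroadcaster {
        // Send one watermark record into every partition of the topic, so
        // every consumer sees the watermarks of all upstream producing tasks.
        static void broadcastWatermark(KafkaProducer<byte[], byte[]> producer,
                                       String topic, String taskId, long watermarkMillis) {
            byte[] value = (taskId + ":" + watermarkMillis).getBytes(StandardCharsets.UTF_8);
            List<PartitionInfo> partitions = producer.partitionsFor(topic);
            for (PartitionInfo p : partitions) {
                ProducerRecord<byte[], byte[]> record =
                    new ProducerRecord<>(topic, p.partition(), null, value);
                // Mark the record as a watermark so aware consumers can treat it specially.
                record.headers().add("watermark", new byte[] {1});
                producer.send(record);
            }
        }
    }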
On Wed, Dec 22, 2021 at 11:33 AM Niels Basjes wrote:
> > Perhaps for this an explicit config is needed per source
> Perhaps for this an explicit config is needed per source id (pattern?):
> - Important producer: expire after 24 hours.
> - Yet another IOT sensor: expire after 1 minute.
> - and a default.
Do note that this actually does something I'm not a fan of: it introduces a
"Processing time" conclusion (the t
Your high-level layout makes sense. However, I think there are a few
problems doing it with Flink:
(1) How to encode the watermark? It could break downstream consumers
that don't know what to do with them (e.g., crash on deserialization)?
There is no guarantee that only a downstream Flink job consumes the topic.
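To make problem (1) concrete, here is a sketch of a consumer loop that
recognizes watermark records by a header and skips them before
deserialization (the "watermark" header convention is an assumption, not
an existing Kafka feature):

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import java.time.Duration;

    public class WatermarkAwarePoller {
        static void pollLoop(KafkaConsumer<byte[], byte[]> consumer) {
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    if (record.headers().lastHeader("watermark") != null) {
                        // Watermark control record: update watermark bookkeeping,
                        // do NOT deserialize it as a normal event.
                        continue;
                    }
                    // ... deserialize and handle the normal event here ...
                }
            }
        }
    }

Any consumer that never checks the header would try to deserialize the
watermark payload as a regular event, which is exactly the breakage
described in (1).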
Hi,
Like I said, I've only just started thinking about how this can be
implemented (I'm currently still lacking a lot of knowledge).
So at this point I do not yet see why solving this in the transport (like
Kafka) is easier than solving it in the processing engine (like Flink).
In the normal scenar
I think this problem should be tackled inside Kafka, not Flink.
Kafka already has internal control messages to write transaction
markers. Those could be extended to carry watermark information. It
would be best to generalize those as "user control messages", and
watermarks could just be one application of them.
I'm reading the Pulsar PIP and noticed another thing to take into account:
multiple applications (each with a different parallelism) that all write
into the same topic.
On Mon, 20 Dec 2021, 10:45 Niels Basjes wrote:
> Hi Till,
>
> This morning I also realized what you call an 'effective watermark' is
> indeed what is needed.
Hi Till,
This morning I also realized what you call an 'effective watermark' is
indeed what is needed.
I'm going to read up on what Pulsar has planned.
What I realized is that the consuming application must be aware of the
parallelism of the producing application, which is independent of the
partitioning of the topic.
Hi Niels,
if you have multiple inputs going into a single Kafka partition then you
have to calculate the effective watermark by looking at the min watermark
from all inputs. You could insert a Flink operator that takes care of it
and then writes to a set of partitions in a 1:n relationship. Alternat
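A sketch of that effective-watermark bookkeeping on the consumer side: the
effective watermark is the minimum of the latest watermark seen per
producing task (covering multiple applications with different
parallelism), and entries can expire so a dead producer does not hold the
watermark back forever. All names here are illustrative:

    import java.util.HashMap;
    import java.util.Map;

    public class EffectiveWatermark {
        private static class Entry { long watermark; long lastSeenMillis; }

        private final Map<String, Entry> perProducer = new HashMap<>();
        private final long expiryMillis; // see the per-source expiry discussion above

        public EffectiveWatermark(long expiryMillis) { this.expiryMillis = expiryMillis; }

        // Record the latest watermark seen from one producing task.
        public void onWatermark(String producerTaskId, long watermark, long nowMillis) {
            Entry e = perProducer.computeIfAbsent(producerTaskId, k -> new Entry());
            e.watermark = Math.max(e.watermark, watermark);
            e.lastSeenMillis = nowMillis;
        }

        // Minimum watermark over all producers that have not expired.
        public long current(long nowMillis) {
            long min = Long.MAX_VALUE;
            for (Map.Entry<String, Entry> e : perProducer.entrySet()) {
                if (nowMillis - e.getValue().lastSeenMillis > expiryMillis) continue;
                min = Math.min(min, e.getValue().watermark);
            }
            return min == Long.MAX_VALUE ? Long.MIN_VALUE : min;
        }
    }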
Hi,
About a year ago I spoke at the Flink Forward conference (
https://www.youtube.com/watch?v=wqRDyrE3dwg ) about handling development
problems regarding streaming applications and handling the lack of events
in a stream.
Something I spoke about towards the end of this talk was the idea to ship
the watermarks through Kafka.