Re: Sending watermarks into Kafka

2021-12-22 Thread Niels Basjes
I just realized I did not mention a key concept in my idea: All watermarks are sent into all Kafka partitions so every consumer gets all watermarks from all upstream producing tasks. On Wed, Dec 22, 2021 at 11:33 AM Niels Basjes wrote: > > Perhaps for this an explicit config is needed per source …
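A minimal sketch of what such a broadcast could look like with the plain Kafka producer API. The `WM|<taskId>|<timestamp>` value format and the class name are purely illustrative assumptions, not anything agreed in this thread; the point is only that each producing task writes its current watermark into every partition of the topic:

    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.PartitionInfo;
    import org.apache.kafka.common.serialization.ByteArraySerializer;

    public class WatermarkBroadcaster {
        private final KafkaProducer<byte[], byte[]> producer;
        private final String topic;

        public WatermarkBroadcaster(Properties props, String topic) {
            props.put("key.serializer", ByteArraySerializer.class.getName());
            props.put("value.serializer", ByteArraySerializer.class.getName());
            this.producer = new KafkaProducer<>(props);
            this.topic = topic;
        }

        // Send one watermark marker from this producing task into every partition,
        // so every consumer sees the watermarks of every upstream producing task.
        public void broadcastWatermark(String producerTaskId, long watermarkMillis) {
            byte[] value = ("WM|" + producerTaskId + "|" + watermarkMillis).getBytes();
            List<PartitionInfo> partitions = producer.partitionsFor(topic);
            for (PartitionInfo p : partitions) {
                producer.send(new ProducerRecord<>(topic, p.partition(), null, value));
            }
            producer.flush();
        }
    }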

Re: Sending watermarks into Kafka

2021-12-22 Thread Niels Basjes
> Perhaps for this an explicit config is needed per source id (pattern?): > - Important producer: expire after 24 hours. > - Yet another IOT sensor: expire after 1 minute. > - and a default. Do note that this actually does something I'm not a fan of: it introduces a "Processing time" conclusion …
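As a rough illustration of the per-source expiry idea (all identifiers and durations here are made up for the example, and the pattern matching hinted at above is left out), the configuration could be little more than a map from source id to expiry duration plus a default:

    import java.time.Duration;
    import java.util.Map;

    // Hypothetical per-source idleness configuration: a source id maps to how long
    // its last watermark stays valid before the consumer stops considering it.
    // Note: this expiry is exactly the "processing time" element criticised above.
    public final class WatermarkExpiryConfig {
        private final Map<String, Duration> perSource;
        private final Duration defaultExpiry;

        public WatermarkExpiryConfig(Map<String, Duration> perSource, Duration defaultExpiry) {
            this.perSource = perSource;
            this.defaultExpiry = defaultExpiry;
        }

        // Exact-id lookup only; pattern support would need extra logic.
        public Duration expiryFor(String sourceId) {
            return perSource.getOrDefault(sourceId, defaultExpiry);
        }

        public static WatermarkExpiryConfig example() {
            return new WatermarkExpiryConfig(
                Map.of(
                    "important-producer", Duration.ofHours(24),
                    "iot-sensor",         Duration.ofMinutes(1)),
                Duration.ofMinutes(5)); // default, purely illustrative
        }
    }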

Re: Sending watermarks into Kafka

2021-12-22 Thread Niels Basjes
On Tue, Dec 21, 2021 at 9:42 PM Matthias J. Sax wrote: > Your high-level layout makes sense. However, I think there are a few > problems doing it with Flink: > > (1) How to encode the watermark? It could break downstream consumers > that don't know what to do with them (e.g., crash on deserialization) …

Re: Sending watermarks into Kafka

2021-12-21 Thread Matthias J. Sax
Your high-level layout makes sense. However, I think there are a few problems doing it with Flink: (1) How to encode the watermark? It could break downstream consumers that don't know what to do with them (e.g., crash on deserialization)? There is no guarantee that only a downstream Flink job …
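One way to make the encoding problem concrete: if the watermark travels as an ordinary record tagged with a record header, a consumer that knows the (hypothetical) convention can recognise it, but a consumer that does not will still try to deserialize the payload, which is exactly the compatibility concern raised here. A small sketch, where the header key is an assumption of this example:

    import java.nio.charset.StandardCharsets;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.header.Header;

    public final class WatermarkRecords {
        // Hypothetical header key marking a record as a watermark rather than data.
        public static final String WATERMARK_HEADER = "x-watermark-ms";

        // Returns the watermark timestamp if this record carries the marker header,
        // or -1 for an ordinary data record. Consumers unaware of the convention
        // never look at the header and may fail on the payload instead.
        public static long watermarkOf(ConsumerRecord<byte[], byte[]> record) {
            Header h = record.headers().lastHeader(WATERMARK_HEADER);
            if (h == null) {
                return -1L;
            }
            return Long.parseLong(new String(h.value(), StandardCharsets.UTF_8));
        }
    }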

Re: Sending watermarks into Kafka

2021-12-21 Thread Niels Basjes
Hi, Like I said, I've only just started thinking about how this can be implemented (I'm currently still lacking a lot of knowledge). So at this point I do not yet see why solving this in the transport (like Kafka) is easier than solving it in the processing engine (like Flink). In the normal scenario …

Re: Sending watermarks into Kafka

2021-12-20 Thread Matthias J. Sax
I think this problem should be tackled inside Kafka, not Flink. Kafka already has internal control messages to write transaction markers. Those could be extended to carry watermark information. It would be best to generalize those as "user control messages", and watermarks could just be one application …

Re: Sending watermarks into Kafka

2021-12-20 Thread Niels Basjes
I'm reading the Pulsar PIP and noticed another thing to take into account: multiple applications (each with a different parallelism) that all write into the same topic. On Mon, 20 Dec 2021, 10:45 Niels Basjes wrote: > Hi Till, > > This morning I also realized that what you call an 'effective watermark' …

Re: Sending watermarks into Kafka

2021-12-20 Thread Niels Basjes
Hi Till, This morning I also realized that what you call an 'effective watermark' is indeed what is needed. I'm going to read up on what Pulsar has planned. What I realized is that the consuming application must be aware of the parallelism of the producing application, which is independent of the partitioning …

Re: Sending watermarks into Kafka

2021-12-20 Thread Till Rohrmann
Hi Niels, if you have multiple inputs going into a single Kafka partition, then you have to calculate the effective watermark by looking at the min watermark from all inputs. You could insert a Flink operator that takes care of it and then writes to a set of partitions in a 1:n relationship. Alternatively …
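A small sketch of that min computation on the consuming side, assuming each upstream producing task identifies itself in its watermark messages and that the consumer knows the producing application's parallelism (the point raised earlier in the thread); the class and method names are illustrative only:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.OptionalLong;

    // Tracks the latest watermark per upstream producing task and derives the
    // effective watermark as the minimum over all of them.
    public final class EffectiveWatermarkTracker {
        private final Map<String, Long> latestPerProducer = new HashMap<>();
        private final int expectedProducerCount; // parallelism of the producing application

        public EffectiveWatermarkTracker(int expectedProducerCount) {
            this.expectedProducerCount = expectedProducerCount;
        }

        public void onWatermark(String producerTaskId, long watermarkMillis) {
            // Keep the highest watermark seen per producing task.
            latestPerProducer.merge(producerTaskId, watermarkMillis, Math::max);
        }

        // Empty until every expected producing task has reported at least once,
        // then the minimum of the latest watermarks.
        public OptionalLong effectiveWatermark() {
            if (latestPerProducer.size() < expectedProducerCount) {
                return OptionalLong.empty();
            }
            return latestPerProducer.values().stream()
                .mapToLong(Long::longValue)
                .min();
        }
    }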

Sending watermarks into Kafka

2021-12-19 Thread Niels Basjes
Hi, About a year ago I spoke at the Flink Forward conference ( https://www.youtube.com/watch?v=wqRDyrE3dwg ) about handling development problems regarding streaming applications and the lack of events in a stream. Something I spoke about towards the end of this talk was the idea to ship the watermarks …