Hi all,
I'll illustrate my approach with an example as it is definitely
unorthodox. Here's some sample code. It works for me...I hope there
are no (obvious) flaws!
//myStream should be a stream of objects associated to a timestamp.
the idea is to create a Flink app that
//sends each object to kaf
Hi Fabian,
> but each partition may only be written by a single task
Sorry I think I misunderstand something here then: If I have a topic with
one partition, but multiple sink tasks (or parallelism > 1).. this means
the data must all be shuffled to the single task writing that partition?
Padarn
Hi Padarn,
Regarding your throughput concerns: A sink task may write to multiple
partitions, but each partition may only be written by a single task.
@Eduardo: Thanks for sharing your approach! Not sure if I understood it
correctly, but I think that the approach does not guarantee that all
result
Hi,
I'll chip in with an approach I'm trying at the moment that seems to work,
and I say seems because I'm only running this on a personal project.
Personally, I don't have anything against end-of-message markers per
partition, Padarn you seem to not prefer this option as it overloads the
meaning
Hi again Fabian,
Thanks for pointing this out to me. In my case there is no need for keyed
writing - but I do wonder if having each kafka task write only to a single
partition would significantly affect performance.
Actually now that I think about it, the approach to just wait for the first
recor
Hi Padarn,
Yes, this is quite tricky.
The "problem" with watermarks is that you need to consider how you write to
Kafka.
If your Kafka sink writes to keyed Kafka stream (each Kafka partition is
written by multiple producers), you need to broadcast the watermarks to
each partition, i.e., each parti
Hi Fabian, thanks for your input
Exactly. Actually my first instinct was to see if it was possible to
publish the watermarks somehow - my initial idea was to insert regular
watermark messages into each partition of the stream, but exposing this
seemed quite troublesome.
> In that case, you could
Hi Padarn,
What you describe is essentially publishing Flink's watermarks to an
outside system.
Flink processes time windows, by waiting for a watermark that's past the
window end time. When it receives such a WM it processes and emits all
ended windows and forwards the watermark.
When a sink rece