+1 for the source watermark alignment.
In the previous flink version, the source connectors are different in
implementation and it is hard to make this feature. When the consumed data
is not aligned or consuming history data, it is very easy to cause the
unalignment. Source alignment can resolve many unstable problems.

Seth Wiesman <sjwies...@gmail.com> 于2021年7月9日周五 下午11:25写道:

> +1
>
> In my opinion, this limitation is perfectly fine for the MVP. Watermark
> alignment is a long-standing issue and this already moves the ball so far
> forward.
>
> I don't expect this will cause many issues in practice, as I understand it
> the FileSource always processes one split at a time, and in my experience,
> 90% of Kafka users have a small number of partitions scale their pipelines
> to have one reader per partition. Obviously, there are larger-scale Kafka
> topics and more sources that will be ported over in the future but I think
> there is an implicit understanding that aligning sources adds latency to
> pipelines, and we can frame the follow-up "per-split" alignment as an
> optimization.
>
> On Fri, Jul 9, 2021 at 6:40 AM Piotr Nowojski <piotr.nowoj...@gmail.com>
> wrote:
>
> > Hey!
> >
> > A couple of weeks ago me and Arvid Heise played around with an idea to
> > address a long standing issue of Flink: lack of watermark/event time
> > alignment between different parallel instances of sources, that can lead
> to
> > ever growing state size for downstream operators like WindowOperator.
> >
> > We had an impression that this is relatively low hanging fruit that can
> be
> > quite easily implemented - at least partially (the first part mentioned
> in
> > the FLIP document). I have written down our proposal [1] and you can also
> > check out our PoC that we have implemented [2].
> >
> > We think that this is a quite easy proposal, that has been in large part
> > already implemented. There is one obvious limitation of our PoC. Namely
> we
> > can only easily block individual SourceOperators. This works perfectly
> fine
> > as long as there is at most one split per SourceOperator. However it
> > doesn't work with multiple splits. In that case, if a single
> > `SourceOperator` is responsible for processing both the least and the
> most
> > advanced splits, we won't be able to block this most advanced split for
> > generating new records. I'm proposing to solve this problem in the future
> > in another follow up FLIP, as a solution that works with a single split
> per
> > operator is easier and already valuable for some of the users.
> >
> > What do you think about this proposal?
> > Best, Piotrek
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources
> > [2] https://github.com/pnowojski/flink/commits/aligned-sources
> >
>

Reply via email to