The notion of per-split watermarks seems quite interesting.  I think the
idleness feature could benefit from a per-split approach too, because
idleness is typically related to whether any splits are assigned to a given
operator instance.
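
To make the per-split point concrete, here is a rough sketch of the
limitation described in the quoted proposal. It is not code from the FLIP or
the PoC, and every name in it (AlignmentSketch, unitsToPause,
operatorWatermark, maxDrift) is hypothetical. It assumes that an operator
reports the minimum watermark of its splits, and that alignment pauses any
unit that is more than a fixed drift ahead of the slowest one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in Flink.
public class AlignmentSketch {

    // Returns the units (operators or splits) whose watermark is more than
    // maxDrift ahead of the slowest unit. Assumed alignment rule.
    public static List<String> unitsToPause(Map<String, Long> watermarks, long maxDrift) {
        long min = Long.MAX_VALUE;
        for (long wm : watermarks.values()) {
            min = Math.min(min, wm);
        }
        List<String> toPause = new ArrayList<>();
        // TreeMap is used only to get a deterministic iteration order.
        for (Map.Entry<String, Long> e : new TreeMap<>(watermarks).entrySet()) {
            if (e.getValue() - min > maxDrift) {
                toPause.add(e.getKey());
            }
        }
        return toPause;
    }

    // Assumption: an operator's watermark is the minimum over its splits.
    public static long operatorWatermark(long... splitWatermarks) {
        long min = Long.MAX_VALUE;
        for (long wm : splitWatermarks) {
            min = Math.min(min, wm);
        }
        return min;
    }

    public static void main(String[] args) {
        long maxDrift = 1_000L;

        // Operator A reads a slow split (100) and a fast split (10_000);
        // operator B reads a single split (120).
        Map<String, Long> perOperator = Map.of(
                "A", operatorWatermark(100L, 10_000L),
                "B", operatorWatermark(120L));
        // Per-operator view: A reports 100, so nothing is paused and the
        // fast split keeps running ahead.
        System.out.println(unitsToPause(perOperator, maxDrift)); // prints []

        Map<String, Long> perSplit = Map.of(
                "A/s1", 100L,
                "A/s2", 10_000L,
                "B/s3", 120L);
        // Per-split view: the fast split is visible and can be paused.
        System.out.println(unitsToPause(perSplit, maxDrift)); // prints [A/s2]
    }
}
```

With per-operator granularity the fast split hides behind the slow one in
the same operator; with per-split granularity it can be singled out.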


On Mon, Jul 12, 2021 at 3:06 AM 刘建刚 <liujiangangp...@gmail.com> wrote:

> +1 for the source watermark alignment.
> In previous Flink versions, the source connectors differed in their
> implementation, which made this feature hard to build. When the consumed
> data is not aligned, or when a job is consuming historical data, the
> sources easily drift out of alignment. Source alignment can resolve many
> stability problems.
>
> Seth Wiesman <sjwies...@gmail.com> wrote on Fri, Jul 9, 2021 at 11:25 PM:
>
> > +1
> >
> > In my opinion, this limitation is perfectly fine for the MVP. Watermark
> > alignment is a long-standing issue and this already moves the ball so far
> > forward.
> >
> > I don't expect this will cause many issues in practice. As I understand
> > it, the FileSource always processes one split at a time, and in my
> > experience, 90% of Kafka users have a small number of partitions and
> > scale their pipelines to have one reader per partition. Obviously, there
> > are larger-scale Kafka topics and more sources that will be ported over
> > in the future, but I think there is an implicit understanding that
> > aligning sources adds latency to pipelines, and we can frame the
> > follow-up "per-split" alignment as an optimization.
> >
> > On Fri, Jul 9, 2021 at 6:40 AM Piotr Nowojski <piotr.nowoj...@gmail.com>
> > wrote:
> >
> > > Hey!
> > >
> > > A couple of weeks ago, Arvid Heise and I played around with an idea to
> > > address a long-standing issue in Flink: the lack of watermark/event-time
> > > alignment between different parallel instances of sources, which can
> > > lead to ever-growing state size for downstream operators like
> > > WindowOperator.
> > >
> > > We had the impression that this is relatively low-hanging fruit that
> > > can be implemented quite easily, at least partially (the first part
> > > mentioned in the FLIP document). I have written down our proposal [1],
> > > and you can also check out the PoC that we have implemented [2].
> > >
> > > We think that this is quite an easy proposal that has already been
> > > implemented in large part. There is one obvious limitation of our PoC:
> > > we can only easily block individual SourceOperators. This works
> > > perfectly fine as long as there is at most one split per SourceOperator.
> > > However, it doesn't work with multiple splits. In that case, if a single
> > > `SourceOperator` is responsible for processing both the least and the
> > > most advanced splits, we won't be able to block the most advanced split
> > > from generating new records. I'm proposing to solve this problem in a
> > > future follow-up FLIP, as a solution that works with a single split per
> > > operator is easier and already valuable for some users.
> > >
> > > What do you think about this proposal?
> > > Best, Piotrek
> > >
> > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources
> > > [2] https://github.com/pnowojski/flink/commits/aligned-sources
> > >
> >
>
