Hi,

I wanted to check if there is active development on FLIP-182 and what the
target release for it might be? [1] still shows as under discussion.

Regarding the per-subtask vs. per-split limitation: I think it will be
important that this eventually works per split, since in some cases it
won't be practical to limit a subtask to a single split (think KafkaSource
reading from many topics with diverse volumes).

Thanks,
Thomas

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources

On Wed, Jul 21, 2021 at 4:48 AM Piotr Nowojski <pnowoj...@apache.org> wrote:

> Hi,
>
> >  I would not fully advertise this before we have the second part
> implemented as well.
>
> I'm not sure, maybe we could advertise with a big warning about this
> limitation. I mean it's not as if this change would break something. At
> worst it just wouldn't fully solve the problem with multiple splits per
> single operator, but only limit the scope of that problem. At the same time
> I don't have strong feelings about this. If the consensus would be to not
> advertise it, I'm also fine with it. Only in that case we should probably
> quickly follow up with the per split solution.
>
> Anyway, thanks for voicing your support and the discussions. I'm going to
> start a voting thread for this feature soon.
>
> Best,
> Piotrek
>
> wt., 13 lip 2021 o 19:09 Stephan Ewen <se...@apache.org> napisał(a):
>
> > @Eron Wright <eronwri...@gmail.com>  The per-split watermarks are the
> > default in the new source interface (FLIP-27) and come for free if you
> use
> > the SplitReader.
> >
> > Based on that, it is also possible to unsubscribe individual splits to
> > solve the alignment in the case where operators have multiple splits
> > assigned.
> > Piotr and I already discussed that, but concluded that the implementation
> > of that is largely orthogonal.
> >
> > I am a bit worried, though, that if we release and advertise the
> alignment
> > without handling this case, we create a surprise for quite a few users.
> > While this is admittedly valuable for some users, I think we need to
> > position this accordingly. I would not fully advertise this before we
> have
> > the second part implemented as well.
> >
> >
> >
> > On Mon, Jul 12, 2021 at 7:18 PM Eron Wright <ewri...@streamnative.io
> > .invalid>
> > wrote:
> >
> > > The notion of per-split watermarks seems quite interesting.  I think
> the
> > > idleness feature could benefit from a per-split approach too, because
> > > idleness is typically related to whether any splits are assigned to a
> > given
> > > operator instance.
> > >
> > >
> > > On Mon, Jul 12, 2021 at 3:06 AM 刘建刚 <liujiangangp...@gmail.com> wrote:
> > >
> > > > +1 for the source watermark alignment.
> > > > In the previous flink version, the source connectors are different in
> > > > implementation and it is hard to make this feature. When the consumed
> > > data
> > > > is not aligned or consuming history data, it is very easy to cause
> the
> > > > unalignment. Source alignment can resolve many unstable problems.
> > > >
> > > > Seth Wiesman <sjwies...@gmail.com> 于2021年7月9日周五 下午11:25写道:
> > > >
> > > > > +1
> > > > >
> > > > > In my opinion, this limitation is perfectly fine for the MVP.
> > Watermark
> > > > > alignment is a long-standing issue and this already moves the ball
> so
> > > far
> > > > > forward.
> > > > >
> > > > > I don't expect this will cause many issues in practice, as I
> > understand
> > > > it
> > > > > the FileSource always processes one split at a time, and in my
> > > > experience,
> > > > > 90% of Kafka users have a small number of partitions scale their
> > > > pipelines
> > > > > to have one reader per partition. Obviously, there are larger-scale
> > > Kafka
> > > > > topics and more sources that will be ported over in the future but
> I
> > > > think
> > > > > there is an implicit understanding that aligning sources adds
> latency
> > > to
> > > > > pipelines, and we can frame the follow-up "per-split" alignment as
> an
> > > > > optimization.
> > > > >
> > > > > On Fri, Jul 9, 2021 at 6:40 AM Piotr Nowojski <
> > > piotr.nowoj...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hey!
> > > > > >
> > > > > > A couple of weeks ago me and Arvid Heise played around with an
> idea
> > > to
> > > > > > address a long standing issue of Flink: lack of watermark/event
> > time
> > > > > > alignment between different parallel instances of sources, that
> can
> > > > lead
> > > > > to
> > > > > > ever growing state size for downstream operators like
> > WindowOperator.
> > > > > >
> > > > > > We had an impression that this is relatively low hanging fruit
> that
> > > can
> > > > > be
> > > > > > quite easily implemented - at least partially (the first part
> > > mentioned
> > > > > in
> > > > > > the FLIP document). I have written down our proposal [1] and you
> > can
> > > > also
> > > > > > check out our PoC that we have implemented [2].
> > > > > >
> > > > > > We think that this is a quite easy proposal, that has been in
> large
> > > > part
> > > > > > already implemented. There is one obvious limitation of our PoC.
> > > Namely
> > > > > we
> > > > > > can only easily block individual SourceOperators. This works
> > > perfectly
> > > > > fine
> > > > > > as long as there is at most one split per SourceOperator. However
> > it
> > > > > > doesn't work with multiple splits. In that case, if a single
> > > > > > `SourceOperator` is responsible for processing both the least and
> > the
> > > > > most
> > > > > > advanced splits, we won't be able to block this most advanced
> split
> > > for
> > > > > > generating new records. I'm proposing to solve this problem in
> the
> > > > future
> > > > > > in another follow up FLIP, as a solution that works with a single
> > > split
> > > > > per
> > > > > > operator is easier and already valuable for some of the users.
> > > > > >
> > > > > > What do you think about this proposal?
> > > > > > Best, Piotrek
> > > > > >
> > > > > > [1]
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources
> > > > > > [2] https://github.com/pnowojski/flink/commits/aligned-sources
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to