The notion of per-split watermarks seems quite interesting. I think the idleness feature could benefit from a per-split approach too, because idleness is typically related to whether any splits are assigned to a given operator instance.
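To make that concrete, here is a rough sketch of how per-split alignment and per-split idleness could fit together inside one reader. It is plain Java, not Flink code: the class `SplitAlignmentSketch`, its methods, and the `maxAllowedDrift` parameter are all names I made up for illustration, and the global minimum watermark is assumed to be handed in by some coordinator. The reader tracks one watermark per assigned split, reports itself idle when it has no splits, and picks the splits to pause when they run too far ahead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;
import java.util.TreeMap;

/** Hypothetical sketch (not Flink API): per-split watermark bookkeeping in one reader. */
final class SplitAlignmentSketch {

    // Latest watermark per assigned split, keyed by an illustrative split id.
    private final Map<String, Long> splitWatermarks = new TreeMap<>();

    void assignSplit(String splitId, long initialWatermark) {
        splitWatermarks.put(splitId, initialWatermark);
    }

    void removeSplit(String splitId) {
        splitWatermarks.remove(splitId);
    }

    /** Advances a split's watermark; unknown splits are ignored, watermarks never go back. */
    void updateWatermark(String splitId, long watermark) {
        splitWatermarks.computeIfPresent(splitId, (id, old) -> Math.max(old, watermark));
    }

    /** Assumption from the discussion: a reader with no assigned splits counts as idle. */
    boolean isIdle() {
        return splitWatermarks.isEmpty();
    }

    /** Minimum over the assigned splits; empty when the reader is idle. */
    OptionalLong localWatermark() {
        return splitWatermarks.values().stream().mapToLong(Long::longValue).min();
    }

    /** Splits whose watermark exceeds globalMinWatermark + maxAllowedDrift. */
    List<String> splitsToPause(long globalMinWatermark, long maxAllowedDrift) {
        long limit = globalMinWatermark + maxAllowedDrift;
        List<String> toPause = new ArrayList<>();
        for (Map.Entry<String, Long> entry : splitWatermarks.entrySet()) {
            if (entry.getValue() > limit) {
                toPause.add(entry.getKey());
            }
        }
        return toPause;
    }
}
```

With splits `p0` at watermark 100 and `p1` at 5000, a global minimum of 100 and a drift of 1000, only `p1` would be paused, so the same operator can keep reading its lagging split. Blocking the whole operator cannot do that.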
On Mon, Jul 12, 2021 at 3:06 AM 刘建刚 <liujiangangp...@gmail.com> wrote:

> +1 for the source watermark alignment.
> In previous Flink versions, the source connectors differed in
> implementation, which made this feature hard to build. When the consumed
> data is not aligned, or when consuming historical data, it is very easy to
> cause misalignment. Source alignment can resolve many instability problems.
>
> Seth Wiesman <sjwies...@gmail.com> wrote on Fri, Jul 9, 2021 at 11:25 PM:
>
> > +1
> >
> > In my opinion, this limitation is perfectly fine for the MVP. Watermark
> > alignment is a long-standing issue and this already moves the ball so far
> > forward.
> >
> > I don't expect this will cause many issues in practice. As I understand
> > it, the FileSource always processes one split at a time, and in my
> > experience, 90% of Kafka users have a small number of partitions and
> > scale their pipelines to have one reader per partition. Obviously, there
> > are larger-scale Kafka topics and more sources that will be ported over
> > in the future, but I think there is an implicit understanding that
> > aligning sources adds latency to pipelines, and we can frame the
> > follow-up "per-split" alignment as an optimization.
> >
> > On Fri, Jul 9, 2021 at 6:40 AM Piotr Nowojski <piotr.nowoj...@gmail.com>
> > wrote:
> >
> > > Hey!
> > >
> > > A couple of weeks ago Arvid Heise and I played around with an idea to
> > > address a long-standing issue of Flink: the lack of watermark/event
> > > time alignment between different parallel instances of sources, which
> > > can lead to ever-growing state size for downstream operators like
> > > WindowOperator.
> > >
> > > We had an impression that this is relatively low-hanging fruit that
> > > can be quite easily implemented, at least partially (the first part
> > > mentioned in the FLIP document). I have written down our proposal [1]
> > > and you can also check out our PoC that we have implemented [2].
> > >
> > > We think that this is a fairly easy proposal that has already been
> > > implemented in large part. There is one obvious limitation of our PoC:
> > > we can only easily block individual SourceOperators. This works
> > > perfectly fine as long as there is at most one split per
> > > SourceOperator. However, it doesn't work with multiple splits. In that
> > > case, if a single `SourceOperator` is responsible for processing both
> > > the least and the most advanced splits, we won't be able to block the
> > > most advanced split from generating new records. I'm proposing to
> > > solve this problem in the future in another follow-up FLIP, as a
> > > solution that works with a single split per operator is easier and
> > > already valuable for some of the users.
> > >
> > > What do you think about this proposal?
> > > Best, Piotrek
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources
> > > [2] https://github.com/pnowojski/flink/commits/aligned-sources