Re: Sharing state between subtasks

2019-03-07 Thread Thomas Weise
cy? > > > > > > > > Thanks, > > > > Thomas > > > > > > > > > > > > > > > > On Thu, Oct 18, 2018 at 8:34 PM Zhijiang(wangzhijiang999) > > > > wrote: > > > > > > > > > Not yet. W

Re: Sharing state between subtasks

2019-03-07 Thread Gerard Garcia
> > > > > > > Not yet. We only have some initial thoughts and have not worked on it > > > yet. > > > > We will update the progress in this discussion if have. > > > > > > > > Best, > > > > Zhijiang >

Re: Sharing state between subtasks

2018-11-14 Thread Jamie Grier
> > > > Best, > > > Zhijiang > > > -- > > > 发件人:Aljoscha Krettek > > > 发送时间:2018年10月18日(星期四) 17:53 > > > 收件人:dev ; Zhijiang(wangzhijiang999) < > > > wangzhijiang...

Re: Sharing state between subtasks

2018-11-01 Thread Till Rohrmann
> Best, > > Zhijiang > > -- > > 发件人:Aljoscha Krettek > > 发送时间:2018年10月18日(星期四) 17:53 > > 收件人:dev ; Zhijiang(wangzhijiang999) < > > wangzhijiang...@aliyun.com> > > 抄 送:Till Rohrmann

Re: Sharing state between subtasks

2018-10-31 Thread Thomas Weise
cha Krettek > 发送时间:2018年10月18日(星期四) 17:53 > 收件人:dev ; Zhijiang(wangzhijiang999) < > wangzhijiang...@aliyun.com> > 抄 送:Till Rohrmann > 主 题:Re: Sharing state between subtasks > > Hi Zhijiang, > > do you already have working code or a design doc for the second approach

回复:Sharing state between subtasks

2018-10-18 Thread Zhijiang(wangzhijiang999)
(wangzhijiang999) 抄 送:Till Rohrmann 主 题:Re: Sharing state between subtasks Hi Zhijiang, do you already have working code or a design doc for the second approach? Best, Aljoscha > On 18. Oct 2018, at 08:42, Zhijiang(wangzhijiang999) > wrote: > > Just noticed this discussion from @Till Rohrm

Re: Sharing state between subtasks

2018-10-18 Thread Aljoscha Krettek
gt; Currently we prefer to the second way to implement and will refer to other > good points above. :) > > Best, > Zhijiang > -------------- > 发件人:Jamie Grier > 发送时间:2018年10月17日(星期三) 03:28 > 收件人:dev > 主 题:Re: Shar

回复:Sharing state between subtasks

2018-10-17 Thread Zhijiang(wangzhijiang999)
ement and will refer to other good points above. :) Best, Zhijiang -- 发件人:Jamie Grier 发送时间:2018年10月17日(星期三) 03:28 收件人:dev 主 题:Re: Sharing state between subtasks Here's a doc I started describing some changes we would like to mak

Re: Sharing state between subtasks

2018-10-16 Thread Jamie Grier
Here's a doc I started describing some changes we would like to make starting with the Kinesis Source.. It describes a refactoring of that code specifically and also hopefully a pattern and some reusable code we can use in the other sources as well. The end goal would be best-effort event-time syn

Re: Sharing state between subtasks

2018-10-10 Thread Till Rohrmann
But on the Kafka source level it should be perfectly fine to do what Elias proposed. This is of course is not the perfect solution but could bring us forward quite a bit. The changes required for this should also be minimal. This would become obsolete once we have something like shared state. But u

Re: Sharing state between subtasks

2018-10-10 Thread Aljoscha Krettek
The reason this selective reading doesn't work well in Flink in the moment is because of checkpointing. For checkpointing, checkpoint barriers travel within the streams. If we selectively read from inputs based on timestamps this is akin to blocking an input if that input is very far ahead in ev

Re: Sharing state between subtasks

2018-10-10 Thread Elias Levy
On Wed, Oct 10, 2018 at 9:33 AM Fabian Hueske wrote: > I think the new source interface would be designed to be able to leverage > shared state to achieve time alignment. > I don't think this would be possible without some kind of shared state. > > The problem of tasks that are far ahead in time

Re: Sharing state between subtasks

2018-10-10 Thread Thomas Weise
Thanks for the feedback and comments so far. I want to elaborate more on the need for the shared state and awareness of watermark alignment in the source implementation. Sources like Kafka and Kinesis pull from the external system and then emit the records. For Kinesis, we have multiple consumer t

Re: Sharing state between subtasks

2018-10-10 Thread Fabian Hueske
I think the new source interface would be designed to be able to leverage shared state to achieve time alignment. I don't think this would be possible without some kind of shared state. The problem of tasks that are far ahead in time cannot be solved with back-pressure. That's because a task canno

Re: Sharing state between subtasks

2018-10-10 Thread Elias Levy
On Wed, Oct 10, 2018 at 8:15 AM Aljoscha Krettek wrote: > I think the two things (shared state and new source interface) are > somewhat orthogonal. The new source interface itself alone doesn't solve > the problem, we would still need some mechanism for sharing the event-time > information betwee

Re: Sharing state between subtasks

2018-10-10 Thread Aljoscha Krettek
Sorry for also derailing this a bit earlier... I think the two things (shared state and new source interface) are somewhat orthogonal. The new source interface itself alone doesn't solve the problem, we would still need some mechanism for sharing the event-time information between different sub

Re: Sharing state between subtasks

2018-10-10 Thread Jamie Grier
Also, I'm afraid I derailed this thread just a bit.. So also back to Thomas's original question.. If we decide state-sharing across source subtasks is the way forward for now -- does anybody have thoughts to share on what form this should take? Thomas mentioned Akka or JGroups. Other thoughts?

Re: Sharing state between subtasks

2018-10-10 Thread Jamie Grier
Okay, so I think there is a lot of agreement here about (a) This is a real issue for people, and (b) an ideal long-term approach to solving it. As Aljoscha and Elias said a full solution to this would be to also redesign the source interface such that individual partitions are exposed in the API a

Re: Sharing state between subtasks

2018-10-09 Thread Elias Levy
On Tue, Oct 9, 2018 at 1:25 AM Aljoscha Krettek wrote: > @Elias Do you know if Kafka Consumers do this alignment across multiple > consumers or only within one Consumer across the partitions that it reads > from. > The behavior is part of Kafka Streams

Re: Sharing state between subtasks

2018-10-09 Thread Fabian Hueske
Hi, I think watermark / event-time skew is a problem that many users are struggling with. A built-in primitive to align event-time would be a great feature! However, there are also some cases when it would be useful for different streams to have diverging event-time, such as an interval join [1]

Re: Sharing state between subtasks

2018-10-09 Thread Aljoscha Krettek
Yes, I think this is the way to go. This would also go well with a redesign of the source interface that has been floated for a while now. I also created a prototype a while back: https://github.com/aljoscha/flink/tree/refactor-source-interface

Re: Sharing state between subtasks

2018-10-08 Thread Elias Levy
Kafka Streams handles this problem, time alignment, by processing records from the partitions with the lowest timestamp in a best effort basis. See KIP-353 for the details. The same could be done within the Kafka source and multiple input stream operators. I opened FLINK-4558

Re: Sharing state between subtasks

2018-10-08 Thread Jamie Grier
I'll add to what Thomas already said.. The larger issue driving this is that when reading from a source with many parallel partitions, especially when reading lots of historical data (or recovering from downtime and there is a backlog to read), it's quite common for there to develop an event-time

Sharing state between subtasks

2018-10-07 Thread Thomas Weise
I'm looking to implement a state sharing mechanism between subtasks (of one or multiple tasks). Our use case is to align watermarks between subtasks of one or multiple sources to prevent some data fetchers to race ahead of others and cause massive state buffering in Flink. Each subtask would share