Re: [DISCUSS] Definition of idle partitions

2021-07-18 Thread Arvid Heise
Hi everyone, I created a FLIP and started a discussion around that topic [1]. Best, Arvid [1] https://lists.apache.org/thread.html/r8357d64b9cfdf5a233c53a20d9ac62b75c07c925ce2c43e162f1e39c%40%3Cdev.flink.apache.org%3E

Re: [DISCUSS] Definition of idle partitions

2021-06-10 Thread Eron Wright
I quickly updated the draft PR that would propagate idleness information to the Sink function, based on the recent improvement provided by FLINK-18934. It is for illustration purposes: https://github.com/streamnative/flink/pull/2

Re: [DISCUSS] Definition of idle partitions

2021-06-10 Thread Eron Wright
Regarding records vs watermarks, I feel it is wrong to include records in the considerations, because the clearest definition of idleness (IMO) is 'active participation in advancing the event-time clock', and records don't directly affect the clock. Of course, records indirectly influence the cloc

Re: [DISCUSS] Definition of idle partitions

2021-06-10 Thread Till Rohrmann
Thanks for providing these details Gordon. I have to admit that I do not fully follow the reasoning why periodic watermark generators forced us to define idleness for records. Is it because the idleness was generated based on the non-availability of more data in the sources and not in the watermark

Re: [DISCUSS] Definition of idle partitions

2021-06-09 Thread Tzu-Li (Gordon) Tai
Forgot to provide the link to the [1] reference: [1] https://issues.apache.org/jira/browse/FLINK-5017

Re: [DISCUSS] Definition of idle partitions

2021-06-09 Thread Tzu-Li (Gordon) Tai
Hi everyone, Sorry for chiming in late here. Regarding the topic of changing the definition of StreamStatus and changing the name as well: After digging into some of the roots of this implementation [1], initially the StreamStatus was actually defined to mark "watermark idleness", and not "record

Re: [DISCUSS] Definition of idle partitions

2021-06-09 Thread Arvid Heise
Hi Eron, again to recap from the other thread: - You are right that idleness is correct with static assignment and fully active partitions. In this case, the source defines idleness. (case A) - For the more pressing use cases of idle, assigned partitions, the user defines an idleness threshold, an
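
To illustrate the user-defined idleness threshold mentioned in the recap above, here is a minimal sketch. The class name `IdlenessTimer`, its methods, and the millisecond clock are hypothetical and not taken from Flink. What counts as "activity" (a record or a watermark) is the question this thread debates, so the sketch deliberately leaves that to the caller.

```python
class IdlenessTimer:
    """Hypothetical helper: flags a partition as idle after a quiet period."""

    def __init__(self, threshold_ms: int, now_ms: int = 0):
        # threshold_ms is the user-defined idleness threshold (assumed unit: ms).
        self.threshold_ms = threshold_ms
        self.last_activity_ms = now_ms

    def on_activity(self, now_ms: int) -> None:
        # Called whenever the partition shows activity; whether that means
        # "a record arrived" or "a watermark advanced" is up to the caller.
        self.last_activity_ms = now_ms

    def is_idle(self, now_ms: int) -> bool:
        # Idle once the quiet period reaches the threshold.
        return now_ms - self.last_activity_ms >= self.threshold_ms
```

A caller would invoke `on_activity` from the source or the watermark generator and poll `is_idle` periodically; any new activity makes the partition active again.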

Re: [DISCUSS] Definition of idle partitions

2021-06-08 Thread Piotr Nowojski
Hi Eron, Can you elaborate a bit more on what you mean? I don't understand what you mean by a more general solution. As of now, a stream is marked idle by a source/watermark generator, which has the effect of temporarily excluding this stream/partition from the calculation of the min watermark in the downs
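
The effect described in this message, idle inputs being skipped when the min watermark is computed, can be sketched roughly as follows. The names `InputChannel` and `combined_watermark` are hypothetical, and returning `None` when every input is idle is an assumption of this sketch rather than a statement about Flink's implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class InputChannel:
    """Hypothetical model of one input stream/partition."""
    watermark: int       # last watermark received on this input
    idle: bool = False   # True while the input is marked idle


def combined_watermark(channels: Iterable[InputChannel]) -> Optional[int]:
    # Only active (non-idle) inputs take part in the minimum.
    active = [c.watermark for c in channels if not c.idle]
    # Assumption: with no active input there is nothing to combine.
    return min(active) if active else None
```

With inputs at watermarks 10, 3 (idle) and 7, the combined watermark is 7: the idle input no longer holds the event-time clock back at 3.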

Re: [DISCUSS] Definition of idle partitions

2021-06-08 Thread Eron Wright
It seems to me that idleness was introduced to deal with a very specific issue. In the pipeline, watermarks are aggregated not on a per-split basis but on a per-subtask basis. This works well when each subtask has exactly one split. When a sub-task has multiple splits, various complications occu

Re: [DISCUSS] Definition of idle partitions

2021-06-08 Thread Piotr Nowojski
Hi Arvid, Thanks for writing down this summary and proposal. I think this was the foundation of the disagreement in the FLIP-167 discussion. Dawid was arguing that idleness is intermittent, strictly a task-local concept, and as such shouldn't be exposed in, for example, sinks. While Eron and I thought t