Hi Tawfik,

In distributed scenarios, a mix of fast and slow streams causes the watermark to advance too fast, which leads to lost data and is a headache in Flink. Can't wait to read your research paper!
Best,
Ron

Yun Tang <myas...@live.com> wrote on Wed, Sep 6, 2023 at 14:46:

> Hi Tawfik,
>
> Thanks for offering such a proposal, looking forward to your research
> paper!
>
> You could also ask for edit permission for Flink Improvement Proposals to
> create a new proposal if you want to contribute this to the community by
> yourself.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
>
> Best
> Yun Tang
> ________________________________
> From: yuxia <luoyu...@alumni.sjtu.edu.cn>
> Sent: Wednesday, September 6, 2023 12:31
> To: dev <dev@flink.apache.org>
> Subject: Re: Proposal for Implementing Keyed Watermarks in Apache Flink
>
> Hi, Tawfik Yasser.
> Thanks for the proposal.
> It sounds exciting. I can't wait to read the research paper for more details.
>
> Best regards,
> Yuxia
>
> ----- Original Message -----
> From: "David Morávek" <d...@apache.org>
> To: "dev" <dev@flink.apache.org>
> Sent: Tuesday, September 5, 2023 4:36:51 PM
> Subject: Re: Proposal for Implementing Keyed Watermarks in Apache Flink
>
> Hi Tawfik,
>
> It's exciting to see any ongoing research that tries to push Flink forward!
>
> To get the discussion started, can you please share your paper with the
> community? Assessing the proposal without further context is tough.
>
> Best,
> D.
>
> On Mon, Sep 4, 2023 at 4:42 PM Tawfek Yasser Tawfek <tyas...@nu.edu.eg>
> wrote:
>
> > Dear Apache Flink Development Team,
> >
> > I hope this email finds you well. I am writing to propose an exciting new
> > feature for Apache Flink that has the potential to significantly enhance
> > its capabilities in handling unbounded streams of events, particularly in
> > the context of event-time windowing.
> >
> > As you may be aware, Apache Flink has been at the forefront of Big Data
> > Stream processing engines, leveraging windowing techniques to manage
> > unbounded event streams effectively.
> > The accuracy of the results obtained
> > from these streams relies heavily on the ability to gather all relevant
> > input within a window. At the core of this process are watermarks, which
> > serve as unique timestamps marking the progression of events in time.
> >
> > However, our analysis has revealed a critical issue with the current
> > watermark generation method in Apache Flink. This method, which operates at
> > the input stream level, exhibits a bias towards faster sub-streams,
> > resulting in the unfortunate consequence of dropped events from slower
> > sub-streams. Our investigations showed that Apache Flink's conventional
> > watermark generation approach led to an alarming data loss of approximately
> > 33% when 50% of the keys around the median experienced delays. This loss
> > further escalated to over 37% when 50% of random keys were delayed.
> >
> > In response to this issue, we have authored a research paper outlining a
> > novel strategy named "keyed watermarks" to address data loss and
> > substantially enhance data processing accuracy, achieving at least 99%
> > accuracy in most scenarios.
> >
> > Moreover, we have conducted comprehensive comparative studies to evaluate
> > the effectiveness of our strategy against the conventional watermark
> > generation method, specifically in terms of event-time tracking accuracy.
> >
> > We believe that implementing keyed watermarks in Apache Flink can greatly
> > enhance its performance and reliability, making it an even more valuable
> > tool for organizations dealing with complex, high-throughput data
> > processing tasks.
> >
> > We kindly request your consideration of this proposal. We would be eager
> > to discuss further details, provide the full research paper, or collaborate
> > closely to facilitate the integration of this feature into Apache Flink.
> >
> > Thank you for your time and attention to this proposal.
> > We look forward to
> > the opportunity to contribute to the continued success and evolution of
> > Apache Flink.
> >
> > Best Regards,
> >
> > Tawfik Yasser
> > Senior Teaching Assistant @ Nile University, Egypt
> > Email: tyas...@nu.edu.eg
> > LinkedIn: https://www.linkedin.com/in/tawfikyasser/
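
For readers of this thread, the bias towards faster sub-streams described in the quoted proposal can be illustrated with a small toy simulation. This is only a sketch under stated assumptions: it is not the algorithm from the research paper (which has not been shared in this thread) and it uses no Flink API. The function name `count_late`, the key names `fast` and `slow`, the arrival pattern, and the out-of-orderness bound of 2 are all hypothetical choices made for illustration. An event counts as "late" when its timestamp is behind the watermark at arrival time, which is a simplification of how a windowed operator decides what to drop.

```python
from collections import defaultdict


def count_late(events, out_of_orderness, keyed):
    """Count events that arrive behind the watermark.

    events: list of (key, event_time) tuples in arrival order.
    out_of_orderness: allowed lag behind the maximum timestamp seen so far.
    keyed: if False, one watermark is shared by the whole stream;
           if True, each key tracks its own watermark (hypothetical model).
    """
    max_seen = defaultdict(lambda: float("-inf"))
    late = 0
    for key, ts in events:
        # Pick the watermark scope: the whole stream or just this key.
        scope = key if keyed else "__stream__"
        watermark = max_seen[scope] - out_of_orderness
        if ts < watermark:
            late += 1
        max_seen[scope] = max(max_seen[scope], ts)
    return late


# Assumed arrival pattern: the "slow" key produces the same event times as
# the "fast" key, but its events arrive five steps later.
events = []
for t in range(10):
    events.append(("fast", t))
    if t >= 5:
        events.append(("slow", t - 5))
events.extend(("slow", t) for t in range(5, 10))

print("late with one stream-level watermark:", count_late(events, 2, keyed=False))
print("late with per-key watermarks:", count_late(events, 2, keyed=True))
```

In this toy setup the single stream-level watermark is driven forward by the fast key, so 7 of the 10 slow-key events arrive behind it, while per-key tracking marks none as late. The numbers say nothing about the 33% and 37% figures in the proposal; they only show the mechanism.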