Re: [ANNOUNCE] Apache Flink 1.18.0 released

2023-10-27 Thread ConradJam
Congratulations!

Jingsong Li  于2023年10月27日周五 13:55写道:

> Congratulations!
>
> Thanks Jing and other release managers and all contributors.
>
> Best,
> Jingsong
>
> On Fri, Oct 27, 2023 at 1:52 PM Zakelly Lan  wrote:
> >
> > Congratulations and thank you all!
> >
> >
> > Best,
> > Zakelly
> >
> > On Fri, Oct 27, 2023 at 12:39 PM Jark Wu  wrote:
> > >
> > > Congratulations and thanks release managers and everyone who has
> > > contributed!
> > >
> > > Best,
> > > Jark
> > >
> > > On Fri, 27 Oct 2023 at 12:25, Hang Ruan 
> wrote:
> > >
> > > > Congratulations!
> > > >
> > > > Best,
> > > > Hang
> > > >
> > > > Samrat Deb  于2023年10月27日周五 11:50写道:
> > > >
> > > > > Congratulations on the great release
> > > > >
> > > > > Bests,
> > > > > Samrat
> > > > >
> > > > > On Fri, 27 Oct 2023 at 7:59 AM, Yangze Guo 
> wrote:
> > > > >
> > > > > > Great work! Congratulations to everyone involved!
> > > > > >
> > > > > > Best,
> > > > > > Yangze Guo
> > > > > >
> > > > > > On Fri, Oct 27, 2023 at 10:23 AM Qingsheng Ren  >
> > > > wrote:
> > > > > > >
> > > > > > > Congratulations and big THANK YOU to everyone helping with this
> > > > > release!
> > > > > > >
> > > > > > > Best,
> > > > > > > Qingsheng
> > > > > > >
> > > > > > > On Fri, Oct 27, 2023 at 10:18 AM Benchao Li <
> libenc...@apache.org>
> > > > > > wrote:
> > > > > > >>
> > > > > > >> Great work, thanks everyone involved!
> > > > > > >>
> > > > > > >> Rui Fan <1996fan...@gmail.com> 于2023年10月27日周五 10:16写道:
> > > > > > >> >
> > > > > > >> > Thanks for the great work!
> > > > > > >> >
> > > > > > >> > Best,
> > > > > > >> > Rui
> > > > > > >> >
> > > > > > >> > On Fri, Oct 27, 2023 at 10:03 AM Paul Lam <
> paullin3...@gmail.com>
> > > > > > wrote:
> > > > > > >> >
> > > > > > >> > > Finally! Thanks to all!
> > > > > > >> > >
> > > > > > >> > > Best,
> > > > > > >> > > Paul Lam
> > > > > > >> > >
> > > > > > >> > > > 2023年10月27日 03:58,Alexander Fedulov <
> > > > > alexander.fedu...@gmail.com>
> > > > > > 写道:
> > > > > > >> > > >
> > > > > > >> > > > Great work, thanks everyone!
> > > > > > >> > > >
> > > > > > >> > > > Best,
> > > > > > >> > > > Alexander
> > > > > > >> > > >
> > > > > > >> > > > On Thu, 26 Oct 2023 at 21:15, Martijn Visser <
> > > > > > martijnvis...@apache.org>
> > > > > > >> > > > wrote:
> > > > > > >> > > >
> > > > > > >> > > >> Thank you all who have contributed!
> > > > > > >> > > >>
> > > > > > >> > > >> Op do 26 okt 2023 om 18:41 schreef Feng Jin <
> > > > > > jinfeng1...@gmail.com>
> > > > > > >> > > >>
> > > > > > >> > > >>> Thanks for the great work! Congratulations
> > > > > > >> > > >>>
> > > > > > >> > > >>>
> > > > > > >> > > >>> Best,
> > > > > > >> > > >>> Feng Jin
> > > > > > >> > > >>>
> > > > > > >> > > >>> On Fri, Oct 27, 2023 at 12:36 AM Leonard Xu <
> > > > > xbjt...@gmail.com>
> > > > > > wrote:
> > > > > > >> > > >>>
> > > > > > >> > >  Congratulations, Well done!
> > > > > > >> > > 
> > > > > > >> > >  Best,
> > > > > > >> > >  Leonard
> > > > > > >> > > 
> > > > > > >> > >  On Fri, Oct 27, 2023 at 12:23 AM Lincoln Lee <
> > > > > > lincoln.8...@gmail.com>
> > > > > > >> > >  wrote:
> > > > > > >> > > 
> > > > > > >> > > > Thanks for the great work! Congrats all!
> > > > > > >> > > >
> > > > > > >> > > > Best,
> > > > > > >> > > > Lincoln Lee
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > > Jing Ge  于2023年10月27日周五
> > > > > 00:16写道:
> > > > > > >> > > >
> > > > > > >> > > >> The Apache Flink community is very happy to
> announce the
> > > > > > release of
> > > > > > >> > > > Apache
> > > > > > >> > > >> Flink 1.18.0, which is the first release for the
> Apache
> > > > > > Flink 1.18
> > > > > > >> > > > series.
> > > > > > >> > > >>
> > > > > > >> > > >> Apache Flink® is an open-source unified stream and
> batch
> > > > > data
> > > > > > >> > >  processing
> > > > > > >> > > >> framework for distributed, high-performing,
> > > > > > always-available, and
> > > > > > >> > > > accurate
> > > > > > >> > > >> data applications.
> > > > > > >> > > >>
> > > > > > >> > > >> The release is available for download at:
> > > > > > >> > > >> https://flink.apache.org/downloads.html
> > > > > > >> > > >>
> > > > > > >> > > >> Please check out the release blog post for an
> overview of
> > > > > the
> > > > > > >> > > > improvements
> > > > > > >> > > >> for this release:
> > > > > > >> > > >>
> > > > > > >> > > >>
> > > > > > >> > > >
> > > > > > >> > > 
> > > > > > >> > > >>>
> > > > > > >> > > >>
> > > > > > >> > >
> > > > > >
> > > > >
> > > >
> https://flink.apache.org/2023/10/24/announcing-the-release-of-apache-flink-1.18/
> > > > > > >> > > >>
> > > > > > >> > > >> The full release notes are available in Jira:
> > > > > > >> > > >>
> > > > > > >> > > >>
> > > > > > >> > > >
> > > > > > >> > > 
> > > > > > >> > > >>>
> > > > > > >> > > >>
> > > 

Re: [ANNOUNCE] Apache Flink 1.18.0 released

2023-10-27 Thread Yuepeng Pan
Thanks for the great work! Congratulations to everyone involved!


Best,
Yuepeng Pan

At 2023-10-27 15:06:40, "ConradJam"  wrote:
>Congratulations!
>
>Jingsong Li  于2023年10月27日周五 13:55写道:
>
>> Congratulations!
>>
>> Thanks Jing and other release managers and all contributors.
>>
>> Best,
>> Jingsong
>>
>> On Fri, Oct 27, 2023 at 1:52 PM Zakelly Lan  wrote:
>> >
>> > Congratulations and thank you all!
>> >
>> >
>> > Best,
>> > Zakelly
>> >
>> > On Fri, Oct 27, 2023 at 12:39 PM Jark Wu  wrote:
>> > >
>> > > Congratulations and thanks release managers and everyone who has
>> > > contributed!
>> > >
>> > > Best,
>> > > Jark
>> > >
>> > > On Fri, 27 Oct 2023 at 12:25, Hang Ruan 
>> wrote:
>> > >
>> > > > Congratulations!
>> > > >
>> > > > Best,
>> > > > Hang
>> > > >
>> > > > Samrat Deb  于2023年10月27日周五 11:50写道:
>> > > >
>> > > > > Congratulations on the great release
>> > > > >
>> > > > > Bests,
>> > > > > Samrat
>> > > > >
>> > > > > On Fri, 27 Oct 2023 at 7:59 AM, Yangze Guo 
>> wrote:
>> > > > >
>> > > > > > Great work! Congratulations to everyone involved!
>> > > > > >
>> > > > > > Best,
>> > > > > > Yangze Guo
>> > > > > >
>> > > > > > On Fri, Oct 27, 2023 at 10:23 AM Qingsheng Ren > >
>> > > > wrote:
>> > > > > > >
>> > > > > > > Congratulations and big THANK YOU to everyone helping with this
>> > > > > release!
>> > > > > > >
>> > > > > > > Best,
>> > > > > > > Qingsheng
>> > > > > > >
>> > > > > > > On Fri, Oct 27, 2023 at 10:18 AM Benchao Li <
>> libenc...@apache.org>
>> > > > > > wrote:
>> > > > > > >>
>> > > > > > >> Great work, thanks everyone involved!
>> > > > > > >>
>> > > > > > >> Rui Fan <1996fan...@gmail.com> 于2023年10月27日周五 10:16写道:
>> > > > > > >> >
>> > > > > > >> > Thanks for the great work!
>> > > > > > >> >
>> > > > > > >> > Best,
>> > > > > > >> > Rui
>> > > > > > >> >
>> > > > > > >> > On Fri, Oct 27, 2023 at 10:03 AM Paul Lam <
>> paullin3...@gmail.com>
>> > > > > > wrote:
>> > > > > > >> >
>> > > > > > >> > > Finally! Thanks to all!
>> > > > > > >> > >
>> > > > > > >> > > Best,
>> > > > > > >> > > Paul Lam
>> > > > > > >> > >
>> > > > > > >> > > > 2023年10月27日 03:58,Alexander Fedulov <
>> > > > > alexander.fedu...@gmail.com>
>> > > > > > 写道:
>> > > > > > >> > > >
>> > > > > > >> > > > Great work, thanks everyone!
>> > > > > > >> > > >
>> > > > > > >> > > > Best,
>> > > > > > >> > > > Alexander
>> > > > > > >> > > >
>> > > > > > >> > > > On Thu, 26 Oct 2023 at 21:15, Martijn Visser <
>> > > > > > martijnvis...@apache.org>
>> > > > > > >> > > > wrote:
>> > > > > > >> > > >
>> > > > > > >> > > >> Thank you all who have contributed!
>> > > > > > >> > > >>
>> > > > > > >> > > >> Op do 26 okt 2023 om 18:41 schreef Feng Jin <
>> > > > > > jinfeng1...@gmail.com>
>> > > > > > >> > > >>
>> > > > > > >> > > >>> Thanks for the great work! Congratulations
>> > > > > > >> > > >>>
>> > > > > > >> > > >>>
>> > > > > > >> > > >>> Best,
>> > > > > > >> > > >>> Feng Jin
>> > > > > > >> > > >>>
>> > > > > > >> > > >>> On Fri, Oct 27, 2023 at 12:36 AM Leonard Xu <
>> > > > > xbjt...@gmail.com>
>> > > > > > wrote:
>> > > > > > >> > > >>>
>> > > > > > >> > >  Congratulations, Well done!
>> > > > > > >> > > 
>> > > > > > >> > >  Best,
>> > > > > > >> > >  Leonard
>> > > > > > >> > > 
>> > > > > > >> > >  On Fri, Oct 27, 2023 at 12:23 AM Lincoln Lee <
>> > > > > > lincoln.8...@gmail.com>
>> > > > > > >> > >  wrote:
>> > > > > > >> > > 
>> > > > > > >> > > > Thanks for the great work! Congrats all!
>> > > > > > >> > > >
>> > > > > > >> > > > Best,
>> > > > > > >> > > > Lincoln Lee
>> > > > > > >> > > >
>> > > > > > >> > > >
>> > > > > > >> > > > Jing Ge  于2023年10月27日周五
>> > > > > 00:16写道:
>> > > > > > >> > > >
>> > > > > > >> > > >> The Apache Flink community is very happy to
>> announce the
>> > > > > > release of
>> > > > > > >> > > > Apache
>> > > > > > >> > > >> Flink 1.18.0, which is the first release for the
>> Apache
>> > > > > > Flink 1.18
>> > > > > > >> > > > series.
>> > > > > > >> > > >>
>> > > > > > >> > > >> Apache Flink® is an open-source unified stream and
>> batch
>> > > > > data
>> > > > > > >> > >  processing
>> > > > > > >> > > >> framework for distributed, high-performing,
>> > > > > > always-available, and
>> > > > > > >> > > > accurate
>> > > > > > >> > > >> data applications.
>> > > > > > >> > > >>
>> > > > > > >> > > >> The release is available for download at:
>> > > > > > >> > > >> https://flink.apache.org/downloads.html
>> > > > > > >> > > >>
>> > > > > > >> > > >> Please check out the release blog post for an
>> overview of
>> > > > > the
>> > > > > > >> > > > improvements
>> > > > > > >> > > >> for this release:
>> > > > > > >> > > >>
>> > > > > > >> > > >>
>> > > > > > >> > > >
>> > > > > > >> > > 
>> > > > > > >> > > >>>
>> > > > > > >> > > >>
>> > > > > > >> > >
>> > > > > >
>> > > > >
>> > > >
>> https://flink.apache.org/2023/10/24/announcing-the-release-of-apache-flink-1.18/

[jira] [Created] (FLINK-33378) Bump flink version on flink-connectors-jdbc

2023-10-27 Thread Jira
João Boto created FLINK-33378:
-

 Summary: Bump flink version on flink-connectors-jdbc
 Key: FLINK-33378
 URL: https://issues.apache.org/jira/browse/FLINK-33378
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / JDBC
Reporter: João Boto


With the release of Flink 1.18, bump the Flink version on the JDBC connector.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33379) Bump flink version on flink-connectors-elasticsearch

2023-10-27 Thread Yubin Li (Jira)
Yubin Li created FLINK-33379:


 Summary: Bump flink version on flink-connectors-elasticsearch
 Key: FLINK-33379
 URL: https://issues.apache.org/jira/browse/FLINK-33379
 Project: Flink
  Issue Type: Improvement
Reporter: Yubin Li


As Flink 1.18 has been released, bump the Flink version in the Elasticsearch connector.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-376: Add DISTRIBUTED BY clause

2023-10-27 Thread Jark Wu
Hi Timo,

Thanks for starting this discussion. I really like it!
The FLIP is already in good shape, I only have some minor comments.

1. Could we also support HASH and RANGE distribution kind on the DDL
syntax?
I noticed that HASH and UNKNOWN are introduced in the Java API, but not in
the syntax.

2. Can we make "INTO n BUCKETS" optional in CREATE TABLE and ALTER TABLE?
Some storage engines support automatically determining the bucket number
based on
the cluster resources and data size of the table. For example, StarRocks[1]
and Paimon[2].

Best,
Jark

[1]:
https://docs.starrocks.io/en-us/latest/table_design/Data_distribution#determine-the-number-of-buckets
[2]:
https://paimon.apache.org/docs/0.5/concepts/primary-key-table/#dynamic-bucket

On Thu, 26 Oct 2023 at 18:26, Jingsong Li  wrote:

> Very thanks Timo for starting this discussion.
>
> Big +1 for this.
>
> The design looks good to me!
>
> We can add some documentation for connector developers. For example:
> for sink, If there needs some keyby, please finish the keyby by the
> connector itself. SupportsBucketing is just a marker interface.
>
> Best,
> Jingsong
>
> On Thu, Oct 26, 2023 at 5:00 PM Timo Walther  wrote:
> >
> > Hi everyone,
> >
> > I would like to start a discussion on FLIP-376: Add DISTRIBUTED BY
> > clause [1].
> >
> > Many SQL vendors expose the concepts of Partitioning, Bucketing, and
> > Clustering. This FLIP continues the work of previous FLIPs and would
> > like to introduce the concept of "Bucketing" to Flink.
> >
> > This is a pure connector characteristic and helps both Apache Kafka and
> > Apache Paimon connectors in avoiding a complex WITH clause by providing
> > improved syntax.
> >
> > Here is an example:
> >
> > CREATE TABLE MyTable
> >(
> >  uid BIGINT,
> >  name STRING
> >)
> >DISTRIBUTED BY (uid) INTO 6 BUCKETS
> >WITH (
> >  'connector' = 'kafka'
> >)
> >
> > The full syntax specification can be found in the document. The clause
> > should be optional and fully backwards compatible.
> >
> > Regards,
> > Timo
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-376%3A+Add+DISTRIBUTED+BY+clause
>


[jira] [Created] (FLINK-33380) Bump flink version on flink-connectors-mongodb

2023-10-27 Thread Jiabao Sun (Jira)
Jiabao Sun created FLINK-33380:
--

 Summary: Bump flink version on flink-connectors-mongodb
 Key: FLINK-33380
 URL: https://issues.apache.org/jira/browse/FLINK-33380
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / MongoDB
Affects Versions: mongodb-1.0.2
Reporter: Jiabao Sun
 Fix For: mongodb-1.1.0


As Flink 1.18 has been released, bump the Flink version in the MongoDB connector.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33381) Support split big parquet file to multi InputSplits

2023-10-27 Thread yunfan (Jira)
yunfan created FLINK-33381:
--

 Summary: Support split big parquet file to multi InputSplits
 Key: FLINK-33381
 URL: https://issues.apache.org/jira/browse/FLINK-33381
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Hive
Reporter: yunfan


Currently, Flink only supports reading a Parquet file as a single split.
But in some cases, one Parquet file is too big for one task.
In such cases, we need to split one Parquet file into multiple InputSplits.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Planning Flink 1.19

2023-10-27 Thread Matthias Pohl
+1 from my side for Lincoln, Yun Tang, Jing and Martijn as release managers.
Thanks everyone for volunteering.

I tried to collect the different tasks that are part of release management
in [1]. It might help to identify responsibilities. Feel free to have a
look and/or update it. Ideally, it will help others to decide whether they
feel ready to contribute to the community as release managers in future
releases.

[1]
https://cwiki.apache.org/confluence/display/FLINK/Flink+Release+Management


On Thu, Oct 26, 2023 at 9:15 PM Martijn Visser 
wrote:

> Hi Lincoln and Yun,
>
> Happy to jump on board as release manager too :)
>
> Best regards,
>
> Martijn
>
> Op do 26 okt 2023 om 20:50 schreef Jing Ge 
>
> > Hi Lincoln,
> >
> > Thanks for kicking off 1.19! I got a lot of experience as a release
> manager
> > for the 1.18 release. I would like to join you and participate in the
> 1.19
> > release cycle.
> >
> > Best regards,
> > Jing
> >
> > On Thu, Oct 26, 2023 at 6:27 PM Lincoln Lee 
> > wrote:
> >
> > > Hi everyone,
> > >
> > > With the release announcement of Flink 1.18, it’s a good time to kick
> off
> > > discussion of the next release 1.19.
> > >
> > > - Release managers
> > >
> > > Yun Tang and I would like to volunteer as release managers for 1.19,
> and
> > > it would be great to have someone else working together on this
> release.
> > > Please let us know if you have any interest!
> > >
> > > - Timeline
> > >
> > > Flink 1.18 has been released. With a target release cycle of 4 months,
> we
> > > propose a feature freeze date of *Jan 26, 2024*.
> > >
> > > - Collecting features
> > >
> > > As usual, we've created a wiki page[1] for collecting new features in
> > 1.19.
> > > In addition, we already have a number of FLIPs that have been voted or
> > are
> > > in
> > > the process, including pre-works for version 2.0.
> > > In the meantime, the release management team will be finalized in the
> > next
> > > few days,
> > > and we'll continue to create Jira Boards and Sync meetings to make it
> > easy
> > > for
> > > everyone to get an overview and track progress.
> > >
> > > 1. https://cwiki.apache.org/confluence/display/FLINK/1.19+Release
> > >
> > > Best,
> > > Lincoln Lee
> > >
> >
>


Re: [ANNOUNCE] Apache Flink 1.18.0 released

2023-10-27 Thread Matthias Pohl
Thanks to everyone who was involved and especially to the 1.18 release
managers. :)

On Fri, Oct 27, 2023 at 9:13 AM Yuepeng Pan  wrote:

> Thanks for the great work! Congratulations to everyone involved!
>
>
> Best,
> Yuepeng Pan
>
> At 2023-10-27 15:06:40, "ConradJam"  wrote:
> >Congratulations!
> >
> >Jingsong Li  于2023年10月27日周五 13:55写道:
> >
> >> Congratulations!
> >>
> >> Thanks Jing and other release managers and all contributors.
> >>
> >> Best,
> >> Jingsong
> >>
> >> On Fri, Oct 27, 2023 at 1:52 PM Zakelly Lan 
> wrote:
> >> >
> >> > Congratulations and thank you all!
> >> >
> >> >
> >> > Best,
> >> > Zakelly
> >> >
> >> > On Fri, Oct 27, 2023 at 12:39 PM Jark Wu  wrote:
> >> > >
> >> > > Congratulations and thanks release managers and everyone who has
> >> > > contributed!
> >> > >
> >> > > Best,
> >> > > Jark
> >> > >
> >> > > On Fri, 27 Oct 2023 at 12:25, Hang Ruan 
> >> wrote:
> >> > >
> >> > > > Congratulations!
> >> > > >
> >> > > > Best,
> >> > > > Hang
> >> > > >
> >> > > > Samrat Deb  于2023年10月27日周五 11:50写道:
> >> > > >
> >> > > > > Congratulations on the great release
> >> > > > >
> >> > > > > Bests,
> >> > > > > Samrat
> >> > > > >
> >> > > > > On Fri, 27 Oct 2023 at 7:59 AM, Yangze Guo 
> >> wrote:
> >> > > > >
> >> > > > > > Great work! Congratulations to everyone involved!
> >> > > > > >
> >> > > > > > Best,
> >> > > > > > Yangze Guo
> >> > > > > >
> >> > > > > > On Fri, Oct 27, 2023 at 10:23 AM Qingsheng Ren <
> re...@apache.org
> >> >
> >> > > > wrote:
> >> > > > > > >
> >> > > > > > > Congratulations and big THANK YOU to everyone helping with
> this
> >> > > > > release!
> >> > > > > > >
> >> > > > > > > Best,
> >> > > > > > > Qingsheng
> >> > > > > > >
> >> > > > > > > On Fri, Oct 27, 2023 at 10:18 AM Benchao Li <
> >> libenc...@apache.org>
> >> > > > > > wrote:
> >> > > > > > >>
> >> > > > > > >> Great work, thanks everyone involved!
> >> > > > > > >>
> >> > > > > > >> Rui Fan <1996fan...@gmail.com> 于2023年10月27日周五 10:16写道:
> >> > > > > > >> >
> >> > > > > > >> > Thanks for the great work!
> >> > > > > > >> >
> >> > > > > > >> > Best,
> >> > > > > > >> > Rui
> >> > > > > > >> >
> >> > > > > > >> > On Fri, Oct 27, 2023 at 10:03 AM Paul Lam <
> >> paullin3...@gmail.com>
> >> > > > > > wrote:
> >> > > > > > >> >
> >> > > > > > >> > > Finally! Thanks to all!
> >> > > > > > >> > >
> >> > > > > > >> > > Best,
> >> > > > > > >> > > Paul Lam
> >> > > > > > >> > >
> >> > > > > > >> > > > 2023年10月27日 03:58,Alexander Fedulov <
> >> > > > > alexander.fedu...@gmail.com>
> >> > > > > > 写道:
> >> > > > > > >> > > >
> >> > > > > > >> > > > Great work, thanks everyone!
> >> > > > > > >> > > >
> >> > > > > > >> > > > Best,
> >> > > > > > >> > > > Alexander
> >> > > > > > >> > > >
> >> > > > > > >> > > > On Thu, 26 Oct 2023 at 21:15, Martijn Visser <
> >> > > > > > martijnvis...@apache.org>
> >> > > > > > >> > > > wrote:
> >> > > > > > >> > > >
> >> > > > > > >> > > >> Thank you all who have contributed!
> >> > > > > > >> > > >>
> >> > > > > > >> > > >> Op do 26 okt 2023 om 18:41 schreef Feng Jin <
> >> > > > > > jinfeng1...@gmail.com>
> >> > > > > > >> > > >>
> >> > > > > > >> > > >>> Thanks for the great work! Congratulations
> >> > > > > > >> > > >>>
> >> > > > > > >> > > >>>
> >> > > > > > >> > > >>> Best,
> >> > > > > > >> > > >>> Feng Jin
> >> > > > > > >> > > >>>
> >> > > > > > >> > > >>> On Fri, Oct 27, 2023 at 12:36 AM Leonard Xu <
> >> > > > > xbjt...@gmail.com>
> >> > > > > > wrote:
> >> > > > > > >> > > >>>
> >> > > > > > >> > >  Congratulations, Well done!
> >> > > > > > >> > > 
> >> > > > > > >> > >  Best,
> >> > > > > > >> > >  Leonard
> >> > > > > > >> > > 
> >> > > > > > >> > >  On Fri, Oct 27, 2023 at 12:23 AM Lincoln Lee <
> >> > > > > > lincoln.8...@gmail.com>
> >> > > > > > >> > >  wrote:
> >> > > > > > >> > > 
> >> > > > > > >> > > > Thanks for the great work! Congrats all!
> >> > > > > > >> > > >
> >> > > > > > >> > > > Best,
> >> > > > > > >> > > > Lincoln Lee
> >> > > > > > >> > > >
> >> > > > > > >> > > >
> >> > > > > > >> > > > Jing Ge 
> 于2023年10月27日周五
> >> > > > > 00:16写道:
> >> > > > > > >> > > >
> >> > > > > > >> > > >> The Apache Flink community is very happy to
> >> announce the
> >> > > > > > release of
> >> > > > > > >> > > > Apache
> >> > > > > > >> > > >> Flink 1.18.0, which is the first release for the
> >> Apache
> >> > > > > > Flink 1.18
> >> > > > > > >> > > > series.
> >> > > > > > >> > > >>
> >> > > > > > >> > > >> Apache Flink® is an open-source unified stream
> and
> >> batch
> >> > > > > data
> >> > > > > > >> > >  processing
> >> > > > > > >> > > >> framework for distributed, high-performing,
> >> > > > > > always-available, and
> >> > > > > > >> > > > accurate
> >> > > > > > >> > > >> data applications.
> >> > > > > > >> > > >>
> >> > > > > > >> > > >> The release is available for download at:
>> > > > > > >> > > >> https://flink.apache.org/downloads.html

[jira] [Created] (FLINK-33382) Flink Python Environment Manager Fails with Pip --install-option in Recent Pip Versions

2023-10-27 Thread Christos Hadjinikolis (Jira)
Christos Hadjinikolis created FLINK-33382:
-

 Summary: Flink Python Environment Manager Fails with Pip 
--install-option in Recent Pip Versions
 Key: FLINK-33382
 URL: https://issues.apache.org/jira/browse/FLINK-33382
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.17.1
 Environment: * Flink version: 1.17.1
 * Python version: 3.10
 * Pip version: >= 21
 * Operating System: [MacOS]
Reporter: Christos Hadjinikolis


I encountered an issue when running Flink jobs that use Python dependencies. 
The underlying problem seems to stem from Flink's use of the 
{{--install-option}} argument when calling {{pip install}} to manage Python 
dependencies. This argument has been deprecated in newer versions of pip, 
resulting in job failures.

This issue forces users to downgrade their {{pip}} version as a temporary 
workaround, which is not ideal due to potential security vulnerabilities and 
missing features in older {{pip}} versions.



{*}Steps to Reproduce{*}:
 # Setup a Flink job with Python dependencies.
 # Use a Python environment with a {{pip}} version >= 21.
 # Run the Flink job.
 # Observe the error: {{no such option: --install-option}}.

{*}Error Logs{*}:

 
{{no such option: --install-option...}}

{*}Expected Behavior{*}: Flink should handle Python dependency management in a 
way that's compatible with newer versions of {{pip}}.



{*}Possible Solutions{*}:
 # Update Flink's Python dependency management code to remove or replace the 
{{--install-option}} argument.
 # Provide Flink configuration options to customize the {{pip install}} command 
or to skip certain options.
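
A rough sketch of option 1, purely for illustration (this is *not* Flink's actual 
environment-manager code; the function name, the {{--prefix}} replacement, and the 
pip version cut-off are assumptions):
{code:python}
import sys
from importlib.metadata import version  # Python 3.8+

def build_pip_install_cmd(requirement: str, prefix_dir: str) -> list:
    """Build a pip install command that avoids the removed --install-option flag."""
    pip_major = int(version("pip").split(".")[0])
    cmd = [sys.executable, "-m", "pip", "install", requirement]
    if pip_major >= 23:
        # Recent pip releases removed --install-option; --prefix is one possible
        # way to keep installing into a dedicated directory.
        cmd += ["--prefix", prefix_dir]
    else:
        cmd += ["--install-option", f"--prefix={prefix_dir}"]
    return cmd

print(build_pip_install_cmd("apache-flink==1.17.1", "/tmp/pyflink-deps"))
{code}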



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


RE: flink-sql-connector-jdbc new release

2023-10-27 Thread David Radley
Hi Jing,
thanks. Are there any processes documented around getting a release out? Out of 
interest, what is your thinking around this being a blocker? I suspect it is not 
a regression but a really-nice-to-have, WDYT?
Either way it looks interesting – I am going to have a look into this issue to 
try to move it along – could you assign it to me please?
Kind regards,  David.

From: Jingsong Li 
Date: Friday, 27 October 2023 at 06:54
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: flink-sql-connector-jdbc new release
Hi David,

Thanks for driving this.

I think https://issues.apache.org/jira/browse/FLINK-33365  should be a blocker.

Best,
Jingsong

On Thu, Oct 26, 2023 at 11:43 PM David Radley  wrote:
>
> Hi,
> I propose that we do a 3.2 release of flink-sql-connector-jdbc so that there 
> is a version matching 1.18 that includes the new dialects. I am happy to 
> drive this, some pointers to documentation on the process and the approach to 
> testing the various dialects would be great,
>
>  Kind regards, David.
>
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-27 Thread Jark Wu
Hi Becket,

I checked the history of "*table.optimizer.source.predicate-pushdown-enabled*";
it seems it was introduced with the legacy FilterableTableSource interface
and might have been an experimental feature at that time. I don't see the
necessity of this option at the moment. Maybe we can deprecate this option
and drop it in Flink 2.0 [1] if it is not necessary anymore. This may help
to simplify this discussion.
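
For reference, the existing global switch can already be toggled from the table
config today; a minimal PyFlink sketch (the fine-grained, per-source option
discussed in this thread is only a proposal and is therefore not shown):

from pyflink.table import TableEnvironment, EnvironmentSettings

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Existing global switch: disables filter push-down for all sources of this job.
t_env.get_config().set(
    "table.optimizer.source.predicate-pushdown-enabled", "false"
)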


Best,
Jark

[1]: https://issues.apache.org/jira/browse/FLINK-32383



On Thu, 26 Oct 2023 at 10:14, Becket Qin  wrote:

> Thanks for the proposal, Jiabao. My two cents below:
>
> 1. If I understand correctly, the motivation of the FLIP is mainly to make
> predicate pushdown optional on SOME of the Sources. If so, intuitively the
> configuration should be Source specific instead of general. Otherwise, we
> will end up with general configurations that may not take effect for some
> of the Source implementations. This violates the basic rule of a
> configuration - it does what it says, regardless of the implementation.
> While configuration standardization is usually a good thing, it should not
> break the basic rules.
> If we really want to have this general configuration, for the sources this
> configuration does not apply, they should throw an exception to make it
> clear that this configuration is not supported. However, that seems ugly.
>
> 2. I think the actual motivation of this FLIP is about "how a source
> should implement predicate pushdown efficiently", not "whether predicate
> pushdown should be applied to the source." For example, if a source wants
> to avoid additional computing load in the external system, it can always
> read the entire record and apply the predicates by itself. However, from
> the Flink perspective, the predicate pushdown is applied, it is just
> implemented differently by the source. So the design principle here is that
> Flink only cares about whether a source supports predicate pushdown or not,
> it does not care about the implementation efficiency / side effect of the
> predicates pushdown. It is the Source implementation's responsibility to
> ensure the predicates pushdown is implemented efficiently and does not
> impose excessive pressure on the external system. And it is OK to have
> additional configurations to achieve this goal. Obviously, such
> configurations will be source specific in this case.
>
> 3. Regarding the existing configurations of 
> *table.optimizer.source.predicate-pushdown-enabled.
> *I am not sure why we need it. Supposedly, if a source implements a
> SupportsXXXPushDown interface, the optimizer should push the corresponding
> predicates to the Source. I am not sure in which case this configuration
> would be used. Any ideas @Jark Wu ?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
> On Wed, Oct 25, 2023 at 11:55 PM Jiabao Sun
>  wrote:
>
>> Thanks Jane for the detailed explanation.
>>
>> I think that for users, we should respect conventions over
>> configurations.
>> Conventions can be default values explicitly specified in configurations,
>> or they can be behaviors that follow previous versions.
>> If the same code has different behaviors in different versions, it would
>> be a very bad thing.
>>
>> I agree that for regular users, it is not necessary to understand all the
>> configurations related to Flink.
>> By following conventions, they can have a good experience.
>>
>> Let's get back to the practical situation and consider it.
>>
>> Case 1:
>> The user is not familiar with the purpose of the
>> table.optimizer.source.predicate-pushdown-enabled configuration but follows
>> the convention of allowing predicate pushdown to the source by default.
>> Just understanding the source.predicate-pushdown-enabled configuration
>> and performing fine-grained toggle control will work well.
>>
>> Case 2:
>> The user understands the meaning of the
>> table.optimizer.source.predicate-pushdown-enabled configuration and has set
>> its value to false.
>> We have reason to believe that the user understands the meaning of the
>> predicate pushdown configuration and the intention is to disable predicate
>> pushdown (rather than whether or not to allow it).
>> The previous choice of globally disabling it is likely because it
>> couldn't be disabled on individual sources.
>> From this perspective, if we provide more fine-grained configuration
>> support and provide detailed explanations of the configuration behaviors in
>> the documentation,
>> users can clearly understand the differences between these two
>> configurations and use them correctly.
>>
>> Also, I don't agree that
>> table.optimizer.source.predicate-pushdown-enabled = true and
>> source.predicate-pushdown-enabled = false means that the local
>> configuration overrides the global configuration.
>> On the contrary, both configurations are functioning correctly.
>> The optimizer allows predicate pushdown to all sources, but some sources
>> can reject the filters pushed down by the optimizer.
>> This

RE: flink-sql-connector-jdbc new release

2023-10-27 Thread David Radley
Hi Jing,
I just spotted on the mailing list that it is a regression – I agree it is a 
blocker,
   Kind regards, David.

From: David Radley 
Date: Friday, 27 October 2023 at 10:33
To: dev@flink.apache.org 
Subject: [EXTERNAL] RE: flink-sql-connector-jdbc new release
Hi Jing,
thanks are there any processes documented around getting a release out. Out of 
interest what is your thinking around this being a blocker? I suspect it is not 
a regression, but a really nice to have, WDYT,
Either way it looks interesting – I am going to have a look into this issue to 
try to move it along– could you assign it to me please,
Kind regards,  David.

From: Jingsong Li 
Date: Friday, 27 October 2023 at 06:54
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: flink-sql-connector-jdbc new release
Hi David,

Thanks for driving this.

I think https://issues.apache.org/jira/browse/FLINK-33365   should be a blocker.

Best,
Jingsong

On Thu, Oct 26, 2023 at 11:43 PM David Radley  wrote:
>
> Hi,
> I propose that we do a 3.2 release of flink-sql-connector-jdbc so that there 
> is a version matching 1.18 that includes the new dialects. I am happy to 
> drive this, some pointers to documentation on the process and the approach to 
> testing the various dialects would be great,
>
>  Kind regards, David.
>
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: Re: [DISCUSS] FLIP-376: Add DISTRIBUTED BY clause

2023-10-27 Thread yunfan zhang
DISTRIBUTE BY in DML is also supported by Hive, and it is also useful for Flink.
Users can use this ability to increase the cache hit rate in lookup joins,
and they can use "distribute by key, rand(1, 10)" to avoid data skew problems.
I think it is another way to solve FLIP-204 [1].
Some people have already requested this feature [2].

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-204%3A+Introduce+Hash+Lookup+Join
[2] https://issues.apache.org/jira/browse/FLINK-27541

On 2023/10/27 08:20:25 Jark Wu wrote:
> Hi Timo,
> 
> Thanks for starting this discussion. I really like it!
> The FLIP is already in good shape, I only have some minor comments.
> 
> 1. Could we also support HASH and RANGE distribution kind on the DDL
> syntax?
> I noticed that HASH and UNKNOWN are introduced in the Java API, but not in
> the syntax.
> 
> 2. Can we make "INTO n BUCKETS" optional in CREATE TABLE and ALTER TABLE?
> Some storage engines support automatically determining the bucket number
> based on
> the cluster resources and data size of the table. For example, StarRocks[1]
> and Paimon[2].
> 
> Best,
> Jark
> 
> [1]:
> https://docs.starrocks.io/en-us/latest/table_design/Data_distribution#determine-the-number-of-buckets
> [2]:
> https://paimon.apache.org/docs/0.5/concepts/primary-key-table/#dynamic-bucket
> 
> On Thu, 26 Oct 2023 at 18:26, Jingsong Li  wrote:
> 
> > Very thanks Timo for starting this discussion.
> >
> > Big +1 for this.
> >
> > The design looks good to me!
> >
> > We can add some documentation for connector developers. For example:
> > for sink, If there needs some keyby, please finish the keyby by the
> > connector itself. SupportsBucketing is just a marker interface.
> >
> > Best,
> > Jingsong
> >
> > On Thu, Oct 26, 2023 at 5:00 PM Timo Walther  wrote:
> > >
> > > Hi everyone,
> > >
> > > I would like to start a discussion on FLIP-376: Add DISTRIBUTED BY
> > > clause [1].
> > >
> > > Many SQL vendors expose the concepts of Partitioning, Bucketing, and
> > > Clustering. This FLIP continues the work of previous FLIPs and would
> > > like to introduce the concept of "Bucketing" to Flink.
> > >
> > > This is a pure connector characteristic and helps both Apache Kafka and
> > > Apache Paimon connectors in avoiding a complex WITH clause by providing
> > > improved syntax.
> > >
> > > Here is an example:
> > >
> > > CREATE TABLE MyTable
> > >(
> > >  uid BIGINT,
> > >  name STRING
> > >)
> > >DISTRIBUTED BY (uid) INTO 6 BUCKETS
> > >WITH (
> > >  'connector' = 'kafka'
> > >)
> > >
> > > The full syntax specification can be found in the document. The clause
> > > should be optional and fully backwards compatible.
> > >
> > > Regards,
> > > Timo
> > >
> > > [1]
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-376%3A+Add+DISTRIBUTED+BY+clause
> >
> 

python Table API problem with pandas

2023-10-27 Thread Alexey Sergeev
Hi everyone,

Python Table API seems to be a little bit buggy.

Some minimal examples of strange behaviors here:

https://gist.github.com/nrdhm/88322a68fc3e9a14a5f4ab6ec13403cf



Was testing in pyflink-shell in our small cluster with Flink 1.17.

Docker image: flink:1.17.1-scala_2.12-java11



The third problem with pandas UDF concerns me the most.



It seems like vectorized UDFs do not work at all with .filter()/.where()
calls.

Column names are reset to the default f0, f1, …, fN and values are not being
filtered.

And so, I have some questions:
  1. Were you able to reproduce these problems?
  2. Is it the expected behavior?
  3. How can we get around this?

Best regards, Alexey


Re: flink-sql-connector-jdbc new release

2023-10-27 Thread Martijn Visser
Hi David,

The release process for connector is documented at
https://cwiki.apache.org/confluence/display/FLINK/Creating+a+flink-connector+release

Best regards,

Martijn

On Fri, Oct 27, 2023 at 12:00 PM David Radley  wrote:
>
> Hi Jing,
> I just spotted the mailing list that it is a regression – I agree it is a 
> blocker,
>Kind regards, David.
>
> From: David Radley 
> Date: Friday, 27 October 2023 at 10:33
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] RE: flink-sql-connector-jdbc new release
> Hi Jing,
> thanks are there any processes documented around getting a release out. Out 
> of interest what is your thinking around this being a blocker? I suspect it 
> is not a regression, but a really nice to have, WDYT,
> Either way it looks interesting – I am going to have a look into this issue 
> to try to move it along– could you assign it to me please,
> Kind regards,  David.
>
> From: Jingsong Li 
> Date: Friday, 27 October 2023 at 06:54
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Re: flink-sql-connector-jdbc new release
> Hi David,
>
> Thanks for driving this.
>
> I think https://issues.apache.org/jira/browse/FLINK-33365   should be a 
> blocker.
>
> Best,
> Jingsong
>
> On Thu, Oct 26, 2023 at 11:43 PM David Radley  wrote:
> >
> > Hi,
> > I propose that we do a 3.2 release of flink-sql-connector-jdbc so that 
> > there is a version matching 1.18 that includes the new dialects. I am happy 
> > to drive this, some pointers to documentation on the process and the 
> > approach to testing the various dialects would be great,
> >
> >  Kind regards, David.
> >
> >
> > Unless otherwise stated above:
> >
> > IBM United Kingdom Limited
> > Registered in England and Wales with number 741598
> > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


Re: python Table API problem with pandas

2023-10-27 Thread Alexey Sergeev
# problem_2.py

# .alias() does not work either

import json

from pyflink.common import Row
from pyflink.table import TableEnvironment, EnvironmentSettings
from pyflink.table.udf import udf

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

table = t_env.from_elements(
elements=[
(1, '{"name": "Flink"}'),
(2, '{"name": "hello"}'),
(3, '{"name": "world"}'),
(4, '{"name": "PyFlink"}')
],
schema=['id', 'data'],
).alias('id', 'data')

@udf(
result_type=(
'Row'
),
)
def example_map(row: Row):
print('\n'*3, f'{row=}', '\n'*3)
# will print:
# row=Row(f0=1, f1='{"name": "Flink"}')
# expected:
# row=Row(id=1, data='{"name": "Flink"}')
data = json.loads(row.data)
return Row(row.id, data['name'])

# Will raise with
# ValueError: 'data' is not in list

flow = (
table
.map(example_map)
.execute().print()
)

On Fri, Oct 27, 2023 at 2:14 PM Alexey Sergeev  wrote:
>
> Hi everyone,
>
>
> Python Table API seems to be a little bit buggy.
>
> Some minimal examples of strange behaviors here:
>
> https://gist.github.com/nrdhm/88322a68fc3e9a14a5f4ab6ec13403cf
>
>
>
> Was testing in pyflink-shell in our small cluster with Flink 1.17.
>
> Docker image: flink:1.17.1-scala_2.12-java11
>
>
>
> The third problem with pandas UDF concerns me the most.
>
>
>
> It seems like Vectorized UDF do not work at all with .filter() /.where() 
> calls.
>
> Columns name are reset to default f0, f1, …, fN and values are not being 
> filtered.
>
>
> And so, I have some questions:
>
>   1. Was you able to reproduce these problems?
>   2. Is it the expected behavior?
>   3. How can we get around this?
>
> Best regards, Alexey


Re: python Table API problem with pandas

2023-10-27 Thread Alexey Sergeev
I'll copy the problems here if your prefer that.

# problem_1.py

# add_columns() resets column names to default names f0, f1, ..., fN

from pyflink.common import Row
from pyflink.table import TableEnvironment, EnvironmentSettings, DataTypes
from pyflink.table.expressions import col
from pyflink.table.udf import udf

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

table = t_env.from_elements(
elements=[
(1, '{"name": "Flink"}'),
(2, '{"name": "hello"}'),
(3, '{"name": "world"}'),
(4, '{"name": "PyFlink"}')
],
schema=['id', 'data'],
).add_columns(
col('data').json_value('$.name', DataTypes.STRING()).alias('name'),
)

@udf(
result_type=(
'Row'
),
)
def example_map(row: Row):
print('\n'*3, f'{row=}', '\n'*3)
# will print:
# row=Row(id=1, data='{"name": "Flink"}', f0='Flink')
return Row(row.id, row.name)

# Will raise with
# ValueError: 'name' is not in list

flow = (
table
.map(example_map)
.execute().print()
)


On Fri, Oct 27, 2023 at 2:14 PM Alexey Sergeev  wrote:
>
> Hi everyone,
>
>
> Python Table API seems to be a little bit buggy.
>
> Some minimal examples of strange behaviors here:
>
> https://gist.github.com/nrdhm/88322a68fc3e9a14a5f4ab6ec13403cf
>
>
>
> Was testing in pyflink-shell in our small cluster with Flink 1.17.
>
> Docker image: flink:1.17.1-scala_2.12-java11
>
>
>
> The third problem with pandas UDF concerns me the most.
>
>
>
> It seems like Vectorized UDF do not work at all with .filter() /.where() 
> calls.
>
> Columns name are reset to default f0, f1, …, fN and values are not being 
> filtered.
>
>
> And so, I have some questions:
>
>   1. Was you able to reproduce these problems?
>   2. Is it the expected behavior?
>   3. How can we get around this?
>
> Best regards, Alexey


Re: python Table API problem with pandas

2023-10-27 Thread Alexey Sergeev
# problem_3.py

# call to .where() after .map() with pandas type function
# also resets column names
# and doesn't really filter values

import pandas as pd

from pyflink.table import TableEnvironment, EnvironmentSettings
from pyflink.table.expressions import col
from pyflink.table.udf import udf

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

table = t_env.from_elements(
elements=[
(1, 'China'),
(2, 'Germany'),
(3, 'China'),
],
schema=['id', 'country'],
)

@udf(
result_type=(
'Row'
),
func_type="pandas",
)
def example_map_a(df: pd.DataFrame):
columns = sorted(df.columns)
print(f'example_map_a: {columns=}')
# prints:
# example_map_a: columns=['country', 'id']
assert columns == ['country', 'id'], columns
return df


@udf(
result_type=(
'Row'
),
func_type="pandas",
)
def example_map_b(df: pd.DataFrame):
columns = sorted(df.columns)
print(f'example_map_b: {columns=}')
# Prints:
# example_map_b: columns=['f0', 'f1']

print(f'example_map_b df: {df=}')
# Prints:
# df=   f0   f1
# 0   1China
# 1   2  Germany
# 2   3China
# Although China was expected to be filtered out.

# Raises:
# AssertionError: ['f0', 'f1']
assert columns == ['country', 'id'], columns
return df

# Will raise with
# AssertionError: ['f0', 'f1']

flow = (
table
.map(example_map_a)
.where(col('country') == 'Germany')
.map(example_map_b)
.execute().print()
)


[jira] [Created] (FLINK-33383) flink-quickstart-scala is not supported anymore since 1.17

2023-10-27 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33383:
-

 Summary: flink-quickstart-scala is not supported anymore since 1.17
 Key: FLINK-33383
 URL: https://issues.apache.org/jira/browse/FLINK-33383
 Project: Flink
  Issue Type: Bug
  Components: Project Website
Affects Versions: 1.17.1, 1.18.0, 1.19.0
Reporter: Matthias Pohl


The quickstart scripts for Scala have not been supported since Flink 1.17.

We might want to remove the scripts. Their execution would fail due to missing 
Maven artifacts for 1.17+:
{code:bash}
[WARNING] The POM for org.apache.flink:flink-quickstart-scala:jar:1.17.0 is 
missing, no dependency information available
Downloading from central: 
https://repo.maven.apache.org/maven2/org/apache/flink/flink-quickstart-scala/1.17.0/flink-quickstart-scala-1.17.0.jar
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time:  11.050 s
[INFO] Finished at: 2023-10-27T14:13:33+02:00
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-archetype-plugin:3.2.1:generate (default-cli) on 
project standalone-pom: The desired archetype does not exist 
(org.apache.flink:flink-quickstart-scala:1.17.0) -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[DISCUSS] Confluent Avro support without Schema Registry access

2023-10-27 Thread Dale Lane
TLDR:
We currently require a connection to a Confluent Schema Registry to be able to 
work with Confluent Avro data. With a small modification to the Avro formatter, 
I think we could also offer the ability to process this type of data without 
requiring access to the schema registry.

What would people think of such an enhancement?

-

When working with Avro data, there are two formats available to us: avro and 
avro-confluent.

avro
Data it supports: Avro records
Approach: You specify a table schema and it derives an appropriate Avro schema 
from this.

avro-confluent
Data it supports: Confluent’s variant[1] of the Avro encoding
Approach: You provide connection details (URL, credentials, 
keystore/truststore, schema lookup strategy, etc.) for retrieving an 
appropriate schema from the Confluent Schema Registry.

What this means is if you have Confluent Avro data[2] that you want to use in 
Flink, you currently have to use the avro-confluent format, and that means you 
need to provide Flink with access to your Schema Registry.

I think there will be times where you may not want, or may not be able, to 
provide Flink with direct access to a Schema Registry. In such cases, it would 
be useful to support the same behaviour that the avro format does (i.e. allow 
you to explicitly specify a table schema)

This could be achieved with a very minor modification to the avro formatter.

For reading records, we could add an option to the formatter to highlight when 
records will be Confluent Avro. If that option is set, we just need the 
formatter to skip the first bytes with the schema ID/version (it can then use 
the remaining bytes with a regular Avro decoder as it does today – the existing 
implementation would be essentially unchanged).

For writing records, something similar would work. An option to the formatter 
to highlight when to write records using Confluent Avro. We would need a way to 
specify what ID value to use for the first bytes [3]. (After that, the record 
can be encoded with a regular Avro encoder as it does today – the rest of the 
implementation would be unchanged).


-
[1] – This is the same as regular Avro, but prefixing the payload with extra 
bytes that identify which schema to use, to allow an appropriate schema to be 
retrieved from a schema registry.

[2] – Records that were serialized by 
io.confluent.kafka.serializers.KafkaAvroSerializer and could be read by 
io.confluent.kafka.serializers.KafkaAvroDeserializer.

[3] – Either by making them fixed options for that formatter, or by allowing it 
to be specified from something in the record.
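
To make the framing concrete: Confluent's wire format is a single magic byte 0x00, 
a 4-byte big-endian schema ID, and then the plain Avro body. A minimal sketch of 
the "skip the prefix" idea (illustration only, not the proposed formatter code; 
the schema ID and payload below are made up):

import struct

CONFLUENT_MAGIC = 0x00

def strip_confluent_prefix(record: bytes):
    """Split a Confluent-framed Avro record into (schema_id, plain_avro_bytes)."""
    if len(record) < 5 or record[0] != CONFLUENT_MAGIC:
        raise ValueError("not a Confluent-framed Avro record")
    (schema_id,) = struct.unpack(">I", record[1:5])  # 4-byte big-endian schema ID
    return schema_id, record[5:]                     # rest is a regular Avro payload

# Example with a made-up schema ID and dummy Avro body:
framed = bytes([CONFLUENT_MAGIC]) + struct.pack(">I", 42) + b"\x02\x0aFlink"
print(strip_confluent_prefix(framed))  # -> (42, b'\x02\nFlink')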

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


Re: [DISCUSS] Confluent Avro support without Schema Registry access

2023-10-27 Thread Martijn Visser
Hi Dale,

I'm struggling to understand in what cases you want to read data
serialized in connection with Confluent Schema Registry, but can't get
access to the Schema Registry service. It seems like a rather exotic
situation and it defeats the purpose of using a Schema Registry in the
first place? I also doubt that it's actually really useful: if you
strip the magic byte, and the schema has evolved when you're consuming
it from Flink, you can end up with deserialization errors given that a
field might have been deleted/added/changed etc. Also, it wouldn't
work when you actually want to write avro-confluent, because that
requires a check when producing if you're still being compliant.

Best regards,

Martijn

On Fri, Oct 27, 2023 at 2:53 PM Dale Lane  wrote:
>
> TLDR:
> We currently require a connection to a Confluent Schema Registry to be able 
> to work with Confluent Avro data. With a small modification to the Avro 
> formatter, I think we could also offer the ability to process this type of 
> data without requiring access to the schema registry.
>
> What would people think of such an enhancement?
>
> -
>
> When working with Avro data, there are two formats available to us: avro and 
> avro-confluent.
>
> avro
> Data it supports: Avro records
> Approach: You specify a table schema and it derives an appropriate Avro 
> schema from this.
>
> avro-confluent
> Data it supports: Confluent’s variant[1] of the Avro encoding
> Approach: You provide connection details (URL, credentials, 
> keystore/truststore, schema lookup strategy, etc.) for retrieving an 
> appropriate schema from the Confluent Schema Registry.
>
> What this means is if you have Confluent Avro data[2] that you want to use in 
> Flink, you currently have to use the avro-confluent format, and that means 
> you need to provide Flink with access to your Schema Registry.
>
> I think there will be times where you may not want, or may not be able, to 
> provide Flink with direct access to a Schema Registry. In such cases, it 
> would be useful to support the same behaviour that the avro format does (i.e. 
> allow you to explicitly specify a table schema)
>
> This could be achieved with a very minor modification to the avro formatter.
>
> For reading records, we could add an option to the formatter to highlight 
> when records will be Confluent Avro. If that option is set, we just need the 
> formatter to skip the first bytes with the schema ID/version (it can then use 
> the remaining bytes with a regular Avro decoder as it does today – the 
> existing implementation would be essentially unchanged).
>
> For writing records, something similar would work. An option to the formatter 
> to highlight when to write records using Confluent Avro. We would need a way 
> to specify what ID value to use for the first bytes [3]. (After that, the 
> record can be encoded with a regular Avro encoder as it does today – the rest 
> of the implementation would be unchanged).
>
>
> -
> [1] – This is the same as regular Avro, but prefixing the payload with extra 
> bytes that identify which schema to use, to allow an appropriate schema to be 
> retrieved from a schema registry.
>
> [2] – Records that were serialized by 
> io.confluent.kafka.serializers.KafkaAvroSerializer and could be read by 
> io.confluent.kafka.serializers.KafkaAvroDeserializer.
>
> [3] – Either by making them fixed options for that formatter, or by allowing 
> it to be specified from something in the record.
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: [DISCUSS] Confluent Avro support without Schema Registry access

2023-10-27 Thread Dale Lane
> if you strip the magic byte, and the schema has
> evolved when you're consuming it from Flink,
> you can end up with deserialization errors given
> that a field might have been deleted/added/
> changed etc.

Aren’t we already fairly dependent on the schema remaining consistent, because 
otherwise we’d need to update the table schema as well?

> it wouldn't work when you actually want to
> write avro-confluent, because that requires a
> check when producing if you're still being compliant.

I’m not sure what you mean here, sorry. Are you thinking about issues if you 
needed to mix-and-match with both formatters at the same time? (Rather than 
just using the Avro formatter as I was describing)

Kind regards

Dale



From: Martijn Visser 
Date: Friday, 27 October 2023 at 14:03
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [DISCUSS] Confluent Avro support without Schema 
Registry access
Hi Dale,

I'm struggling to understand in what cases you want to read data
serialized in connection with Confluent Schema Registry, but can't get
access to the Schema Registry service. It seems like a rather exotic
situation and it beats the purposes of using a Schema Registry in the
first place? I also doubt that it's actually really useful: if you
strip the magic byte, and the schema has evolved when you're consuming
it from Flink, you can end up with deserialization errors given that a
field might have been deleted/added/changed etc. Also, it wouldn't
work when you actually want to write avro-confluent, because that
requires a check when producing if you're still being compliant.

Best regards,

Martijn

On Fri, Oct 27, 2023 at 2:53 PM Dale Lane  wrote:
>
> TLDR:
> We currently require a connection to a Confluent Schema Registry to be able 
> to work with Confluent Avro data. With a small modification to the Avro 
> formatter, I think we could also offer the ability to process this type of 
> data without requiring access to the schema registry.
>
> What would people think of such an enhancement?
>
> -
>
> When working with Avro data, there are two formats available to us: avro and 
> avro-confluent.
>
> avro
> Data it supports: Avro records
> Approach: You specify a table schema and it derives an appropriate Avro 
> schema from this.
>
> avro-confluent
> Data it supports: Confluent’s variant[1] of the Avro encoding
> Approach: You provide connection details (URL, credentials, 
> keystore/truststore, schema lookup strategy, etc.) for retrieving an 
> appropriate schema from the Confluent Schema Registry.
>
> What this means is if you have Confluent Avro data[2] that you want to use in 
> Flink, you currently have to use the avro-confluent format, and that means 
> you need to provide Flink with access to your Schema Registry.
>
> I think there will be times where you may not want, or may not be able, to 
> provide Flink with direct access to a Schema Registry. In such cases, it 
> would be useful to support the same behaviour that the avro format does (i.e. 
> allow you to explicitly specify a table schema)
>
> This could be achieved with a very minor modification to the avro formatter.
>
> For reading records, we could add an option to the formatter to highlight 
> when records will be Confluent Avro. If that option is set, we just need the 
> formatter to skip the first bytes with the schema ID/version (it can then use 
> the remaining bytes with a regular Avro decoder as it does today – the 
> existing implementation would be essentially unchanged).
>
> For writing records, something similar would work. An option to the formatter 
> to highlight when to write records using Confluent Avro. We would need a way 
> to specify what ID value to use for the first bytes [3]. (After that, the 
> record can be encoded with a regular Avro encoder as it does today – the rest 
> of the implementation would be unchanged).
>
>
> -
> [1] – This is the same as regular Avro, but prefixing the payload with extra 
> bytes that identify which schema to use, to allow an appropriate schema to be 
> retrieved from a schema registry.
>
> [2] – Records that were serialized by 
> io.confluent.kafka.serializers.KafkaAvroSerializer and could be read by 
> io.confluent.kafka.serializers.KafkaAvroDeserializer.
>
> [3] – Either by making them fixed options for that formatter, or by allowing 
> it to be specified from something in the record.
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


[jira] [Created] (FLINK-33384) MySQL JDBC driver is deprecated

2023-10-27 Thread david radley (Jira)
david radley created FLINK-33384:


 Summary: MySQL JDBC driver is deprecated
 Key: FLINK-33384
 URL: https://issues.apache.org/jira/browse/FLINK-33384
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / JDBC
Reporter: david radley


I see when running tests on the JDBC connector, I get a warning

_Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver 
class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via 
the SPI and manual loading of the driver class is generally unnecessary._
 

I suggest we change the class to be loaded from the old class name to the new, 
non-deprecated one.

 

I am happy to implement this and do testing on it. 
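
For illustration, this is where the class name also shows up on the user side: the 
optional {{'driver'}} option of a JDBC table. This is only a sketch (database, table 
and credentials are made up; the flink-connector-jdbc jar and the MySQL driver are 
assumed to be on the classpath when the table is actually queried):
{code:python}
from pyflink.table import TableEnvironment, EnvironmentSettings

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# CREATE TABLE only registers catalog metadata; the driver class is resolved
# when the table is read or written.
t_env.execute_sql("""
    CREATE TABLE orders (
      id BIGINT,
      amount DOUBLE
    ) WITH (
      'connector' = 'jdbc',
      'url' = 'jdbc:mysql://localhost:3306/mydb',
      'table-name' = 'orders',
      'driver' = 'com.mysql.cj.jdbc.Driver',  -- the non-deprecated class
      'username' = 'user',
      'password' = 'secret'
    )
""")
{code}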



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[RESULT][VOTE] FLIP-373: Support Configuring Different State TTLs using SQL Hint

2023-10-27 Thread Jane Chan
Dear developers,

FLIP-373 [1] has been accepted and voted through this thread [2].

The proposal received nine approving votes, five of which are binding, and
there is no disapproval.

Benchao Li (binding)
Lincoln Lee (binding)
Liu Ron (binding)
Jark Wu (binding)
Sergey Nuyanzin (binding)
Jiabao Sun
Feng Jin
Xuyang
Zakelly Lan


Thanks to all participants for actively engaging in the discussion, voting,
and providing valuable feedback.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-373%3A+Support+Configuring+Different+State+TTLs+using+SQL+Hint
[2] https://lists.apache.org/thread/837r63gdwzoqryvp3gbf67941g706s5d
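
For readers who have not followed the FLIP, here is a minimal sketch of the
kind of query the feature enables. The STATE_TTL hint name and the per-table
option syntax are shown as I understand them from the FLIP document [1] and may
differ in the final implementation; the table names are illustrative:

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class StateTtlHintExample {
        public static void main(String[] args) {
            TableEnvironment tEnv =
                    TableEnvironment.create(EnvironmentSettings.inStreamingMode());
            // Different state TTLs per join input via the STATE_TTL hint,
            // instead of a single job-wide table.exec.state.ttl value.
            String query =
                    "SELECT /*+ STATE_TTL('Orders' = '1d', 'Customers' = '20d') */ * "
                            + "FROM Orders LEFT JOIN Customers "
                            + "ON Orders.o_custkey = Customers.c_custkey";
            // tEnv.executeSql(query); // requires Orders/Customers to be registered
        }
    }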

Best,
Jane


Re: [ANNOUNCE] Apache Flink 1.18.0 released

2023-10-27 Thread Yun Tang
Thanks to everyone who participated in this release!

Best
Yun Tang

From: Matthias Pohl 
Sent: Friday, October 27, 2023 17:23
To: dev@flink.apache.org 
Subject: Re: [ANNOUNCE] Apache Flink 1.18.0 released

Thanks to everyone who was involved and especially to the 1.18 release
managers. :)

On Fri, Oct 27, 2023 at 9:13 AM Yuepeng Pan  wrote:

> Thanks for the great work! Congratulations to everyone involved!
>
>
> Best,
> Yuepeng Pan
>
> At 2023-10-27 15:06:40, "ConradJam"  wrote:
> >Congratulations!
> >

RE: flink-sql-connector-jdbc new release

2023-10-27 Thread David Radley
Hi Martijn,
Thanks for the link. I suspect I cannot be the release manager, as I do not
have the required access, but I am happy to help this progress.
Kind regards, David.

From: Martijn Visser 
Date: Friday, 27 October 2023 at 12:16
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: flink-sql-connector-jdbc new release
Hi David,

The release process for connector is documented at
https://cwiki.apache.org/confluence/display/FLINK/Creating+a+flink-connector+release

Best regards,

Martijn

On Fri, Oct 27, 2023 at 12:00 PM David Radley  wrote:
>
> Hi Jing,
> I just spotted on the mailing list that it is a regression – I agree it is a
> blocker.
> Kind regards, David.
>
> From: David Radley 
> Date: Friday, 27 October 2023 at 10:33
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] RE: flink-sql-connector-jdbc new release
> Hi Jing,
> Thanks. Are there any processes documented around getting a release out? Out
> of interest, what is your thinking around this being a blocker? I suspect it
> is not a regression but a really-nice-to-have, WDYT?
> Either way, it looks interesting – I am going to have a look into this issue
> to try to move it along – could you assign it to me please?
> Kind regards, David.
>
> From: Jingsong Li 
> Date: Friday, 27 October 2023 at 06:54
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Re: flink-sql-connector-jdbc new release
> Hi David,
>
> Thanks for driving this.
>
> I think https://issues.apache.org/jira/browse/FLINK-33365 should be a
> blocker.
>
> Best,
> Jingsong
>
> On Thu, Oct 26, 2023 at 11:43 PM David Radley  wrote:
> >
> > Hi,
> > I propose that we do a 3.2 release of flink-sql-connector-jdbc so that
> > there is a version matching 1.18 that includes the new dialects. I am happy
> > to drive this; some pointers to documentation on the process and the
> > approach to testing the various dialects would be great.
> >
> >  Kind regards, David.
> >
> >
> > Unless otherwise stated above:
> >
> > IBM United Kingdom Limited
> > Registered in England and Wales with number 741598
> > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


Re: off for a week

2023-10-27 Thread Etienne Chauchot

Thanks Max !

Le 26/10/2023 à 15:44, Maximilian Michels a écrit :

Have a great time off, Etienne!

On Thu, Oct 26, 2023 at 3:38 PM Etienne Chauchot  wrote:

Hi,

FYI, I'll be off and unresponsive for a week starting tomorrow evening.
For ongoing work, please ping me before tomorrow evening or within a week

Best

Etienne

Proposal for Implementing Keyed Watermarks in Apache Flink

2023-10-27 Thread Tawfek Yasser Tawfek
Dear Apache Flink Development Team,

I hope this email finds you well. I propose an exciting new feature for Apache 
Flink that has the potential to significantly enhance its capabilities in 
handling unbounded streams of events, particularly in the context of event-time 
windowing.

As you may be aware, Apache Flink has been at the forefront of Big Data Stream 
processing engines, leveraging windowing techniques to manage unbounded event 
streams effectively. The accuracy of the results obtained from these streams 
relies heavily on the ability to gather all relevant input within a window. At 
the core of this process are watermarks, which serve as unique timestamps 
marking the progression of events in time.

However, our analysis has revealed a critical issue with the current watermark 
generation method in Apache Flink. This method, which operates at the input 
stream level, exhibits a bias towards faster sub-streams, resulting in the 
unfortunate consequence of dropped events from slower sub-streams. Our 
investigations showed that Apache Flink's conventional watermark generation 
approach led to an alarming data loss of approximately 33% when 50% of the keys 
around the median experienced delays. This loss further escalated to over 37% 
when 50% of random keys were delayed.

In response to this issue, we have authored a research paper outlining a novel 
strategy named "keyed watermarks" to address data loss and substantially 
enhance data processing accuracy, achieving at least 99% accuracy in most 
scenarios.

Moreover, we have conducted comprehensive comparative studies to evaluate the 
effectiveness of our strategy against the conventional watermark generation 
method, specifically in terms of event-time tracking accuracy.

We believe that implementing keyed watermarks in Apache Flink can greatly 
enhance its performance and reliability, making it an even more valuable tool 
for organizations dealing with complex, high-throughput data processing tasks.

We kindly request your consideration of this proposal. We would be eager to 
discuss further details, provide the full research paper, or collaborate 
closely to facilitate the integration of this feature into Apache Flink.

Please check this preprint on Research Square: 
https://www.researchsquare.com/article/rs-3395909/

Thank you for your time and attention to this proposal. We look forward to the 
opportunity to contribute to the continued success and evolution of Apache 
Flink.

Best Regards,

Tawfik Yasser
Senior Teaching Assistant @ Nile University, Egypt
Email: tyas...@nu.edu.eg
LinkedIn: https://www.linkedin.com/in/tawfikyasser/


Re: Proposal for Implementing Keyed Watermarks in Apache Flink

2023-10-27 Thread Alexander Fedulov
Hi Tawfek,

Thanks for sharing. I am trying to understand what exact real-life problem
you are tackling with this approach. My understanding from skimming through
the paper is that you are concerned about some outlier event producers from
which the events can be delayed beyond what is expected in the overall
system.
Do I get it correctly that the keyed watermarking only targets scenarios of
calculating keyed windows (which are also keyed by the same producer ids)?

Best,
Alexander Fedulov

On Fri, 27 Oct 2023 at 19:07, Tawfek Yasser Tawfek 
wrote:

> Dear Apache Flink Development Team,
>
> I hope this email finds you well. I propose an exciting new feature for
> Apache Flink that has the potential to significantly enhance its
> capabilities in handling unbounded streams of events, particularly in the
> context of event-time windowing.
>
> As you may be aware, Apache Flink has been at the forefront of Big Data
> Stream processing engines, leveraging windowing techniques to manage
> unbounded event streams effectively. The accuracy of the results obtained
> from these streams relies heavily on the ability to gather all relevant
> input within a window. At the core of this process are watermarks, which
> serve as unique timestamps marking the progression of events in time.
>
> However, our analysis has revealed a critical issue with the current
> watermark generation method in Apache Flink. This method, which operates at
> the input stream level, exhibits a bias towards faster sub-streams,
> resulting in the unfortunate consequence of dropped events from slower
> sub-streams. Our investigations showed that Apache Flink's conventional
> watermark generation approach led to an alarming data loss of approximately
> 33% when 50% of the keys around the median experienced delays. This loss
> further escalated to over 37% when 50% of random keys were delayed.
>
> In response to this issue, we have authored a research paper outlining a
> novel strategy named "keyed watermarks" to address data loss and
> substantially enhance data processing accuracy, achieving at least 99%
> accuracy in most scenarios.
>
> Moreover, we have conducted comprehensive comparative studies to evaluate
> the effectiveness of our strategy against the conventional watermark
> generation method, specifically in terms of event-time tracking accuracy.
>
> We believe that implementing keyed watermarks in Apache Flink can greatly
> enhance its performance and reliability, making it an even more valuable
> tool for organizations dealing with complex, high-throughput data
> processing tasks.
>
> We kindly request your consideration of this proposal. We would be eager
> to discuss further details, provide the full research paper, or collaborate
> closely to facilitate the integration of this feature into Apache Flink.
>
> Please check this preprint on Research Square:
> https://www.researchsquare.com/article/rs-3395909/v1
>
> Thank you for your time and attention to this proposal. We look forward to
> the opportunity to contribute to the continued success and evolution of
> Apache Flink.
>
> Best Regards,
>
> Tawfik Yasser
> Senior Teaching Assistant @ Nile University, Egypt
> Email: tyas...@nu.edu.eg
> LinkedIn: https://www.linkedin.com/in/tawfikyasser/
>


RE: [DISCUSS] FLIP-326: Enhance Watermark to Support Processing-Time Temporal Join

2023-10-27 Thread Alexander Smirnov
 Hi Dong and Xuannan,

Thanks for your proposal! Processing time temporal join is a very important
feature, the proper implementation of which users have been waiting for a
long time.

However, I am wondering whether it is worth enhancing Watermarks and
related classes in order to support this feature. As was mentioned earlier
in this thread, the proposed implementation looks a bit unnecessarily
complicated. Moreover, I think that Watermarks and Processing time are the
concepts that should not be mixed, because the original purpose of
watermarks is to track the progress in Event time. Having the logic related
to Processing time in Watermarks class could be a little bit confusing. I
guess that I'm not alone in such concerns. Are there any other possible
use cases for having the 'useProcessingTime' flag in the Watermark class?

Speaking about alternative approaches, the idea to use
RecordAttributes(isBacklog=true/false) from FLIP-327 looks much better to
me. Using the 'isBacklog' flag in TemporalProcessTimeJoinOperator may
clearly state why it needs to buffer probe side data initially (because of
backlog data from the build side of the join). In addition, such
implementation would be much easier, since it uses already implemented
FLIP-327 as the basis. Dong mentioned that this approach could be a bit
hacky because originally, RecordAttributes(isBacklog=true/false) was
proposed for optimization purposes ("optional" buffering). Can't we
generalize the semantics of the 'isBacklog' flag to just notifying
operators about the type of data that is currently being processed in
stream (backlog / realtime)? In this case, there won't be a semantic
problem anymore.
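
For what it's worth, the operator-side logic this alternative implies is quite
small. The sketch below is purely hypothetical – FLIP-327's RecordAttributes
API was still in flux at the time of writing, so all names and signatures here
are illustrative, not real Flink interfaces:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch: buffer probe-side rows while the build side is still
    // replaying backlog data, then flush once it reports isBacklog = false.
    public class BufferingTemporalJoinSketch {
        private final List<String> bufferedProbeRows = new ArrayList<>();
        private boolean buildSideIsBacklog = true;

        // Would be driven by RecordAttributes(isBacklog) emitted by the build side.
        public void onBuildSideAttributes(boolean isBacklog) {
            buildSideIsBacklog = isBacklog;
            if (!isBacklog) {
                bufferedProbeRows.forEach(this::joinAndEmit);
                bufferedProbeRows.clear();
            }
        }

        public void onProbeRow(String row) {
            if (buildSideIsBacklog) {
                bufferedProbeRows.add(row); // wait until the build-side snapshot is done
            } else {
                joinAndEmit(row);
            }
        }

        private void joinAndEmit(String row) {
            // look up the latest build-side version for the row's key and emit the result
        }
    }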

Additionally, I have a slightly off-topic question. As far as I know, both
the Processing Time Temporal Join and the Lookup Join use the same
'FOR SYSTEM_TIME AS OF' syntax in Flink SQL. How is it planned to differentiate
between these two kinds of join in the Flink planner after enabling this syntax
for the Processing Time Temporal Join? I could propose checking for the
presence of the existing LOOKUP SQL hint [1] to enable the Lookup Join, but
such a change is not backward compatible, so we need to come up with another approach. I

Best regards,
Alexander

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#lookup


On 2023/06/25 07:02:00 Xuannan Su wrote:
> Hi all,
>
> Dong(cc'ed) and I are opening this thread to discuss our proposal to
> enhance the watermark to properly support processing-time temporal
> join, which has been documented in FLIP-326 [1].
>
> We want to support the use case where the records from the probe side
> of the processing-time temporal join need to wait until the build side
> finishes the snapshot phase by enhancing the expressiveness of the
> Watermark. Additionally, these changes lay the groundwork for
> simplifying the DataStream APIs, eliminating the need for users to
> explicitly differentiate between event-time and processing-time,
> resulting in a more intuitive user experience.
>
> Please refer to the FLIP document for more details about the proposed
> design and implementation. We welcome any feedback and opinions on
> this proposal.
>
> Best regards,
>
> Dong and Xuannan
>
> [1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-326%3A+Enhance+Watermark+to+Support+Processing-Time+Temporal+Join
>


Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-27 Thread Venkatakrishnan Sowrirajan
Thanks for the proposal, Jiabao.

I agree with Becket: if a *Source* implements the *SupportsXXXPushDown*
(in this case *SupportsFilterPushdown*) interface, then the *Source* (in
your FLIP example, a database) is designed to support filter pushdown. The
corresponding Source can have mechanisms built into it to detect cases where
applying the filter pushdown adds additional computation pressure that can
affect the stability of the system - and, if so, disable it.

Could you please elaborate on the use cases where users know upfront (but it
is not detectable at the source level) that, for a specific job or SQL query,
*applyFilters* could negatively affect the overall performance of the query or
the external system, or any other use cases where *SupportsXXXPushDown* has to
be selectively disabled for specific sources?

Regards
Venkata krishnan


On Fri, Oct 27, 2023 at 2:48 AM Jark Wu  wrote:

> Hi Becket,
>
> I checked the history of "
> *table.optimizer.source.predicate-pushdown-enabled*",
> it seems it was introduced with the legacy FilterableTableSource interface,
> which might have been an experimental feature at that time. I don't see the
> necessity
> of this option at the moment. Maybe we can deprecate this option and drop
> it
> in Flink 2.0[1] if it is not necessary anymore. This may help to
> simplify this discussion.
>
>
> Best,
> Jark
>
> [1]:
> https://issues.apache.org/jira/browse/FLINK-32383
>
>
>
> On Thu, 26 Oct 2023 at 10:14, Becket Qin  wrote:
>
> > Thanks for the proposal, Jiabao. My two cents below:
> >
> > 1. If I understand correctly, the motivation of the FLIP is mainly to
> make
> > predicate pushdown optional on SOME of the Sources. If so, intuitively
> the
> > configuration should be Source specific instead of general. Otherwise, we
> > will end up with general configurations that may not take effect for some
> > of the Source implementations. This violates the basic rule of a
> > configuration - it does what it says, regardless of the implementation.
> > While configuration standardization is usually a good thing, it should
> not
> > break the basic rules.
> > If we really want to have this general configuration, for the sources
> this
> > configuration does not apply, they should throw an exception to make it
> > clear that this configuration is not supported. However, that seems ugly.
> >
> > 2. I think the actual motivation of this FLIP is about "how a source
> > should implement predicate pushdown efficiently", not "whether predicate
> > pushdown should be applied to the source." For example, if a source wants
> > to avoid additional computing load in the external system, it can always
> > read the entire record and apply the predicates by itself. However, from
> > the Flink perspective, the predicate pushdown is applied, it is just
> > implemented differently by the source. So the design principle here is
> that
> > Flink only cares about whether a source supports predicate pushdown or
> not,
> > it does not care about the implementation efficiency / side effect of the
> > predicates pushdown. It is the Source implementation's responsibility to
> > ensure the predicates pushdown is implemented efficiently and does not
> > impose excessive pressure on the external system. And it is OK to have
> > additional configurations to achieve this goal. Obviously, such
> > configurations will be source specific in this case.
> >
> > 3. Regarding the existing configurations of
> *table.optimizer.source.predicate-pushdown-enabled.
> > *I am not sure why we need it. Supposedly, if a source implements a
> > SupportsXXXPushDown interface, the optimizer should push the
> corresponding
> > predicates to the Source. I am not sure in which case this configuration
> > would be used. Any ideas @Jark Wu ?
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> > On Wed, Oct 25, 2023 at 11:55 PM Jiabao Sun
> >  wrote:
> >
> >> Thanks Jane for the detailed explanation.
> >>
> >> I think that for users, we should respect conventions over
> >> configurations.
> >> Conventions can be default values explicitly specified in
> configurations,
> >> or they can be behaviors that follow previous versions.
> >> If the same code has different behaviors in different versions, it would
> >> be a very bad thing.
> >>
> >> I agree that for regular users, it is not necessary to understand all
> the
> >> configurations related to Flink.
> >> By following conventions, they can have a good experience.
> >>
> >> Let's get back to the practical situation and consider it.
> >>
> >> Case 1:
> >> The user is not familiar with the purpose of the
> >> table.optimizer.source.predicate-pushdown-enabled configuration but
> follows
> >> the convention of allowing predicate pushdown to the source by default.
> >> Just understanding the source.predicate-pushdown-enabled configuration
> >> and performing fine-gra

Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-27 Thread Jiabao Sun
Thanks Venkatakrishnan for the feedback.

Taking MySQL as an example, if the pushed-down filter does not hit an index, it 
will result in a full table scan. 
For a table with a large amount of data, a full table scan can consume a 
significant amount of CPU resources,
increase response time, hold connections for a long time, and impact the 
overall performance of the database.
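
As a point of comparison, here is a minimal sketch of how the existing
job-wide option mentioned earlier in this thread can be set programmatically
(the option key is the one quoted in the discussion; everything else is
illustrative):

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class DisablePredicatePushdown {
        public static void main(String[] args) {
            TableEnvironment tEnv =
                    TableEnvironment.create(EnvironmentSettings.inStreamingMode());
            // The job-wide switch discussed in this thread; it predates the new
            // SupportsFilterPushDown sources, which is part of why a per-source
            // option is being proposed.
            tEnv.getConfig().getConfiguration().setString(
                    "table.optimizer.source.predicate-pushdown-enabled", "false");
        }
    }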

Best,
Jiabao


> 2023年10月28日 13:34,Venkatakrishnan Sowrirajan  写道:
> 
> Thanks for the proposal, Jiabao.
> 
> I agree with Becket if a *Source* is implementing the *SupportsXXXPushDown*
> (in this case *SupportsFilterPushdown*) interface, then the *Source* (in
> your FLIP example which is a database) is designed to support filter
> pushdown. The corresponding Source can have mechanisms built into it to
> detect cases where applying the filter pushdown adds additional computation
> pressure which can affect the stability of the system - if so disable it.
> 
> Could you please elaborate on the use cases where users know upfront itself
> (but not detectable at the source level), that for a specific job or SQL,
> where *applyFilters *could negatively affect the overall performance of the
> query or the external system or any other use cases where the ***PushDown *has
> to be selectively disabled for specific sources?
> 
> Regards
> Venkata krishnan
> 
> 
> On Fri, Oct 27, 2023 at 2:48 AM Jark Wu wrote:
> 
>> Hi Becket,
>> 
>> I checked the history of "
>> *table.optimizer.source.predicate-pushdown-enabled*",
>> it seems it was introduced since the legacy FilterableTableSource interface
>> which might be an experiential feature at that time. I don't see the
>> necessity
>> of this option at the moment. Maybe we can deprecate this option and drop
>> it
>> in Flink 2.0[1] if it is not necessary anymore. This may help to
>> simplify this discussion.
>> 
>> 
>> Best,
>> Jark
>> 
>> [1]:
>> https://issues.apache.org/jira/browse/FLINK-32383
>> 
>> 
>> 
>> On Thu, 26 Oct 2023 at 10:14, Becket Qin wrote:
>> 
>>> Thanks for the proposal, Jiabao. My two cents below:
>>> 
>>> 1. If I understand correctly, the motivation of the FLIP is mainly to
>> make
>>> predicate pushdown optional on SOME of the Sources. If so, intuitively
>> the
>>> configuration should be Source specific instead of general. Otherwise, we
>>> will end up with general configurations that may not take effect for some
>>> of the Source implementations. This violates the basic rule of a
>>> configuration - it does what it says, regardless of the implementation.
>>> While configuration standardization is usually a good thing, it should
>> not
>>> break the basic rules.
>>> If we really want to have this general configuration, for the sources
>> this
>>> configuration does not apply, they should throw an exception to make it
>>> clear that this configuration is not supported. However, that seems ugly.
>>> 
>>> 2. I think the actual motivation of this FLIP is about "how a source
>>> should implement predicate pushdown efficiently", not "whether predicate
>>> pushdown should be applied to the source." For example, if a source wants
>>> to avoid additional computing load in the external system, it can always
>>> read the entire record and apply the predicates by itself. However, from
>>> the Flink perspective, the predicate pushdown is applied, it is just
>>> implemented differently by the source. So the design principle here is
>> that
>>> Flink only cares about whether a source supports predicate pushdown or
>> not,
>>> it does not care about the implementation efficiency / side effect of the
>>> predicates pushdown. It is the Source implementation's responsibility to
>>> ensure the predicates pushdown is implemented efficiently and does not
>>> impose excessive pressure on the external system. And it is OK to have
>>> additional configurations to achieve this goal. Obviously, such
>>> configurations will be source specific in this case.
>>> 
>>> 3. Regarding the existing configurations of
>> *table.optimizer.source.predicate-pushdown-enabled.
>>> *I am not sure why we need it. Supposedly, if a source implements a
>>> SupportsXXXPushDown interface, the optimizer should push the
>> corresponding
>>> predicates to the Source. I am not sure in which case this configuration
>>> would be used. Any ideas @Jark Wu?
>>> 
>>> Thanks,
>>> 
>>> Jiangjie (Becket) Qin
>>> 
>>> 
>>> On Wed, Oct 25, 2023 at 11:55 PM Jiabao Sun wrote:
>>> 
 Thanks Jane for the detailed explanation.
 
 I think that for users, we should respect conventions over
 configurations.
 Conventions can be default values explicitly specified in
>> configurations,
 or they can be behaviors that follow previous versions.
 If the same code 

Re: [VOTE] Apache Flink Kafka connector version 3.0.1, RC1

2023-10-27 Thread Xianxun Ye
+1(non-binding)

- Started a local Flink 1.18 cluster, read and wrote with Kafka and Upsert 
Kafka connector successfully to Kafka 2.2 cluster

One minor question: should we update the dependency sections of these two
documentation pages [1][2]?

[1] 
https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/kafka/#dependencies
[2] 
https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/upsert-kafka/#dependencies

Best regards,
Xianxun

> 2023年10月26日 16:12,Martijn Visser  写道:
> 
> +1 (binding)
> 
> - Validated hashes
> - Verified signature
> - Verified that no binaries exist in the source archive
> - Build the source with Maven via mvn clean install
> -Pcheck-convergence -Dflink.version=1.18.0
> - Verified licenses
> - Verified web PR
> - Started a cluster and the Flink SQL client, successfully read and
> wrote with the Kafka connector to Confluent Cloud with AVRO and Schema
> Registry enabled
> 
> On Thu, Oct 26, 2023 at 5:09 AM Qingsheng Ren  wrote:
>> 
>> +1 (binding)
>> 
>> - Verified signature and checksum
>> - Verified that no binary exists in the source archive
>> - Built from source with Java 8 using -Dflink.version=1.18
>> - Started a local Flink 1.18 cluster, submitted jobs with SQL client
>> reading from and writing (with exactly-once) to Kafka 3.2.3 cluster
>> - Nothing suspicious in LICENSE and NOTICE file
>> - Reviewed web PR
>> 
>> Thanks for the effort, Gordon!
>> 
>> Best,
>> Qingsheng
>> 
>> On Thu, Oct 26, 2023 at 5:13 AM Tzu-Li (Gordon) Tai 
>> wrote:
>> 
>>> Hi everyone,
>>> 
>>> Please review and vote on release candidate #1 for version 3.0.1 of the
>>> Apache Flink Kafka Connector, as follows:
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>> 
>>> This release contains important changes for the following:
>>> - Supports Flink 1.18.x series
>>> - [FLINK-28303] EOS violation when using LATEST_OFFSETS startup mode
>>> - [FLINK-33231] Memory leak causing OOM when there are no offsets to commit
>>> back to Kafka
>>> - [FLINK-28758] FlinkKafkaConsumer fails to stop with savepoint
>>> 
>>> The release candidate contains the source release as well as JAR artifacts
>>> to be released to Maven, built against Flink 1.17.1 and 1.18.0.
>>> 
>>> The complete staging area is available for your review, which includes:
>>> * JIRA release notes [1],
>>> * the official Apache source release to be deployed to dist.apache.org
>>> [2],
>>> which are signed with the key with fingerprint
>>> 1C1E2394D3194E1944613488F320986D35C33D6A [3],
>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>> * source code tag v3.0.1-rc1 [5],
>>> * website pull request listing the new release [6].
>>> 
>>> The vote will be open for at least 72 hours. It is adopted by majority
>>> approval, with at least 3 PMC affirmative votes.
>>> 
>>> Thanks,
>>> Gordon
>>> 
>>> [1]
>>> 
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352910
>>> [2]
>>> 
>>> https://dist.apache.org/repos/dist/dev/flink/flink-connector-kafka-3.0.1-rc1/
>>> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
>>> [4] https://repository.apache.org/content/repositories/orgapacheflink-1664
>>> [5] https://github.com/apache/flink-connector-kafka/commits/v3.0.1-rc1
>>> [6] https://github.com/apache/flink-web/pull/692
>>>