Re: [ANNOUNCE] New Apache Flink PMC Member - Leonard Xu

2023-04-24 Thread Jing Ge
Congrats! Leonard!



Best regards,

Jing

On Mon, Apr 24, 2023 at 5:53 AM Matthias Pohl
 wrote:

> Congrats, Leonard :)
>
> On Mon, Apr 24, 2023, 05:17 Yangze Guo  wrote:
>
> > Congratulations, Leonard!
> >
> > Best,
> > Yangze Guo
> >
> > On Mon, Apr 24, 2023 at 10:05 AM Shuo Cheng  wrote:
> > >
> > > Congratulations, Leonard.
> > >
> > > Best,
> > > Shuo
> > >
> > > On Sun, Apr 23, 2023 at 7:43 PM Sergey Nuyanzin 
> > wrote:
> > >
> > > > Congratulations, Leonard!
> > > >
> > > > On Sun, Apr 23, 2023 at 1:38 PM Zhipeng Zhang <
> zhangzhipe...@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Congratulations, Leonard.
> > > > >
> > > > > > Hang Ruan wrote on Sun, Apr 23, 2023 at 19:03:
> > > > > >
> > > > > > Congratulations, Leonard.
> > > > > >
> > > > > > Best,
> > > > > > Hang
> > > > > >
> > > > > > > Yanfei Lei wrote on Sun, Apr 23, 2023 at 18:34:
> > > > > >
> > > > > > > Congratulations, Leonard!
> > > > > > >
> > > > > > > Best,
> > > > > > > Yanfei
> > > > > > >
> > > > > > > > liu ron wrote on Sun, Apr 23, 2023 at 17:45:
> > > > > > > >
> > > > > > > > Congratulations, Leonard.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Ron
> > > > > > > >
> > > > > > > > > Zhanghao Chen wrote on Sun, Apr 23, 2023 at 17:33:
> > > > > > > >
> > > > > > > > > Congratulations, Leonard!
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Zhanghao Chen
> > > > > > > > > 
> > > > > > > > > From: Shammon FY 
> > > > > > > > > Sent: Sunday, April 23, 2023 17:22
> > > > > > > > > To: dev@flink.apache.org 
> > > > > > > > > Subject: Re: [ANNOUNCE] New Apache Flink PMC Member -
> > Leonard Xu
> > > > > > > > >
> > > > > > > > > Congratulations, Leonard!
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Shammon FY
> > > > > > > > >
> > > > > > > > > On Sun, Apr 23, 2023 at 5:07 PM Xianxun Ye <
> > > > > yesorno828...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Congratulations, Leonard!
> > > > > > > > > >
> > > > > > > > > > Best regards,
> > > > > > > > > >
> > > > > > > > > > Xianxun
> > > > > > > > > >
> > > > > > > > > > > On Apr 23, 2023, at 09:10, Lincoln Lee wrote:
> > > > > > > > > > >
> > > > > > > > > > > Congratulations, Leonard!
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > best,
> > > > > Zhipeng
> > > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Sergey
> > > >
> >
>


Re: [ANNOUNCE] New Apache Flink PMC Member - Qingsheng Ren

2023-04-24 Thread Jing Ge
Congrats! Qingsheng!



Best regards,

Jing

On Mon, Apr 24, 2023 at 9:35 AM Zakelly Lan  wrote:

> Congratulations, Qingsheng!
>
> Best regards,
> Zakelly
>
> On Mon, Apr 24, 2023 at 11:52 AM Matthias Pohl
>  wrote:
> >
> > Congratulations, Qingsheng! :)
> >
> > On Mon, Apr 24, 2023, 05:17 Yangze Guo  wrote:
> >
> > > Congratulations, Qingsheng!
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Mon, Apr 24, 2023 at 10:05 AM Shuo Cheng 
> wrote:
> > > >
> > > > Congratulations, Qingsheng!
> > > >
> > > > Best,
> > > > Shuo
> > > >
> > > > On Sun, Apr 23, 2023 at 7:43 PM Sergey Nuyanzin  >
> > > wrote:
> > > >
> > > > > Congratulations, Qingsheng!
> > > > >
> > > > > On Sun, Apr 23, 2023 at 1:37 PM Zhipeng Zhang <
> zhangzhipe...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Congratulations, Qingsheng!
> > > > > >
> > > > > > Hang Ruan wrote on Sun, Apr 23, 2023 at 19:03:
> > > > > > >
> > > > > > > Congratulations, Qingsheng!
> > > > > > >
> > > > > > > Best,
> > > > > > > Hang
> > > > > > >
> > > > > > > Yanfei Lei wrote on Sun, Apr 23, 2023 at 18:33:
> > > > > > >
> > > > > > > > Congratulations, Qingsheng!
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yanfei
> > > > > > > >
> > > > > > > > liu ron wrote on Sun, Apr 23, 2023 at 17:47:
> > > > > > > > >
> > > > > > > > > Congratulations, Qingsheng.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Ron
> > > > > > > > >
> > > > > > > > > Zhanghao Chen wrote on Sun, Apr 23, 2023 at 17:32:
> > > > > > > > >
> > > > > > > > > > Congratulations, Qingsheng!
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Zhanghao Chen
> > > > > > > > > > 
> > > > > > > > > > From: Shammon FY 
> > > > > > > > > > Sent: Sunday, April 23, 2023 17:22
> > > > > > > > > > To: dev@flink.apache.org 
> > > > > > > > > > Subject: Re: [ANNOUNCE] New Apache Flink PMC Member -
> > > Qingsheng
> > > > > Ren
> > > > > > > > > >
> > > > > > > > > > Congratulations, Qingsheng!
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Shammon FY
> > > > > > > > > >
> > > > > > > > > > On Sun, Apr 23, 2023 at 4:40 PM Weihua Hu <
> > > > > huweihua@gmail.com>
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Congratulations, Qingsheng!
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Weihua
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Sun, Apr 23, 2023 at 3:53 PM Yun Tang <
> myas...@live.com
> > > >
> > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Congratulations, Qingsheng!
> > > > > > > > > > > >
> > > > > > > > > > > > Best
> > > > > > > > > > > > Yun Tang
> > > > > > > > > > > > 
> > > > > > > > > > > > From: weijie guo 
> > > > > > > > > > > > Sent: Sunday, April 23, 2023 14:50
> > > > > > > > > > > > To: dev@flink.apache.org 
> > > > > > > > > > > > Subject: Re: [ANNOUNCE] New Apache Flink PMC Member -
> > > > > > Qingsheng Ren
> > > > > > > > > > > >
> > > > > > > > > > > > Congratulations, Qingsheng!
> > > > > > > > > > > >
> > > > > > > > > > > > Best regards,
> > > > > > > > > > > >
> > > > > > > > > > > > Weijie
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Geng Biao wrote on Sun, Apr 23, 2023 at 14:29:
> > > > > > > > > > > >
> > > > > > > > > > > > > Congrats, Qingsheng!
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Biao Geng
> > > > > > > > > > > > >
> > > > > > > > > > > > > Get Outlook for iOS
> > > > > > > > > > > > > 
> > > > > > > > > > > > > From: Wencong Liu 
> > > > > > > > > > > > > Sent: Sunday, April 23, 2023 11:06:39 AM
> > > > > > > > > > > > > To: dev@flink.apache.org 
> > > > > > > > > > > > > Subject: Re: [ANNOUNCE] New Apache Flink PMC Member -
> > > Qingsheng
> > > > > Ren
> > > > > > > > > > > > >
> > > > > > > > > > > > > Congratulations, Qingsheng!
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Wencong LIu
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > At 2023-04-21 19:47:52, "Jark Wu" <
> imj...@gmail.com>
> > > > > wrote:
> > > > > > > > > > > > > >Hi everyone,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >We are thrilled to announce that Leonard Xu has
> > > joined the
> > > > > > Flink
> > > > > > > > > > PMC!
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >Leonard has been an active member of the Apache
> Flink
> > > > > > community
> > > > > > > > for
> > > > > > > > > > > many
> > > > > > > > > > > > > >years and became a committer in Nov 2021. He has
> been
> > > > > > involved
> > > > > > > > in
> > > > > > > > > 

Re: [DISCUSS FLINKSQL PARALLELISM]

2023-04-24 Thread Jing Ge
Hi Green,



Since FLIP-292 opened the door to fine-grained tuning at the operator level
for Flink SQL jobs, I would also suggest leveraging the compiled JSON plan for
further config optimization, as Yun Tang already mentioned. We should
consider making it (leveraging the compiled JSON plan) the standard process
for fine-grained tuning of Flink SQL jobs.



Best regards,

Jing

On Wed, Apr 19, 2023 at 8:44 AM Yun Tang  wrote:

> I noticed that Yuxia had replied that "sink.parallelism" could help in
> some cases.
>
> I think a better way is to integrate it with streamGraph or extend
> CompiledPlan just as FLIP-292 setting state TTL per operator [1] does.
>
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240883951
>
> Best
> Yun Tang
> 
> From: GREEN <1286649...@qq.com.INVALID>
> Sent: Tuesday, April 18, 2023 17:21
> To: dev 
> Subject: Re: [DISCUSS FLINKSQL PARALLELISM]
>
> During the process of generating the streamgraph, I can modify the edge
> partitioner by configuring parameters.
> I just need to know the structure of the streamgraph in advance, which can
> be obtained by printing logs.
>
>
>
> ---Original---
> From: "liu ron"
> Date: Tue, Apr 18, 2023 09:37 AM
> To: "dev"
> Subject: Re: [DISCUSS FLINKSQL PARALLELISM]
>
>
> Hi, Green
>
> Thanks for driving this discussion, in batch mode we have the Adaptive
> Batch Scheduler which automatically derives operator parallelism based on
> data volume at runtime, so we don't need to care about the parallelism.
> However, in stream mode, currently, Flink SQL can only set the parallelism
> of an operator globally, and many users would like to set the parallelism
> of an operator individually, which seems to be a pain point at the moment,
> and it would make sense to support set parallelism at operator granularity.
> Do you have any idea about the solution for this problem?
>
> Best,
> Ron
>
>
GREEN <1286649...@qq.com.invalid> wrote on Fri, Apr 14, 2023 at 16:03:
>
> > Problem:
> >
> >
> > Currently, FlinkSQL can only set a unified parallelism for the whole job;
> > it cannot set parallelism for each operator.
> > This can waste resources when parallelism is high but the data volume is
> > small, and it may also produce too many small files when writing to HDFS.
> >
> >
> > Solution:
> > I can modify FlinkSQL to support operator-level parallelism. Is it
> > meaningful to do this? Let's discuss.
>


Re: Re: [DISCUSS] FLIP-305: Support atomic for CREATE TABLE AS SELECT(CTAS) statement

2023-04-24 Thread Jing Ge
Hi Mang,



Thanks for clarifying it. I am trying to understand your thoughts. Do you
actually mean the boundedness[1] instead of the execution modes[2]? I.e.
the atomic CTAS will only be supported for bounded data.



Best regards,

Jing



[1] https://flink.apache.org/what-is-flink/flink-architecture/

[2]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution_mode/#execution-mode-batchstreaming

On Wed, Apr 19, 2023 at 9:14 AM Mang Zhang  wrote:

> hi, Jing
>
> Thank you for your reply.
>
> >1. It looks like you found another way to design the atomic CTAS with new
> >serializable TwoPhaseCatalogTable instead of making Catalog serializable as
> >described in FLIP-218. Did I understand correctly?
> Yes, when I was implementing the FLIP-218 solution, I encountered problems
> with Catalog/CatalogTable serialization and deserialization; for example,
> after deserialization the CatalogTable could not be converted to a Hive
> Table. Also, Catalog serialization is still a heavy operation, and it may
> not actually be necessary, since we only need Create Table.
> Therefore, the TwoPhaseCatalogTable approach is proposed, which also
> facilitates the subsequent implementation of data lake integrations,
> ReplaceTable, and other functions.
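As a reader's sketch of the two-phase shape being described here (interface and method names are my own illustration, not the FLIP-305 API): the table is staged on begin, made visible atomically on commit if the job succeeds, and cleaned up on abort if it fails.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a minimal two-phase commit shape for atomic CTAS,
// loosely inspired by the TwoPhaseCatalogTable idea in this thread.
interface TwoPhaseTable {
    void begin();  // stage the table (e.g. create a hidden/temporary table)
    void commit(); // make the table visible atomically after the job succeeds
    void abort();  // clean up staged data if the job fails
}

public class CtasDemo {
    static List<String> runCtas(boolean jobSucceeds) {
        List<String> log = new ArrayList<>();
        TwoPhaseTable table = new TwoPhaseTable() {
            public void begin()  { log.add("begin"); }
            public void commit() { log.add("commit"); }
            public void abort()  { log.add("abort"); }
        };
        table.begin();
        if (jobSucceeds) {
            table.commit();
        } else {
            table.abort();
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runCtas(true));  // [begin, commit]
        System.out.println(runCtas(false)); // [begin, abort]
    }
}
```

In this model, a non-atomic fallback would simply create the table eagerly in begin() and make commit()/abort() no-ops.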
>
> >2. I am a little bit confused about the isStreamingMode parameter of
> >Catalog#twoPhaseCreateTable(...), since it is the selector argument(code
> >smell) we should commonly avoid in the public interface. According to the
> >FLIP,  isStreamingMode will be used by the Catalog to determine whether to
> >support atomic or not. With this selector argument, there will be two
> >different logics built within one method and it is hard to follow without
> >reading the code or the doc carefully(another concern is to keep the doc
> >and code alway be consistent) i.e. sometimes there will be no difference by
> >using true/false isStreamingMode, sometimes they are quite different -
> >atomic vs. non-atomic. Another question is, before we call
> >Catalog#twoPhaseCreateTable(...), we have to know the value of
> >isStreamingMode. In case only non-atomic is supported for streaming mode,
> >we could just follow FLIP-218 instead of (twistedly) calling
> >Catalog#twoPhaseCreateTable(...) with a false isStreamingMode. Did I miss
> >anything here?
>
> Here's what I think about this issue, atomic CTAS wants to be the default
> behavior and only fall back to non-atomic CTAS if it's completely
> unattainable. Atomic CTAS will bring a better experience to users.
> Flink is already a stream batch unified engine, In our company kwai, many
> users are also using flink to do batch data processing, but still running
> in Stream mode.
> The boundary between stream and batch is gradually blurred, stream mode
> jobs may also FINISH, so I added the isStreamingMode parameter, this
> provides different atomicity implementations in Batch and Stream modes.
> Not only to determine if atomicity is supported, but also to help select
> different TwoPhaseCatalogTable implementations to provide different levels
> of atomicity!
>
> Looking forward to more feedback.
>
>
>
> --
>
> Best regards,
>
> Mang Zhang
>
>
>
> At 2023-04-15 04:20:40, "Jing Ge"  wrote:
> >Hi Mang,
> >
> >This is the FLIP I was looking forward to after FLIP-218. Thanks for
> >driving it. I have two questions and would like to know your thoughts,
> >thanks:
> >
> >1. It looks like you found another way to design the atomic CTAS with new
> >serializable TwoPhaseCatalogTable instead of making Catalog serializable as
> >described in FLIP-218. Did I understand correctly?
> >2. I am a little bit confused about the isStreamingMode parameter of
> >Catalog#twoPhaseCreateTable(...), since it is the selector argument(code
> >smell) we should commonly avoid in the public interface. According to the
> >FLIP,  isStreamingMode will be used by the Catalog to determine whether to
> >support atomic or not. With this selector argument, there will be two
> >different logics built within one method and it is hard to follow without
> >reading the code or the doc carefully(another concern is to keep the doc
> >and code alway be consistent) i.e. sometimes there will be no difference by
> >using true/false isStreamingMode, sometimes they are quite different -
> >atomic vs. non-atomic. Another question is, before we call
> >Catalog#twoPhaseCreateTable(...), we have to know the value of
> >isStreamingMode. In case only non-atomic is supported for streaming mode,
> >we could just follow FLIP-218 instead of (twistedly) calling
> >Catalog#twoPhaseCreateTable(...) with a false isStreami

Re: [Discussion] - Release major Flink version to support JDK 17 (LTS)

2023-04-24 Thread Jing Ge
Thanks Chesnay for working on this. Would you like to share more info about
the JDK bug?

Best regards,
Jing

On Mon, Apr 24, 2023 at 11:39 AM Chesnay Schepler 
wrote:

> As it turns out Kryo isn't a blocker; we ran into a JDK bug.
>
> On 31/03/2023 08:57, Chesnay Schepler wrote:
>
>
> https://github.com/EsotericSoftware/kryo/wiki/Migration-to-v5#migration-guide
>
> Kryo themselves state that v5 likely can't read v2 data.
>
> However, both versions can be on the classpath without conflict, as v5
> offers a versioned artifact that includes the version in the package name.
>
> It probably wouldn't be difficult to migrate a savepoint to Kryo v5,
> purely from a read/write perspective.
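For what it's worth, the versioned v5 artifact means both generations could in principle be declared side by side, along these lines (coordinates as published by Kryo; versions illustrative):

```xml
<!-- Kryo v2, as currently used by Flink (package com.esotericsoftware.kryo) -->
<dependency>
  <groupId>com.esotericsoftware.kryo</groupId>
  <artifactId>kryo</artifactId>
  <version>2.24.0</version>
</dependency>
<!-- Kryo v5 "versioned" artifact (package com.esotericsoftware.kryo.kryo5) -->
<dependency>
  <groupId>com.esotericsoftware.kryo</groupId>
  <artifactId>kryo5</artifactId>
  <version>5.5.0</version>
</dependency>
```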
>
> The bigger question is how we expose this new Kryo version in the API. If
> we stick to the versioned jar we need to either duplicate all current
> Kryo-related APIs or find a better way to integrate other serialization
> stacks.
> On 30/03/2023 17:50, Piotr Nowojski wrote:
>
> Hey,
>
> > 1. The Flink community agrees that we upgrade Kryo to a later version,
> which means breaking all checkpoint/savepoint compatibility and releasing a
> Flink 2.0 with Java 17 support added and Java 8 and Flink Scala API support
> dropped. This is probably the quickest way, but would still mean that we
> expose Kryo in the Flink APIs, which is the main reason why we haven't been
> able to upgrade Kryo at all.
>
> This sounds pretty bad to me.
>
> Has anyone looked into what it would take to provide a smooth migration
> from Kryo2 -> Kryo5?
>
> Best,
> Piotrek
>
> On Thu, Mar 30, 2023 at 16:54, Alexis Sarda-Espinosa wrote:
>
>> Hi Martijn,
>>
>> just to be sure, if all state-related classes use a POJO serializer, Kryo
>> will never come into play, right? Given FLINK-16686 [1], I wonder how many
>> users actually have jobs with Kryo and RocksDB, but even if there aren't
>> many, that still leaves those who don't use RocksDB for
>> checkpoints/savepoints.
>>
>> If Kryo were to stay in the Flink APIs in v1.X, is it impossible to let
>> users choose between v2/v5 jars by separating them like log4j2 jars?
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-16686
>>
>> Regards,
>> Alexis.
>>
>> On Thu, Mar 30, 2023 at 14:26, Martijn Visser <
>> martijnvis...@apache.org> wrote:
>>
>>> Hi all,
>>>
>>> I also saw a thread on this topic from Clayton Wohl [1] on this topic,
>>> which I'm including in this discussion thread to avoid that it gets lost.
>>>
>>> From my perspective, there's two main ways to get to Java 17:
>>>
>>> 1. The Flink community agrees that we upgrade Kryo to a later version,
>>> which means breaking all checkpoint/savepoint compatibility and releasing a
>>> Flink 2.0 with Java 17 support added and Java 8 and Flink Scala API support
>>> dropped. This is probably the quickest way, but would still mean that we
>>> expose Kryo in the Flink APIs, which is the main reason why we haven't been
>>> able to upgrade Kryo at all.
>>> 2. There's a contributor who makes a contribution that bumps Kryo, but
>>> either a) automagically reads in all old checkpoints/savepoints in using
>>> Kryo v2 and writes them to new snapshots using Kryo v5 (like is mentioned
>>> in the Kryo migration guide [2][3] or b) provides an offline tool that
>>> allows users that are interested in migrating their snapshots manually
>>> before starting from a newer version. That potentially could prevent the
>>> need to introduce a new Flink major version. In both scenarios, ideally the
>>> contributor would also help with avoiding the exposure of Kryo so that we
>>> will be in a better shape in the future.
>>>
>>> It would be good to get the opinion of the community for either of these
>>> two options, or potentially for another one that I haven't mentioned. If it
>>> appears that there's an overall agreement on the direction, I would propose
>>> that a FLIP gets created which describes the entire process.
>>>
>>> Looking forward to the thoughts of others, including the Users
>>> (therefore including the User ML).
>>>
>>> Best regards,
>>>
>>> Martijn
>>>
>>> [1]  https://lists.apache.org/thread/qcw8wy9dv8szxx9bh49nz7jnth22p1v2
>>> [2] https://lists.apache.org/thread/gv49jfkhmbshxdvzzozh017ntkst3sgq
>>> [3] https://github.com/EsotericSoftware/kryo/wiki/Migration-to-v5
>>>
>>> On Sun, Mar 19, 2023 at 8:16 AM Tamir Sagi 
>>> wrote:
>>>
 I agree, there are several options to mitigate the migration from v2 to
 v5.
 Yet, Oracle's roadmap is to end JDK 11 support in September this year.



 
 From: ConradJam 
 Sent: Thursday, March 16, 2023 4:36 AM
 To: dev@flink.apache.org 
 Subject: Re: [Discussion] - Release major Flink version to support JDK
 17 (LTS)


 Thanks for starting this discussion.


 I have been tracking this problem for a long time, until I saw a
 conversation in an issue a few days ago and learned that the Kryo version
 prob

Re: [SUMMARY] Flink 1.18 Release Sync 04/18/2023

2023-04-24 Thread Jing Ge
Hi Qingsheng,

Thanks for sharing the summary!

Best regards,
Jing

On Mon, Apr 24, 2023 at 1:50 PM Qingsheng Ren  wrote:

> Hi devs,
>
> I'd like to share some highlights in the 1.18 release sync on 04/18/2023
> (Sorry for the late summary!):
>
> - Feature list: @contributors please add your features to the list in
> release 1.18 wiki page [1] so that we could track the overall progress.
> - CI instabilities: owners of issues have already been pinged.
> - Version management: as 1.17 has already been released, 1.15 related
> resources like CIs and docker images will be removed in the coming week.
>
> The next release sync will be on May 2nd, 2023. Feel free and welcome to
> join us[2] !
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/1.18+Release
> [2]
> https://us04web.zoom.us/j/79158702091?pwd=8CXPqxMzbabWkma5b0qFXI1IcLbxBh.1
>
> Best regards,
> Jing, Konstantin, Sergey and Qingsheng
>


Re: [VOTE] FLIP-288: Enable Dynamic Partition Discovery by Default in Kafka Source

2023-04-24 Thread Jing Ge
+1(binding)

Best regards,
Jing

On Tue, Apr 25, 2023 at 5:17 AM Rui Fan <1996fan...@gmail.com> wrote:

> +1 (binding)
>
> Best,
> Rui Fan
>
> On Tue, Apr 25, 2023 at 10:06 AM Biao Geng  wrote:
>
> > +1 (non-binding)
> > Best,
> > Biao Geng
> >
> > > Martijn Visser wrote on Mon, Apr 24, 2023 at 20:20:
> >
> > > +1 (binding)
> > >
> > > On Mon, Apr 24, 2023 at 4:10 AM Feng Jin 
> wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > >
> > > > Best,
> > > > Feng
> > > >
> > > > On Mon, Apr 24, 2023 at 9:55 AM Hang Ruan 
> > > wrote:
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > Best,
> > > > > Hang
> > > > >
> > > > > Paul Lam wrote on Sun, Apr 23, 2023 at 11:58:
> > > > >
> > > > > > +1 (non-binding)
> > > > > >
> > > > > > Best,
> > > > > > Paul Lam
> > > > > >
> > > > > > > On Apr 23, 2023, at 10:57, Shammon FY wrote:
> > > > > > >
> > > > > > > +1 (non-binding)
> > > > > > >
> > > > > > > Best,
> > > > > > > Shammon FY
> > > > > > >
> > > > > > > On Sun, Apr 23, 2023 at 10:35 AM Qingsheng Ren <
> > renqs...@gmail.com
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Thanks for pushing this FLIP forward, Hongshun!
> > > > > > >>
> > > > > > >> +1 (binding)
> > > > > > >>
> > > > > > >> Best,
> > > > > > >> Qingsheng
> > > > > > >>
> > > > > > >> On Fri, Apr 21, 2023 at 2:52 PM Hongshun Wang <
> > > > > loserwang1...@gmail.com>
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >>> Dear Flink Developers,
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> Thank you for providing feedback on FLIP-288: Enable Dynamic
> > > > > Partition
> > > > > > >>> Discovery by Default in Kafka Source[1] on the discussion
> > > > thread[2].
> > > > > > >>>
> > > > > > >>> The goal of the FLIP is to enable partition discovery by
> > default
> > > > and
> > > > > > set
> > > > > > >>> EARLIEST offset strategy for later discovered partitions.
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> I am initiating a vote for this FLIP. The vote will be open
> for
> > > at
> > > > > > least
> > > > > > >> 72
> > > > > > >>> hours, unless there is an objection or insufficient votes.
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> [1]:
> > > > > > >>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source

Re: [DISCUSS] Planning Flink 2.0

2023-04-25 Thread Jing Ge
Thanks Xingtong and Jark for kicking off and driving the discussion! It is
really good to see we finally start talking about Flink 2.0. There are so
many great ideas that require breaking API changes and so many tech debts
need to be cleaned up. With the Flink 2.0 ahead, we will be more fast-paced
to bring Flink to the next level. +1 for your proposal.

Best regards,
Jing



On Tue, Apr 25, 2023 at 3:55 PM Chesnay Schepler  wrote:

> This is definitely a good discussion so have.
>
> Some thoughts:
>
> One aspect that wasn't mentioned is what this release means going
> forward. I already waited a decade for 2.0; don't really want to wait
> another one to see Flink 3.0.
> We should discuss how regularly we will ship major releases from now on.
> Let's avoid again making breaking changes because we "gotta do it now
> because 3.0 isn't happening anytime soon".
> (e.g., every 2 years or something)
>
> Related to that we need to figure out how long 1.x will be supported and
> in what way (features+patches vs only patches).
>
> The timeline/branch/release-manager bits sound good to me.
>
>  > /There are also opinions that we should stay focused as much as
> possible on the breaking changes only. Incremental / non-breaking
> improvements and features, or anything that can be added in 2.x minor
> releases, should not block the 2.0 release./
>
> I would definitely agree with this. I'd much rather focus on resolving
> technical debt and setting us up for improvements later than trying to
> tackle both at the same time.
> The "marketing perspective" of having big key features to me just
> doesn't make sense considering what features we shipped with 1.x
> releases in the past years.
> If that means 2.0 comes along faster, then that's a bonus in my book.
> We may of course ship features (e.g., Java 17 which basically comes for
> free if we drop the Scala APIs), but they shouldn't be a focus.
>
>  > /With breaking API changes, we may need multiple 2.0-alpha/beta
> versions to collect feedback./
>
> Personally I wouldn't even aim for a big 2.0 release. I think that will
> become quite a mess and very difficult to actually get feedback on.
> My thinking goes rather in the area of defining Milestone releases, each
> Milestone targeting specific changes.
> For example, one milestone could cleanup the REST API (+ X,Y,Z), while
> another removes deprecated APIs, etc etc.
> Depending on the scope we could iterate quite fast on these.
> (Note that I haven't thought this through yet from the dev workflow
> perspective, but it'd likely require longer-living feature branches)
>
> There are some clear benefits to this approach; if we'd drop deprecated
> APIs in M1 then we could already offer users a version of Flink that
> works with Java 17.
>
> On 25/04/2023 13:09, Xintong Song wrote:
> > Hi everyone,
> >
> > I'd like to start a discussion on planning for a Flink 2.0 release.
> >
> > AFAIK, in the past years this topic has been mentioned from time to time,
> > in mailing lists, jira tickets and offline discussions. However, few
> > concrete steps have been taken, due to the significant determination and
> > efforts it requires and distractions from other prioritized focuses.
> After
> > a series of offline discussions in the recent weeks, with folks mostly
> from
> > our team internally as well as a few from outside Alibaba / Ververica
> > (thanks for insights from Becket and Robert), we believe it's time to
> kick
> > this off in the community.
> >
> > Below are some of our thoughts about the 2.0 release. Looking forward to
> > your opinions and feedback.
> >
> >
> > ## Why plan for release 2.0?
> >
> >
> > Flink 1.0.0 was released in March 2016. In the past 7 years, many new
> > features have been added and the project has become different from what
> it
> > used to be. So what is Flink now? What will it become in the next 3-5
> > years? What do we think of Flink's position in the industry? We believe
> > it's time to rethink these questions, and draw a roadmap towards another
> > milestone, a milestone that warrants a new major release.
> >
> >
> > In addition, we are still providing backwards compatibility (maybe not
> > perfectly but largely) with APIs that we designed and claimed stable 7
> > years ago. While such backwards compatibility helps users to stick with
> the
> > latest Flink releases more easily, it sometimes, and more and more over
> > time, also becomes a burden for maintenance and a limitation for new
> > features and improvements. It's probably time to have a comprehensive
> > review and clean-up over all the public APIs.
> >
> >
> > Furthermore, next year is the 10th year for Flink as an Apache project.
> > Flink joined the Apache incubator in April 2014, and became a top-level
> > project in December 2014. That makes 2024 a perfect time for bringing out
> > the release 2.0 milestone. And for such a major release, we'd expect it
> > takes one year or even longer to prepare for, which means we probably
>

Re: [DISCUSS] Preventing Mockito usage for the new code with Checkstyle

2023-04-25 Thread Jing Ge
This is a great idea, thanks for bringing this up. +1

Also +1 for banning JUnit 4. If I am not mistaken, it can only be done after
the JUnit 5 migration is complete.

@Chesnay thanks for the hint. Do we have any doc about it? If not, it might
deserve one. WDYT?

Best regards,
Jing

On Wed, Apr 26, 2023 at 5:13 AM Lijie Wang  wrote:

> Thanks for driving this. +1 for the proposal.
>
> Can we also prevent JUnit 4 usage in new code the same way? Because
> currently we are aiming to migrate our codebase to JUnit 5.
>
> Best,
> Lijie
>
> Piotr Nowojski wrote on Tue, Apr 25, 2023 at 23:02:
>
> > Ok, thanks for the clarification.
> >
> > Piotrek
> >
> > On Tue, Apr 25, 2023 at 16:38, Chesnay Schepler wrote:
> >
> > > The checkstyle rule would just ban certain imports.
> > > We'd add exclusions for all existing usages as we did when introducing
> > > other rules.
> > > So far we have usually disabled checkstyle rules for specific files.
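For illustration, a ban on certain imports is typically expressed with Checkstyle's IllegalImport module; a minimal sketch (scope and package list illustrative, not necessarily how Flink's checkstyle config would word it):

```xml
<module name="TreeWalker">
  <!-- Disallow Mockito imports in new code; existing usages can be
       excluded via the usual suppressions.xml mechanism. -->
  <module name="IllegalImport">
    <property name="illegalPkgs" value="org.mockito"/>
  </module>
</module>
```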
> > >
> > > On 25/04/2023 16:34, Piotr Nowojski wrote:
> > > > +1 to the idea.
> > > >
> > > > How would this checkstyle rule work? Are you suggesting to start
> with a
> > > > number of exclusions? On what level will those exclusions be? Per
> file?
> > > Per
> > > > line?
> > > >
> > > > Best,
> > > > Piotrek
> > > >
> > > > On Tue, Apr 25, 2023 at 13:18, David Morávek wrote:
> > > >
> > > >> Hi Everyone,
> > > >>
> > > >> A long time ago, the community decided not to use Mockito-based
> tests
> > > >> because those are hard to maintain. This is already baked in our
> Code
> > > Style
> > > >> and Quality Guide [1].
> > > >>
> > > >> Because we still have Mockito imported into the code base, it's very
> > > easy
> > > >> for newcomers to unconsciously introduce new tests violating the
> code
> > > style
> > > >> because they're unaware of the decision.
> > > >>
> > > >> I propose to prevent Mockito usage with a Checkstyle rule for a new
> > > code,
> > > >> which would eventually allow us to eliminate it. This could also
> > prevent
> > > >> some wasted work and unnecessary feedback cycles during reviews.
> > > >>
> > > >> WDYT?
> > > >>
> > > >> [1]
> > > >>
> > > >>
> > >
> >
> https://flink.apache.org/how-to-contribute/code-style-and-quality-common/#avoid-mockito---use-reusable-test-implementations
> > > >>
> > > >> Best,
> > > >> D.
> > > >>
> > >
> > >
> >
>


Re: [DISCUSS] FLIP 295: Support persistence of Catalog configuration and asynchronous registration

2023-04-27 Thread Jing Ge
Hi Feng,

Thanks for working on the FLIP. There are still some NIT issues in the FLIP
like:

1. Optional<CatalogStore> catalogStore has been used as CatalogStore
instead of Optional<CatalogStore> in the code example. It should be fine to
use it as pseudo code for now and update it after you submit the PR.
2. addCatalog(...) is still used somewhere in the rejected section which
should be persistContext(...) to keep it consistent.

Speaking of the conflict issues in the multi-instance scenarios, I am not
sure if this is the intended behaviour. If Map<String, Catalog> catalogs is
used as a cache, it should be invalidated once the related catalog has been
removed from the CatalogStore by another instance. Did I miss something?
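To make the cache-consistency point concrete, here is a minimal, self-contained sketch (all class names and the simplified store interface below are made up for illustration; they are not the FLIP's actual API) of a manager that writes through to the store and invalidates the cached instance on removal:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Minimal stand-in for a catalog instance (illustrative only). */
class SimpleCatalog {
    final Map<String, String> properties;

    SimpleCatalog(Map<String, String> properties) {
        this.properties = properties;
    }
}

/** Simplified persistence layer, loosely modeled on the FLIP's CatalogStore idea. */
interface CatalogStore {
    void storeCatalog(String name, Map<String, String> properties);
    void removeCatalog(String name);
    Optional<Map<String, String>> getCatalog(String name);
}

/** Store backed by a plain map; a real store would persist to files, a database, etc. */
class InMemoryCatalogStore implements CatalogStore {
    private final Map<String, Map<String, String>> stored = new ConcurrentHashMap<>();

    public void storeCatalog(String name, Map<String, String> properties) {
        stored.put(name, properties);
    }

    public void removeCatalog(String name) {
        stored.remove(name);
    }

    public Optional<Map<String, String>> getCatalog(String name) {
        return Optional.ofNullable(stored.get(name));
    }
}

/**
 * Keeps the in-memory catalog map consistent with the store: every mutation
 * goes through the store first, and a removal always invalidates the cache.
 */
class CachingCatalogManager {
    private final Map<String, SimpleCatalog> cache = new ConcurrentHashMap<>();
    private final CatalogStore store;

    CachingCatalogManager(CatalogStore store) {
        this.store = store;
    }

    void createCatalog(String name, Map<String, String> properties) {
        store.storeCatalog(name, properties);           // persist first
        cache.put(name, new SimpleCatalog(properties)); // then cache (lazy init is also possible)
    }

    void removeCatalog(String name) {
        store.removeCatalog(name); // remove from persistence
        cache.remove(name);        // ... and invalidate the cached instance
    }

    Optional<SimpleCatalog> getCatalog(String name) {
        SimpleCatalog cached = cache.get(name);
        if (cached != null) {
            return Optional.of(cached);
        }
        // cache miss: fall back to the store and re-initialize lazily
        return store.getCatalog(name).map(props -> {
            SimpleCatalog catalog = new SimpleCatalog(props);
            cache.put(name, catalog);
            return catalog;
        });
    }
}
```

Even with this, the multi-instance question remains: instance B's removeCatalog() only invalidates B's own cache, so instance A would either need change notifications from the store or have to treat the store as the source of truth on lookup.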

Best regards,
Jing

On Thu, Apr 13, 2023 at 4:40 PM Feng Jin  wrote:

> Hi Jing,Shammon
> Thanks for your reply.
>
> @Jing
>
> > How about persistCatalog()?
> I think this is a good function name, I have updated it in the
> documentation.
>
> >Some common cache features should be implemented
> Thank you for the suggestion. If alternative 1 turns out to be more
> appropriate later, I will improve this part of the content.
>
> > As the above discussion moves forward, the option 2 solution looks more
> like a replacement of option 1
> Yes, after discussing with Shammon offline, we think that solution 2 might
> be more suitable and also avoid any inconsistency issues.
>
> > There are some inconsistent descriptions in the content.  Would you like
> to clean them up?
> I will do my best to improve the document and appreciate your suggestions.
>
>
>
> @Shammon
> > can you put the unselected option in `Rejected Alternatives`
> Sure, I have moved it to the `Rejected Alternatives`.
>
>
>
> Best
> Feng
>
>
>
> On Thu, Apr 13, 2023 at 8:52 AM Shammon FY  wrote:
>
> > Hi Feng
> >
> > Thanks for your update.
> >
> > I found there are two options in `Proposed Changes`, can you put the
> > unselected option in `Rejected Alternatives`? I think this may help us
> > better understand your proposal
> >
> >
> > Best,
> > Shammon FY
> >
> >
> > On Thu, Apr 13, 2023 at 4:49 AM Jing Ge 
> > wrote:
> >
> > > Hi Feng,
> > >
> > > Thanks for raising this FLIP. I am still confused after completely
> > reading
> > > the thread with following questions:
> > >
> > > 1. Naming confusion - registerCatalog() and addCatalog() have no big
> > > difference based on their names. One of them is responsible for data
> > > persistence. How about persistCatalog()?
> > > 2. As you mentioned that Map<String, Catalog> catalogs is used as a
> cache
> > > and catalogStore is used for data persistence. I would suggest
> describing
> > > their purpose conceptually and clearly in the FLIP. Some common cache
> > > features should be implemented, i.e. data in the cache and the store
> > should
> > > be consistent. Same Catalog instance should be found in the store and
> in
> > > the cache(either it has been initialized or it will be lazy
> initialized)
> > > for the same catalog name. The consistency will be taken care of while
> > > updating the catalog.
> > > 3. As the above discussion moves forward, the option 2 solution looks
> > more
> > > like a replacement of option 1, because, afaiu, issues mentioned
> > > previously with option 1 are not solved yet. Do you still want to
> propose
> > > both options and ask for suggestions for both of them?
> > > 4. After you updated the FLIP, there are some inconsistent descriptions
> > in
> > > the content.  Would you like to clean them up? Thanks!
> > >
> > > Best regards,
> > > Jing
> > >
> > >
> > > On Fri, Apr 7, 2023 at 9:24 AM Feng Jin  wrote:
> > >
> > > > hi Shammon
> > > >
> > > > Thank you for your response, and I completely agree with your point
> of
> > > > view.
> > > > Initially, I may have over complicated the whole issue. First and
> > > foremost,
> > > > we need to consider the persistence of the Catalog's Configuration.
> > > > If we only need to provide persistence for Catalog Configuration, we
> > can
> > > > add a toConfiguration method to the Catalog interface.
> > > > This method can convert a Catalog instance to a Map<String, String>
> > > > properties, and the default implementation will throw an exception.
> > > >
> > > > public interface Catalog {
> > > >/**
> > > >* Returns a map containing the properties of the catalog object.

Re: [Discussion] - Release major Flink version to support JDK 17 (LTS)

2023-04-27 Thread Jing Ge
Thanks Tamir for the information. According to the latest comment on the
task FLINK-24998, this bug should be gone when using the latest JDK 17. I
was wondering whether that means there are no more issues stopping us from
releasing a major Flink version with Java 17 support? Did I miss something?

Best regards,
Jing

On Thu, Apr 27, 2023 at 8:18 AM Tamir Sagi 
wrote:

> More details about the JDK bug here
> https://bugs.openjdk.org/browse/JDK-8277529
>
> Related Jira ticket
> https://issues.apache.org/jira/browse/FLINK-24998
>
> ------
> *From:* Jing Ge via user 
> *Sent:* Monday, April 24, 2023 11:15 PM
> *To:* Chesnay Schepler 
> *Cc:* Piotr Nowojski ; Alexis Sarda-Espinosa <
> sarda.espin...@gmail.com>; Martijn Visser ;
> dev@flink.apache.org ; user 
> *Subject:* Re: [Discussion] - Release major Flink version to support JDK
> 17 (LTS)
>
>
> *EXTERNAL EMAIL*
>
>
> Thanks Chesnay for working on this. Would you like to share more info
> about the JDK bug?
>
> Best regards,
> Jing
>
> On Mon, Apr 24, 2023 at 11:39 AM Chesnay Schepler 
> wrote:
>
> As it turns out Kryo isn't a blocker; we ran into a JDK bug.
>
> On 31/03/2023 08:57, Chesnay Schepler wrote:
>
>
> https://github.com/EsotericSoftware/kryo/wiki/Migration-to-v5#migration-guide
>
> Kryo themselves state that v5 likely can't read v2 data.
>
> However, both versions can be on the classpath without conflicts, as v5
> offers a versioned artifact that includes the version in the package.
>
> It probably wouldn't be difficult to migrate a savepoint to Kryo v5,
> purely from a read/write perspective.
>
> The bigger question is how we expose this new Kryo version in the API. If
> we stick to the versioned jar we need to either duplicate all current
> Kryo-related APIs or find a better way to integrate other serialization
> stacks.
> On 30/03/2023 17:50, Piotr Nowojski wrote:
>
> Hey,
>
> > 1. The Flink community agrees that we upgrade Kryo to a later version,
> which means breaking all checkpoint/savepoint compatibility and releasing a
> Flink 2.0 with Java 17 support added and Java 8 and Flink Scala API support
> dropped. This is probably the quickest way, but would still mean that we
> expose Kryo in the Flink APIs, which is the main reason why we haven't been
> able to upgrade Kryo at all.
>
> This sounds pretty bad to me.
>
> Has anyone looked into what it would take to provide a smooth migration
> from Kryo2 -> Kryo5?
>
> Best,
> Piotrek
>
> czw., 30 mar 2023 o 16:54 Alexis Sarda-Espinosa 
> napisał(a):
>
> Hi Martijn,
>
> just to be sure, if all state-related classes use a POJO serializer, Kryo
> will never come into play, right? Given FLINK-16686 [1], I wonder how many
> users actually have jobs with Kryo and RocksDB, but even if there aren't
> many, that still leaves those who don't use RocksDB for
> checkpoints/savepoints.
>
> If Kryo were to stay in the Flink APIs in v1.X, is it impossible to let
> users choose between v2/v5 jars by separating them like log4j2 jars?
>
> [1] https://issues.apache.org/jira/browse/FLINK-16686
>
> Regards,
> Alexis.
>
> Am Do., 30. März 2023 um 14:26 Uhr schrieb Martijn Visser <
> martijnvis...@apache.org>:
>
> Hi all,
>
> I also saw a thread on this topic from Clayton Wohl [1] on this topic,
> which I'm including in this discussion thread to avoid that it gets lost.
>
> From my perspective, there's two main ways to get to Java 17:
>
> 1. The Flink community agrees that we upgrade Kryo to a later version,
> which means breaking all checkpoint/savepoint compatibility and releasing a
> Flink 2.0 with Java 17 support added and Java 8 and Flink Scala API support
> dropped. This is probably the quickest way, but would still mean that we
> expose Kryo in the Flink APIs, which is the main reason why we haven't been
> able to upgrade Kryo at all.
> 2. There's a contributor who makes a contribution that bumps Kryo, but
> either a) automagically reads in all old checkpoints/savepoints in using
> Kryo v2 and writes them to new snapshots using Kryo v5 (like is mentioned
> in the Kryo migration guide [2][3] or b) provides an offline tool that
> allows users that are interested in migrating their snapshots manually
> before starting from a newer version. That potentially could prevent the
> need to introduce a new Flink major version. In both scenarios, ideally the
> contributor would also help with avoiding the exposure of Kryo so that we
> will be in a better shape in the future.
>
> It would be good to get the opinion of the community for either of these
> two options, or potentially for another one t

Re: [DISCUSS] Planning Flink 2.0

2023-04-28 Thread Jing Ge
Hi,

As far as I am concerned, it would be great to build two top-level
categories for the Flink 2.0 release.

1. Future - mainly big new features or architecture improvements to achieve
visions like streamhouse. This is what makes the 2.0 release a true 2.0
instead of another 1.x release, i.e. the real intention of a 2.0 release
with a significant upgrade.
2. History - clean up tech debt, take care of legacy, or whatever you want
to name it. The main goal of this category is to take the opportunity of
the 2.0 release (since, as mentioned above, we have a strong intention to
do it) to perform breaking changes, i.e. remove deprecated APIs and even
modules, and upgrade APIs without worrying too much about backwards
compatibility. This is kind of a buy-one-get-one benefit: in order to
"get one" (History), we should first of all "buy one" (Future).

Best regards,
Jing

On Fri, Apr 28, 2023 at 9:57 AM Xintong Song  wrote:

> Thanks all for the positive feedback.
>
> So far, it seems to me that the differences of opinions are mainly focused
> on whether we should include non-breaking features in the 2.0 release.
>
> There seems to be no objections to:
> 1. Starting to plan for the 2.0 release, with a rough timeline towards mid
> next year
> 2. Becket, Jark, Martijn and Xintong as the release managers
> 3. Initiating a project roadmap discussion
>
> I'll leave this discussion open for a bit longer. Also, next week is public
> holidays in China (I don't know if it's also in other countries). After the
> holidays and if there's no objections, we'll assemble the release
> management team as discussed, and try to figure out a proper way for the
> roadmap discussion next.
>
> Best,
>
> Xintong
>
>
>
> On Fri, Apr 28, 2023 at 3:43 PM Xintong Song 
> wrote:
>
> > @Weike,
> >
> > Thanks for the suggestion. I think it makes sense to provide a longer
> > supporting period for the last 1.x release.
> >
> > @David,
> >
> > I can see the benefit of focusing on breaking changes and API clean-ups
> > only, so the community can be more focused and possibly deliver the 2.0
> > release earlier. However, I can also understand that it might be
> > disappointing for some people (admittedly, a little bit for me as well)
> if
> > the 2.0 release contains only clean-ups but no other significant
> user-aware
> > improvements. My personal opinion would be not to prevent people from
> > trying to get new features into this release, but would not block the
> > release of such features unless breaking changes are involved.
> >
> > @Martijn,
> >
> > Thanks for sharing your ideas. Glad to see that things you've listed have
> > a lot in common with what we put in our list. I believe that's a good
> > signal that we share similar opinions on what is good and important for
> the
> > project and the release.
> >
> > @Sai,
> >
> > Welcome to the community. And thanks for offering helps.
> >
> > At the moment, this discussion is only happening in this mailing list. We
> > may consider setting up online meetings or dedicated slack channels in
> > future. And if so, the information will also be posted in the mailing
> list.
> >
> > Best,
> >
> > Xintong
> >
> >
> >
> > On Fri, Apr 28, 2023 at 2:19 PM Saichandrasekar TM <
> > saichandrase...@gmail.com> wrote:
> >
> >> Hi All,
> >>
> >> Awesome...I see this as a great opportunity for newcomers like me to
> >> contribute.
> >>
> >> Is this discussion happening in a slack or discord forum too? If so, pls
> >> include me.
> >>
> >> Thanks,
> >> Sai
> >>
> >> On Fri, Apr 28, 2023 at 2:55 AM Martijn Visser <
> martijnvis...@apache.org>
> >> wrote:
> >>
> >> > Hi all,
> >> >
> >> > I think the proposal is a good starting point. We should aim to make
> >> Flink
> >> > a unified data processing, cloud friendly / cloud native technology,
> >> with
> >> > proper low-level and high-level interfaces (DataStream API, Table API,
> >> > SQL). I think it would make a lot of sense that we write down a vision
> >> for
> >> > Flink for the long term. That would also mean sharing and discussing
> >> more
> >> > insights and having conversations around some of the long-term
> direction
> >> > from the proposal.
> >> >
> >> > In order to achieve that vision, I believe that we need a Flink 2.0
> >> which I
> >> > consider a long overdue clean-up. That version should be the
> foundation
> >> for
> >> > Flink that allows the above mentioned vision to become actual
> proposals
> >> and
> >> > implementations.
> >> >
> >> > As a foundation in Flink 2.0, I would be inclined to say it should be:
> >> >
> >> > - Remove all deprecated APIs, including the DataSet API, Scala API,
> >> > Queryable State, legacy Source and Sink implementations, legacy SQL
> >> > functions etc.
> >> > - Add support for Java 17 and 21, make 17 the default (given that the
> >> next
> >> > Java LTS, 21, is released in September this year and the timeline is
> >> set of
> >> > 2024)
> >> > - Drop support for Java 8 and 11
> >> > - Refactor the configuration layer
> >> > - Refactor the DataSt

Re: Re: Re: [DISCUSS] FLIP-305: Support atomic for CREATE TABLE AS SELECT(CTAS) statement

2023-04-28 Thread Jing Ge
Hi Mang,

Boundedness and execution mode are two orthogonal concepts. Since atomic
CTAS will only be supported for bounded data, it does not depend on the
execution mode. I was wondering if it is possible to only provide (or call)
twoPhaseCreateTable for bounded data (in both streaming and batch mode) and
let unbounded data use the non-atomic CTAS? In this way, we could avoid the
selector-argument code smell.
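To illustrate the selector-argument concern with a sketch (every name below is made up for illustration and is not the FLIP's actual API):

```java
import java.util.Map;

/** Stand-in for a two-phase (commit/abort) table handle. */
interface StagedTable {
    void commit();
    void abort();
}

/** Selector-argument style: one method whose behavior forks on a boolean flag. */
interface CatalogWithFlag {
    // Atomic vs. non-atomic semantics hide behind isStreamingMode, so callers
    // must read the docs (and keep them in sync with the code) to know which
    // behavior they will get.
    StagedTable twoPhaseCreateTable(
            String tablePath, Map<String, String> tableOptions, boolean isStreamingMode);
}

/** Alternative: the two-phase path exists only for bounded data. */
interface CatalogWithBoundedCtas {
    // Called for bounded data in either streaming or batch execution mode;
    // unbounded jobs simply keep using the existing non-atomic create path.
    StagedTable twoPhaseCreateTableForBounded(String tablePath, Map<String, String> tableOptions);
}
```

With the split, whether a backend supports atomicity shows up in which interface it implements, rather than in the value of a flag.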

Best regards,
Jing

On Tue, Apr 25, 2023 at 10:04 AM Mang Zhang  wrote:

> Hi Jing,
> Yes, the atomic CTAS will be only supported for bounded data, but the
> execution modes can be stream or batch.
> I introduced the isStreamingMode parameter in the twoPhaseCreateTable API
> to make it easier for users to provide different levels of atomicity
> implementation depending on the capabilities of the backend service.
> For example, in the case of data synchronization, it is common to run the
> job using Stream mode, but also expect the data to be visible to the user
> only after the synchronization is complete.
> flink cdc's synchronized data scenario, where the user must first write to
> a temporary table and then manually rename it to the final table;
> unfriendly to user experience.
> Developers providing twoPhaseCreateTable capability in Catalog can decide
> whether to support atomicity based on the execution mode, or they can
> choose to provide lightweight atomicity support in Stream mode, such as
> automatically renaming the table name for the user.
>
>
>
> --
>
> Best regards,
>
> Mang Zhang
>
>
>
> At 2023-04-24 15:41:31, "Jing Ge"  wrote:
> >Hi Mang,
> >
> >
> >
> >Thanks for clarifying it. I am trying to understand your thoughts. Do you
> >actually mean the boundedness[1] instead of the execution modes[2]? I.e.
> >the atomic CTAS will be only supported for bounded data.
> >
> >
> >
> >Best regards,
> >
> >Jing
> >
> >
> >
> >[1] https://flink.apache.org/what-is-flink/flink-architecture/
> >
> >[2]
> >https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution_mode/#execution-mode-batchstreaming
> >
> >On Wed, Apr 19, 2023 at 9:14 AM Mang Zhang  wrote:
> >
> >> hi, Jing
> >>
> >> Thank you for your reply.
> >>
> >> >1. It looks like you found another way to design the atomic CTAS with new
> >> >serializable TwoPhaseCatalogTable instead of making Catalog serializable 
> >> >as
> >> >described in FLIP-218. Did I understand correctly?
> >> Yes, when I was implementing the FLIP-218 solution, I encountered problems 
> >> with Catalog/CatalogTable serialization/deserialization; for example, 
> >> after deserialization CatalogTable could not be converted to Hive Table. 
> >> Also, Catalog serialization is still a heavy operation, but it may not 
> >> actually be necessary, we just need Create Table.
> >> Therefore, the TwoPhaseCatalogTable program is proposed, which also 
> >> facilitates the implementation of the subsequent data lake, ReplaceTable 
> >> and other functions.
> >>
> >> >2. I am a little bit confused about the isStreamingMode parameter of
> >> >Catalog#twoPhaseCreateTable(...), since it is the selector argument(code
> >> >smell) we should commonly avoid in the public interface. According to the
> >> >FLIP,  isStreamingMode will be used by the Catalog to determine whether to
> >> >support atomic or not. With this selector argument, there will be two
> >> >different logics built within one method and it is hard to follow without
> >> >reading the code or the doc carefully(another concern is to keep the doc
> >> >and code alway be consistent) i.e. sometimes there will be no difference 
> >> >by
> >> >using true/false isStreamingMode, sometimes they are quite different -
> >> >atomic vs. non-atomic. Another question is, before we call
> >> >Catalog#twoPhaseCreateTable(...), we have to know the value of
> >> >isStreamingMode. In case only non-atomic is supported for streaming mode,
> >> >we could just follow FLIP-218 instead of (twistedly) calling
> >> >Catalog#twoPhaseCreateTable(...) with a false isStreamingMode. Did I miss
> >> >anything here?
> >>
> >> Here's what I think about this issue, atomic CTAS wants to be the default
> >> behavior and only fall back to non-atomic CTAS if it's completely
> >> unattainable. Atomic CTAS will bring a better experience to users.
> >> Flink is already a stream batch unified e

Re: [DISCUSS] EncodingFormat and DecondingFormat provide copy API

2023-04-28 Thread Jing Ge
Hi Tanjialiang,

As we discussed in another thread, please feel free to create a FLIP and
start a further discussion. In case you need any access rights to the wiki
page, please let me know. Thanks.

Best regards,
Jing


On Sun, Apr 23, 2023 at 4:23 AM tanjialiang  wrote:

> Hi, devs.
> Does anyone have any questions about this discussion? I'm looking forward
> to your feedback.
>
>
>
> Best regards,
> tanjialiang.
>
>
>  Replied Message 
> | From | tanjialiang |
> | Date | 4/13/2023 10:05 |
> | To | dev@flink.apache.org |
> | Subject | [DISCUSS] EncodingFormat and DecondingFormat provide copy API |
> Hi, devs.
>
>
> I'd like to start a discussion about having EncodingFormat and
> DecodingFormat provide a copy API, which relates to FLINK-31686 [1].
>
>
> Currently, DecodingFormat doesn't support copy(), which means the same
> DecodingFormat instance gets reused after a filter/projection is pushed
> down. The EncodingFormat has the same problem if a class implements
> EncodingFormat#applyWritableMetadata(). So I think EncodingFormat and
> DecodingFormat need to provide a copy function, and it should be a deep
> copy if the format implements
> DecodingFormat#applyReadableMetadata/EncodingFormat#applyWritableMetadata/BulkDecodingFormat#applyFilters.
>
>
>
> Looking forwards to your feedback.
>
>
> [1]: [https://issues.apache.org/jira/browse/FLINK-31686]
>
>
> Best regards,
> tanjialiang
>
>
>
>
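To illustrate the deep-copy requirement described above: once applyReadableMetadata mutates a format's internal state, reusing the same instance after pushdown leaks state between tables. A minimal sketch with a made-up stand-in class (not Flink's actual DecodingFormat interface):

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified stand-in for a decoding format with mutable pushdown state;
 * not Flink's real DecodingFormat interface.
 */
class MetadataDecodingFormat {
    private final List<String> metadataKeys;

    MetadataDecodingFormat() {
        this(new ArrayList<>());
    }

    private MetadataDecodingFormat(List<String> metadataKeys) {
        this.metadataKeys = metadataKeys;
    }

    /** Mutates internal state, like metadata/filter/projection pushdown does. */
    void applyReadableMetadata(List<String> keys) {
        metadataKeys.clear();
        metadataKeys.addAll(keys);
    }

    List<String> metadataKeys() {
        return metadataKeys;
    }

    /** Deep copy: pushdown applied to the copy must not leak into the original. */
    MetadataDecodingFormat copy() {
        return new MetadataDecodingFormat(new ArrayList<>(metadataKeys));
    }
}
```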


Re: [DISCUSS] FLIP 295: Support persistence of Catalog configuration and asynchronous registration

2023-04-30 Thread Jing Ge
Hi Feng,

There are still many places that contain inconsistent content, e.g.:

1. "Asynchronous registration" is still used.
2. The Java comment on the method registerCatalog(String catalogName,
Catalog catalog) in CatalogManager does not explain what the method does.


There might be more such issues. I would suggest you completely walk
through the FLIP again and fix those issues.

About the InMemoryCatalogStore, do you mean that you will build the cache
functionality into the CatalogStore? This is a very different design
concept from what the current FLIP describes. If I am not mistaken, with
the current FLIP design, CatalogManager could work without an
Optional<CatalogStore> being configured. That is the reason why I mentioned
in the last email that the example code wrt the Optional<CatalogStore> is
not correct.

Best regards,
Jing

On Thu, Apr 27, 2023 at 3:55 PM Feng Jin  wrote:

> Hi Jing
>
>
> > There are still some NIT issues in the FLIP
>
> Thank you very much for the careful review. I have already made the
> relevant changes.
>
>
> >  Speaking of the conflict issues in the multi-instance scenarios, I am
> not
> sure if this is the intended behaviour
>
> Currently, there are conflicts in multiple scenarios with the current
> design. I am thinking whether we should remove 'Map' and
> make Cache the default behavior of InMemoryCatalogStore. This way, users
> can implement their own CatalogStore to achieve multi-instance
> non-conflicting scenarios. What do you think?
>
>
>
> Best,
> Feng
>
> On Thu, Apr 27, 2023 at 9:03 PM Jing Ge 
> wrote:
>
> > Hi Feng,
> >
> > Thanks for working on the FLIP. There are still some NIT issues in the
> FLIP
> > like:
> >
> > 1. Optional<CatalogStore> catalogStore has been used as CatalogStore
> > instead of Optional<CatalogStore> in the code example. It should be fine to use it as
> > pseudo code for now and update it after you submit the PR.
> > 2. addCatalog(...) is still used somewhere in the rejected section which
> > should be persistContext(...) to keep it consistent.
> >
> > Speaking of the conflict issues in the multi-instance scenarios, I am not
> > sure if this is the intended behaviour. If Map<String, Catalog> catalogs
> is
> > used as a cache, it should be invalidated once the related catalog has been
> > removed from the CatalogStore by another instance. Did I miss something?
> >
> > Best regards,
> > Jing
> >
> > On Thu, Apr 13, 2023 at 4:40 PM Feng Jin  wrote:
> >
> > > Hi Jing,Shammon
> > > Thanks for your reply.
> > >
> > > @Jing
> > >
> > > > How about persistCatalog()?
> > > I think this is a good function name, I have updated it in the
> > > documentation.
> > >
> > > >Some common cache features should be implemented
> > > Thank you for the suggestion. If alternative 1 turns out to be more
> > > appropriate later, I will improve this part of the content.
> > >
> > > > As the above discussion moves forward, the option 2 solution looks
> more
> > > like a replacement of option 1
> > > Yes, after discussing with Shammon offline, we think that solution 2
> > might
> > > be more suitable and also avoid any inconsistency issues.
> > >
> > > > There are some inconsistent descriptions in the content.  Would you
> > like
> > > to clean them up?
> > > I will do my best to improve the document and appreciate your
> > suggestions.
> > >
> > >
> > >
> > > @Shammon
> > > > can you put the unselected option in `Rejected Alternatives`
> > > Sure, I have moved it to the `Rejected Alternatives`.
> > >
> > >
> > >
> > > Best
> > > Feng
> > >
> > >
> > >
> > > On Thu, Apr 13, 2023 at 8:52 AM Shammon FY  wrote:
> > >
> > > > Hi Feng
> > > >
> > > > Thanks for your update.
> > > >
> > > > I found there are two options in `Proposed Changes`, can you put the
> > > > unselected option in `Rejected Alternatives`? I think this may help
> us
> > > > better understand your proposal
> > > >
> > > >
> > > > Best,
> > > > Shammon FY
> > > >
> > > >
> > > > On Thu, Apr 13, 2023 at 4:49 AM Jing Ge 
> > > > wrote:
> > > >
> > > > > Hi Feng,
> > > > >
> > > > > Thanks for raising this FLIP. I am still confused after completely
> > > > reading
> > > > > the thread with follo

Re: [DISCUSS] Preventing Mockito usage for the new code with Checkstyle

2023-04-30 Thread Jing Ge
Thanks @Panagiotis for the hint! Does it mean that those suppressions need
to be cleaned up continuously while we move forward with the JUnit 5
migration (an extra guideline would be required), even if regexes are used?
Or do we just leave them as they are and clean them up in one shot after
every JUnit 4 test has been migrated to JUnit 5?
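For reference, such regex-based suppressions would live in Checkstyle's standard suppressions file; a hypothetical entry (check names, paths, and patterns here are made up for illustration) might look like:

```xml
<!-- Sketch: suppress the import bans for not-yet-migrated code only, so the
     rules bite for new code while the migration proceeds module by module. -->
<suppressions>
  <!-- one broad regex per module that is still on the old stack ... -->
  <suppress checks="IllegalImport"
            files="flink-runtime[\\/]src[\\/]test[\\/].*\.java"/>
  <!-- ... or one entry per legacy file, removed as each file is migrated -->
  <suppress checks="IllegalImport" files="SomeLegacyITCase\.java"/>
</suppressions>
```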

Best regards,
Jing

On Wed, Apr 26, 2023 at 9:02 AM Panagiotis Garefalakis 
wrote:

> Thanks for bringing this up!  +1 for the proposal
>
> @Jing Ge -- we don't necessarily need to completely migrate to Junit5 (even
> though it would be ideal).
> We could introduce the checkstyle rule and add suppressions for the
> existing problematic paths (as we do today for other rules e.g.,
> AvoidStarImport)
>
> Cheers,
> Panagiotis
>
> On Tue, Apr 25, 2023 at 11:48 PM Weihua Hu  wrote:
>
> > Thanks for driving this.
> >
> > +1 for Mockito and Junit4.
> >
> > A clarity checkstyle will be of great help to new developers.
> >
> > Best,
> > Weihua
> >
> >
> > On Wed, Apr 26, 2023 at 1:47 PM Jing Ge 
> > wrote:
> >
> > > This is a great idea, thanks for bringing this up. +1
> > >
> > > Also +1 for Junit4. If I am not mistaken, it could only be done after
> the
> > > Junit5 migration is done.
> > >
> > > @Chesnay thanks for the hint. Do we have any doc about it? If not, it
> > might
> > > deserve one. WDYT?
> > >
> > > Best regards,
> > > Jing
> > >
> > > On Wed, Apr 26, 2023 at 5:13 AM Lijie Wang 
> > > wrote:
> > >
> > > > Thanks for driving this. +1 for the proposal.
> > > >
> > > > Can we also prevent Junit4 usage in new code by this way?Because
> > > currently
> > > > we are aiming to migrate our codebase to JUnit 5.
> > > >
> > > > Best,
> > > > Lijie
> > > >
> > > > Piotr Nowojski  于2023年4月25日周二 23:02写道:
> > > >
> > > > > Ok, thanks for the clarification.
> > > > >
> > > > > Piotrek
> > > > >
> > > > > wt., 25 kwi 2023 o 16:38 Chesnay Schepler 
> > > > napisał(a):
> > > > >
> > > > > > The checkstyle rule would just ban certain imports.
> > > > > > We'd add exclusions for all existing usages as we did when
> > > introducing
> > > > > > other rules.
> > > > > > So far we usually disabled checkstyle rules for a specific files.
> > > > > >
> > > > > > On 25/04/2023 16:34, Piotr Nowojski wrote:
> > > > > > > +1 to the idea.
> > > > > > >
> > > > > > > How would this checkstyle rule work? Are you suggesting to
> start
> > > > with a
> > > > > > > number of exclusions? On what level will those exclusions be?
> Per
> > > > file?
> > > > > > Per
> > > > > > > line?
> > > > > > >
> > > > > > > Best,
> > > > > > > Piotrek
> > > > > > >
> > > > > > > wt., 25 kwi 2023 o 13:18 David Morávek 
> > > napisał(a):
> > > > > > >
> > > > > > >> Hi Everyone,
> > > > > > >>
> > > > > > >> A long time ago, the community decided not to use
> Mockito-based
> > > > tests
> > > > > > >> because those are hard to maintain. This is already baked in
> our
> > > > Code
> > > > > > Style
> > > > > > >> and Quality Guide [1].
> > > > > > >>
> > > > > > >> Because we still have Mockito imported into the code base,
> it's
> > > very
> > > > > > easy
> > > > > > >> for newcomers to unconsciously introduce new tests violating
> the
> > > > code
> > > > > > style
> > > > > > >> because they're unaware of the decision.
> > > > > > >>
> > > > > > >> I propose to prevent Mockito usage with a Checkstyle rule for
> a
> > > new
> > > > > > code,
> > > > > > >> which would eventually allow us to eliminate it. This could
> also
> > > > > prevent
> > > > > > >> some wasted work and unnecessary feedback cycles during
> reviews.
> > > > > > >>
> > > > > > >> WDYT?
> > > > > > >>
> > > > > > >> [1]
> > > > > > >>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://flink.apache.org/how-to-contribute/code-style-and-quality-common/#avoid-mockito---use-reusable-test-implementations
> > > > > > >>
> > > > > > >> Best,
> > > > > > >> D.
> > > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] Preventing Mockito usage for the new code with Checkstyle

2023-05-02 Thread Jing Ge
Hi Panagiotis,

afaiu, the modules left are big and complex and still need a lot of effort
to migrate to JUnit 5. Overall, I agree with you on enabling suppressions.

Best regards,
Jing


On Mon, May 1, 2023 at 7:57 AM Panagiotis Garefalakis 
wrote:

> Hey Jing,
>
> That's basically up to us to decide -- maybe a continuous approach is
> safer? (tests with junit4 behaviour won't be mergeable if we enable rules
> immediately)
> That said, looks like the migration work is almost there
> <https://issues.apache.org/jira/browse/FLINK-25325> -- with some effort we
> could minimize the need for suppressions altogether.
>
> Cheers,
> Panagiotis
>
> On Sun, Apr 30, 2023 at 6:16 AM Jing Ge 
> wrote:
>
> > Thanks @Panagiotis for the hint! Does it mean that those suppressions
> need
> > to be cleaned up continuously while we move forward with Junit5
> > migration(extra guideline is required), even if regex has been used? Or
> we
> > just leave them as they are and clean them up in one shot after every
> > junit4 has been migrated to junit5.
> >
> > Best regards,
> > Jing
> >
> > On Wed, Apr 26, 2023 at 9:02 AM Panagiotis Garefalakis <
> pga...@apache.org>
> > wrote:
> >
> > > Thanks for bringing this up!  +1 for the proposal
> > >
> > > @Jing Ge -- we don't necessarily need to completely migrate to Junit5
> > (even
> > > though it would be ideal).
> > > We could introduce the checkstyle rule and add suppressions for the
> > > existing problematic paths (as we do today for other rules e.g.,
> > > AvoidStarImport)
> > >
> > > Cheers,
> > > Panagiotis
> > >
> > > On Tue, Apr 25, 2023 at 11:48 PM Weihua Hu 
> > wrote:
> > >
> > > > Thanks for driving this.
> > > >
> > > > +1 for Mockito and Junit4.
> > > >
> > > > A clarity checkstyle will be of great help to new developers.
> > > >
> > > > Best,
> > > > Weihua
> > > >
> > > >
> > > > On Wed, Apr 26, 2023 at 1:47 PM Jing Ge 
> > > > wrote:
> > > >
> > > > > This is a great idea, thanks for bringing this up. +1
> > > > >
> > > > > Also +1 for Junit4. If I am not mistaken, it could only be done
> after
> > > the
> > > > > Junit5 migration is done.
> > > > >
> > > > > @Chesnay thanks for the hint. Do we have any doc about it? If not,
> it
> > > > might
> > > > > deserve one. WDYT?
> > > > >
> > > > > Best regards,
> > > > > Jing
> > > > >
> > > > > On Wed, Apr 26, 2023 at 5:13 AM Lijie Wang <
> wangdachui9...@gmail.com
> > >
> > > > > wrote:
> > > > >
> > > > > > Thanks for driving this. +1 for the proposal.
> > > > > >
> > > > > > Can we also prevent Junit4 usage in new code by this way?Because
> > > > > currently
> > > > > > we are aiming to migrate our codebase to JUnit 5.
> > > > > >
> > > > > > Best,
> > > > > > Lijie
> > > > > >
> > > > > > Piotr Nowojski  于2023年4月25日周二 23:02写道:
> > > > > >
> > > > > > > Ok, thanks for the clarification.
> > > > > > >
> > > > > > > Piotrek
> > > > > > >
> > > > > > > > On Tue, 25 Apr 2023 at 16:38, Chesnay Schepler 
> > > > > > wrote:
> > > > > > >
> > > > > > > > The checkstyle rule would just ban certain imports.
> > > > > > > > We'd add exclusions for all existing usages as we did when
> > > > > introducing
> > > > > > > > other rules.
> > > > > > > > So far we have usually disabled checkstyle rules for
> > > > > > > > specific files.
> > > > > > > >
> > > > > > > > On 25/04/2023 16:34, Piotr Nowojski wrote:
> > > > > > > > > +1 to the idea.
> > > > > > > > >
> > > > > > > > > How would this checkstyle rule work? Are you suggesting to
> > > start
> > > > > > with a
> > > > > > > > > number of exclusions? On what level will those exclusions
> be?
> > > Per
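The suppression-based approach discussed in this thread — a checkstyle rule that bans the legacy imports, plus per-file exclusions for existing usages — could look roughly like the following sketch. The regex, class list, and file name are illustrative assumptions, not Flink's actual configuration:

```xml
<!-- checkstyle.xml: ban Mockito and JUnit 4 imports in new code.
     regexp="true" lets the rule match legacy org.junit classes without
     also banning the JUnit 5 packages under org.junit.jupiter. -->
<module name="IllegalImport">
  <property name="regexp" value="true"/>
  <property name="illegalPkgs" value="org\.mockito"/>
  <property name="illegalClasses"
            value="^org\.junit\.(Test|Before|After|BeforeClass|AfterClass|Assert|Rule)$"/>
</module>

<!-- suppressions.xml: exclude existing problematic files, analogous to
     the exclusions used for rules like AvoidStarImport. -->
<suppress files="SomeLegacyITCase\.java" checks="IllegalImport"/>
```

Suppressions like these would then be removed file by file as tests are migrated, or in one pass at the end of the migration.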

[SUMMARY] Flink 1.18 Release Sync 05/02/2023

2023-05-02 Thread Jing Ge
Hi devs,

I'd like to share highlights synced on 05/02/2023

- Feature list: many features have been added to the list[1]. We
double-checked that new FLIPs that have passed their votes are already on
the list.
- CI instabilities: First of all, there are currently 4 Blocker issues, 3
of which have already been assigned. The only one without an assignee[2]
belongs to the external JDBC connector repo and has limited impact on the
Flink main repo. A related PR can be found at [3]. It would be great if
anyone could help review it. Thanks! Second, there are 52 Critical
issues[4] that carry the 'test-stability' label and still have no assignee.
Some of them are old. We will focus on issues that have had updates in the
past 30 days.


Best regards,
Qingsheng, Sergey, Konstantin, and Jing


[1] https://cwiki.apache.org/confluence/display/FLINK/1.18+Release
[2] https://issues.apache.org/jira/browse/FLINK-31770
[3] https://github.com/apache/flink-connector-jdbc/pull/22
[4] project = FLINK AND resolution = Unresolved AND priority = Critical AND
assignee is EMPTY AND labels = 'test-stability' ORDER BY updated DESC,
priority DESC


Re: [DISCUSS] FLIP 295: Support persistence of Catalog configuration and asynchronous registration

2023-05-06 Thread Jing Ge
Hi Feng,

Thanks for improving the FLIP. It looks good to me. We could still
reconsider in the future how to provide more common built-in cache
functionality in CatalogManager so that not every CatalogStore
implementation has to take care of it.

Best regards,
Jing

On Thu, May 4, 2023 at 1:47 PM Feng Jin  wrote:

> Hi Jing,
>
> Thanks for your reply.
>
> >  There might be more such issues. I would suggest you completely walk
> through the FLIP again and fix those issues
>
> I am very sorry for my carelessness and at the same time, I greatly
> appreciate your careful review. I have thoroughly checked the entire FLIP
> and made corrections to these issues.
>
>
> > > If I am not mistaken, with the current FLIP design, CatalogManager
> > > could work without an Optional<CatalogStore> being configured.
>
> Yes, in the original design, CatalogStore was not necessary because
> CatalogManager used a Map<String, Catalog> catalogs field to store catalog
> instances.
> However, this caused inconsistency issues. Therefore, I modified this part
> of the design and removed Map<String, Catalog> catalogs from
> CatalogManager.
> At the same time, InMemoryCatalog will serve as the default CatalogStore
> to save catalogs in memory and replace the functionality of
> Map<String, Catalog> catalogs.
> The previous plan that kept Map<String, Catalog> catalogs has been moved to
> Rejected Alternatives.
>
>
>
> Best,
> Feng
>
> On Sun, Apr 30, 2023 at 9:03 PM Jing Ge 
> wrote:
>
> > Hi Feng,
> >
> > There are still many places that contain inconsistent content, e.g.
> >
> > 1. "Asynchronous registration" is still used.
> > 2. The Javadoc of the method registerCatalog(String catalogName,
> > Catalog catalog) in CatalogManager does not describe what the method is
> > doing.
> >
> >
> > There might be more such issues. I would suggest you completely walk
> > through the FLIP again and fix those issues.
> >
> > About the inMemoryCatalogStore, do you mean that you will build the cache
> > functionality in the CatalogStore? This is a very different design
> concept
> > from what the current FLIP described. If I am not mistaken, with the
> > current FLIP design, CatalogManager could work without an
> > Optional<CatalogStore> being configured. That is the reason why I
> > mentioned in the last email that the example code wrt the
> > Optional<CatalogStore> is not correct.
> >
> > Best regards,
> > Jing
> >
> > On Thu, Apr 27, 2023 at 3:55 PM Feng Jin  wrote:
> >
> > > Hi Jing
> > >
> > >
> > > > There are still some NIT issues in the FLIP
> > >
> > > Thank you very much for the careful review. I have already made the
> > > relevant changes.
> > >
> > >
> > > >  Speaking of the conflict issues in the multi-instance scenarios, I
> am
> > > not
> > > sure if this is the intended behaviour
> > >
> > > Currently, there are conflicts in multiple scenarios with the current
> > > design. I am wondering whether we should remove 'Map<String, Catalog>'
> > > and make the cache the default behavior of InMemoryCatalogStore. This
> > > way, users can implement their own CatalogStore to achieve
> > > multi-instance non-conflicting scenarios. What do you think?
> > >
> > >
> > >
> > > Best,
> > > Feng
> > >
> > > On Thu, Apr 27, 2023 at 9:03 PM Jing Ge 
> > > wrote:
> > >
> > > > Hi Feng,
> > > >
> > > > Thanks for working on the FLIP. There are still some NIT issues in
> the
> > > FLIP
> > > > like:
> > > >
> > > > 1. Optional catalogStore has been used as a plain CatalogStore
> > > > instead of Optional<CatalogStore> in the code example. It should be
> > > > fine to use it as pseudo code for now and update it after you submit
> > > > the PR.
> > > > 2. addCatalog(...) is still used somewhere in the rejected section
> > which
> > > > should be persistContext(...) to keep it consistent.
> > > >
> > > > Speaking of the conflict issues in the multi-instance scenarios, I am
> > not
> > > > sure if this is the intended behaviour. If Map<String, Catalog>
> > > > catalogs is used as a cache, it should be invalidated once the
> > > > related catalog has been removed from the CatalogStore by another
> > > > instance. Did I miss something?
> > > >
> > > > Best regards,
> > > > Jing
> > > >
> > > > On Thu, Apr 13, 2023 at 4:40 PM Feng Jin 
> > wrote:
> > > >
> >
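For readers skimming this thread, the CatalogStore idea under discussion can be sketched as a minimal, self-contained illustration. This is purely hypothetical: `Catalog` here is a stand-in for Flink's `org.apache.flink.table.catalog.Catalog`, and the method names are assumptions, not the FLIP's final API.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

class CatalogStoreSketch {
    public static void main(String[] args) {
        CatalogStore store = new InMemoryCatalogStore();
        store.storeCatalog("my_catalog", new Catalog() {});
        System.out.println(store.getCatalog("my_catalog").isPresent()); // true
        store.removeCatalog("my_catalog");
        System.out.println(store.getCatalog("my_catalog").isPresent()); // false
    }
}

// Stand-in for org.apache.flink.table.catalog.Catalog.
interface Catalog {}

// Hypothetical store abstraction: persists catalog instances by name.
interface CatalogStore {
    void storeCatalog(String name, Catalog catalog);
    Optional<Catalog> getCatalog(String name);
    void removeCatalog(String name);
}

// An in-memory implementation, playing the role of the proposed default
// that replaces the former Map<String, Catalog> field in CatalogManager.
class InMemoryCatalogStore implements CatalogStore {
    private final Map<String, Catalog> catalogs = new ConcurrentHashMap<>();

    @Override
    public void storeCatalog(String name, Catalog catalog) {
        catalogs.put(name, catalog);
    }

    @Override
    public Optional<Catalog> getCatalog(String name) {
        return Optional.ofNullable(catalogs.get(name));
    }

    @Override
    public void removeCatalog(String name) {
        catalogs.remove(name);
    }
}
```

The multi-instance conflict discussed above arises when a cache layer sits in front of such a store: a second instance removing a catalog does not invalidate the first instance's cached copy.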

Re: [VOTE] Release flink-connector-rabbitmq v3.0.1, release candidate #1

2023-05-06 Thread Jing Ge
+1

- built the source
- verified signature
- verified hash
- contains no compiled binaries
- checked tag
- web PR looks good

Best regards,
Jing


On Fri, May 5, 2023 at 9:14 AM Khanh Vu  wrote:

> Sorry, the above report was meant for flink-connector-gcp-pubsub-3.0.1
>
> -
> Here is the check for this one:
>
> +1 (non-binding)
>
> - Verified sha512 checksum matches file archive.
> - Verified file archive is signed and signature is authenticated.
> - Verified no binaries exist in the source archive.
> - Verified source archive is consistent with Github source code with tag
> v3.0.1-rc1, at commit 9827e71662c8f155cda5efe5ebbac804fd0fd8e2
> - Source built successfully with maven.
> - (No end to end tests run for this connector)
>
> Best regards,
> Khanh Vu
>
>
> On Fri, May 5, 2023 at 7:55 AM Khanh Vu  wrote:
>
> > +1 (non-binding)
> >
> > - Verified sha512 checksum matches file archive.
> > - Verified file archive is signed and signature is authenticated.
> > - Verified no binaries exist in the source archive.
> > - Verified source archive is consistent with Github source code with
> > tag v3.0.1-rc1, at commit 73e56edb2aa4513f6a73dc071545fb2508fd2d44
> > - Source built successfully with maven.
> > - Executed end to end tests successfully for flink versions: 1.15.4,
> > 1.16.1, 1.17.0
> >
> > Best regards,
> > Khanh Vu
> >
> >
> > On Thu, May 4, 2023 at 3:47 AM Leonard Xu  wrote:
> >
> >>
> >> +1 (binding)
> >>
> >> - built from source code succeeded
> >> - verified signatures
> >> - verified hashsums
> >> - checked Github release tag
> >> - reviewed the web PR
> >>
> >> Best,
> >> Leonard
> >>
> >>
> >> On Apr 20, 2023 at 3:29 AM, Martijn Visser  wrote:
> >>
> >> +1 (binding)
> >>
> >> - Validated hashes
> >> - Verified signature
> >> - Verified that no binaries exist in the source archive
> >> - Build the source with Maven
> >> - Verified licenses
> >> - Verified web PRs
> >>
> >> On Mon, Apr 17, 2023 at 7:00 PM Ryan Skraba
>  >> >
> >> wrote:
> >>
> >> Hello!  +1 (non-binding)
> >>
> >> I've validated the source for the RC1:
> >> flink-connector-rabbitmq-3.0.1-src.tgz
> >> * The sha512 checksum is OK.
> >> * The source file is signed correctly.
> >> * The signature A5F3BCE4CBE993573EC5966A65321B8382B219AF is found in the
> >> KEYS file, and on https://keys.openpgp.org
> >> * The source file is consistent with the Github tag v3.0.1-rc1, which
> >> corresponds to commit 9827e71662c8f155cda5efe5ebbac804fd0fd8e2
> >>   - The files explicitly excluded by create_pristine_sources (such as
> >> .gitignore and the submodule tools/releasing/shared) are not present.
> >> * Has a LICENSE file and a NOTICE file.  The sql-connector has a
> >> NOTICE file for bundled artifacts.
> >> * Does not contain any compiled binaries.
> >>
> >> * The sources can be compiled and tests pass with flink.version 1.17.0
> and
> >> flink.version 1.16.1
> >>
> >> * Nexus has three staged artifact ids for 3.0.1-1.16 and 3.0.1-1.17
> >> - flink-connector-rabbitmq-parent (only .pom)
> >> - flink-connector-rabbitmq (.jar, -sources.jar, -javadoc.jar and .pom)
> >> - flink-sql-connector-rabbitmq (.jar, -sources.jar and .pom)
> >> * All 16 files have been signed with the same key as above, and have
> >> correct sha1 and md5 checksums.
> >>
> >> I didn't run any additional smoke tests other than the integration test
> >> cases.
> >>
> >> A couple minor points, but nothing that would block this release.
> >>
> >> - like flink-connector-gcp-pubsub-parent, the
> >> flink-connector-rabbitmq-parent:3.0.1-1.17 pom artifact has the
> >> flink.version set to 1.16.0, which might be confusing.
> >> - the NOTICE file for sql-connector has the wrong year.
> >>
> >> All my best and thanks for the release.
> >>
> >> Ryan
> >>
> >>
> >> On Thu, Apr 13, 2023 at 4:45 PM Martijn Visser <
> martijnvis...@apache.org>
> >> wrote:
> >>
> >> Hi everyone,
> >> Please review and vote on the release candidate #1 for the version
> 3.0.1,
> >> as follows:
> >> [ ] +1, Approve the release
> >> [ ] -1, Do not approve the release (please provide specific comments)
> >>
> >> This version is compatible with Flink 1.16.x and Flink 1.17.x
> >>
> >> The complete staging area is available for your review, which includes:
> >> * JIRA release notes [1],
> >> * the official Apache source release to be deployed to dist.apache.org
> >> [2],
> >> which are signed with the key with fingerprint
> >> A5F3BCE4CBE993573EC5966A65321B8382B219AF [3],
> >> * all artifacts to be deployed to the Maven Central Repository [4],
> >> * source code tag v3.0.1-rc1 [5],
> >> * website pull request listing the new release [6].
> >>
> >> The vote will be open for at least 72 hours. It is adopted by majority
> >> approval, with at least 3 PMC affirmative votes.
> >>
> >> Thanks,
> >> Release Manager
> >>
> >> [1]
> >>
> >>
> >>
> >>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352699
> >>
> >> [2]
> >>
> >>
> >>
> >>
> https://dist.apache.org/repos/dist/dev/flink/f

Re: flink documents Korean translation suggestions

2023-05-07 Thread Jing Ge
Hi Tan,

Thanks for raising the suggestion. Proposals are always welcome. Flink is
very well documented, which means the translation effort is big. As far as
I understand, there are limited contributors in the community who could
review content in Korean. I am wondering how you will continuously move
forward and keep contributing translated (old and new) content.

Best regards,
Jing

On Sat, May 6, 2023 at 10:34 AM Tan Kim  wrote:

> Hello, I'm Kim Tan, a software engineer from South Korea.
> I would like to contribute to the translation of flink documents into
> Korean, but it seems that only Chinese translation is being done at the
> moment, so I'm writing to make a proposal.
> I would appreciate your positive review.
>
> Regards,
> Tan Kim
>


Re: [VOTE] FLIP-306: Unified File Merging Mechanism for Checkpoints

2023-05-09 Thread Jing Ge
Hi Zakelly,

I saw you sent at least four identical emails for voting on FLIP-306. I
assume this one is the last and correct thread for us to vote on, right?
BTW, based on the sending time, 72 hours would keep the vote open until May
12th.

Best regards,
Jing

On Tue, May 9, 2023 at 8:24 PM Zakelly Lan  wrote:

> Hi everyone,
>
> Thanks for all the feedback for FLIP-306: Unified File Merging
> Mechanism for Checkpoints[1] on the discussion thread[2].
>
> I'd like to start a vote for it. The vote will be open for at least 72
> hours (until May 11th, 12:00AM GMT) unless there is an objection or an
> insufficient number of votes.
>
>
> Best,
> Zakelly
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-306%3A+Unified+File+Merging+Mechanism+for+Checkpoints
> [2] https://lists.apache.org/thread/56px3kvn3tlwpc7sl12kx6notfmk9g7q
>


Re: [NOTICE] Flink master branch now uses Maven 3.8.6

2023-05-13 Thread Jing Ge
Great news! We can finally get rid of the additional setup needed to use Maven 3.8.
Thanks @Chesnay for your effort!

Best regards,
Jing

On Sat, May 13, 2023 at 5:12 AM David Anderson 
wrote:

> Chesnay, thank you for all your hard work on this!
>
> David
>
> On Fri, May 12, 2023 at 4:03 PM Chesnay Schepler 
> wrote:
> >
> >
> >   What happened?
> >
> > I have just merged the last commits to properly support Maven 3.3+ on
> > the Flink master branch.
> >
> > mvnw and CI have been updated to use Maven 3.8.6.
> >
> >
> >   What does this mean for me?
> >
> >   * You can now use Maven versions beyond 3.2.5 (duh).
> >     o Most versions should work, but 3.8.6 was the most tested and is
> >       thus recommended.
> >     o 3.8.5 is known to NOT work.
> >   * Starting from 1.18.0 you need to use Maven 3.8.6 for releases.
> >     o This may change to a later version until the release of 1.18.0.
> >     o There have been too many issues with recent Maven releases to
> >       make a range acceptable.
> >   * All dependencies that are bundled by a module must be marked as
> >     optional.
> >     o This is verified on CI:
> >       https://github.com/apache/flink/blob/master/tools/ci/flink-ci-tools/src/main/java/org/apache/flink/tools/ci/optional/ShadeOptionalChecker.java
> >     o Background info can be found in the wiki.
> >
> >
> >   Can I continue using Maven 3.2.5?
> >
> > For now, yes, but support will eventually be removed.
> >
> >
> >   Does this affect users?
> >
> > No.
> >
> >
> > Please ping me if you run into any issues.
>
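An illustrative fragment of what the bundled-dependencies rule above means in a module's pom.xml. The artifact names are made up; the wiki page referenced in the announcement has the authoritative guidance:

```xml
<!-- A dependency that the module bundles into its own jar via the shade
     plugin is marked optional, so it does not leak onto the compile
     classpath of downstream modules that depend on this one. -->
<dependency>
  <groupId>org.example</groupId>
  <artifactId>some-bundled-lib</artifactId>
  <version>1.0</version>
  <optional>true</optional>
</dependency>
```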


Re: [DISCUSS] Release Flink 1.17.1

2023-05-15 Thread Jing Ge
+1 for releasing 1.17.1

Best Regards,
Jing

On Thu, May 11, 2023 at 10:03 AM Martijn Visser 
wrote:

> +1, thanks for volunteering!
>
> On Thu, May 11, 2023 at 9:23 AM Xintong Song 
> wrote:
>
> > +1
> >
> > I'll help with the steps that require PMC privileges.
> >
> > Best,
> >
> > Xintong
> >
> >
> >
> > On Thu, May 11, 2023 at 3:12 PM Jingsong Li 
> > wrote:
> >
> > > +1 for releasing 1.17.1
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Thu, May 11, 2023 at 1:29 PM Gyula Fóra 
> wrote:
> > > >
> > > > +1 for the release
> > > >
> > > > Gyula
> > > >
> > > > On Thu, 11 May 2023 at 05:35, Yun Tang  wrote:
> > > >
> > > > > +1 for release flink-1.17.1
> > > > >
> > > > > The blocker issue might cause silent incorrect data, it's better to
> > > have a
> > > > > fix release ASAP.
> > > > >
> > > > >
> > > > > Best
> > > > > Yun Tang
> > > > > 
> > > > > From: weijie guo 
> > > > > Sent: Thursday, May 11, 2023 11:08
> > > > > To: dev@flink.apache.org ;
> > tonysong...@gmail.com
> > > <
> > > > > tonysong...@gmail.com>
> > > > > Subject: [DISCUSS] Release Flink 1.17.1
> > > > >
> > > > > Hi all,
> > > > >
> > > > >
> > > > > I would like to discuss creating a new 1.17 patch release (1.17.1).
> > The
> > > > > last 1.17 release is nearly two months old, and since then, 66
> > tickets
> > > have
> > > > > been closed [1], of which 14 are blocker/critical [2].  Some of
> them
> > > are
> > > > > quite important, such as FLINK-31293 [3] and  FLINK-32027 [4].
> > > > >
> > > > >
> > > > > I am not aware of any unresolved blockers and there are no
> > in-progress
> > > > > tickets [5].
> > > > > Please let me know if there are any issues you'd like to be
> included
> > in
> > > > > this release but still not merged.
> > > > >
> > > > >
> > > > > If the community agrees to create this new patch release, I could
> > > > > volunteer as the release manager
> > > > >  and Xintong can help with actions that require a PMC role.
> > > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Weijie
> > > > >
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > >
> >
> https://issues.apache.org/jira/browse/FLINK-32027?jql=project%20%3D%20FLINK%20AND%20fixVersion%20%3D%201.17.1%20%20and%20resolution%20%20!%3D%20%20Unresolved%20order%20by%20priority%20DESC
> > > > >
> > > > > [2]
> > > > >
> > > > >
> > >
> >
> https://issues.apache.org/jira/browse/FLINK-31273?jql=project%20%3D%20FLINK%20AND%20fixVersion%20%3D%201.17.1%20and%20resolution%20%20!%3D%20Unresolved%20%20and%20priority%20in%20(Blocker%2C%20Critical)%20ORDER%20by%20priority%20%20DESC
> > > > >
> > > > > [3] https://issues.apache.org/jira/browse/FLINK-31293
> > > > >
> > > > > [4] https://issues.apache.org/jira/browse/FLINK-32027
> > > > >
> > > > > [5]
> https://issues.apache.org/jira/projects/FLINK/versions/12352886
> > > > >
> > >
> >
>


Re: [DISCUSS] FLIP-229: Prometheus Sink Connector

2023-05-18 Thread Jing Ge
Hi Karthi,

Thanks for raising this proposal. It is a common use case to sink metric
"data" into a downstream Prometheus. The description in the motivation
section is somewhat misleading for the discussion. I would suggest you
rephrase it, e.g. metrics (pre)processing via Flink is 

The current FLIP does not contain much information about how data will
be written into Prometheus. Would you like to elaborate on it? Some
architecture diagrams might help us better understand your thoughts. Just
out of curiosity, since you are focusing on building the Prometheus
connector, does it make sense to build a Prometheus Source too?
Short-term stored metrics could be consumed by the Prometheus Source and,
depending on requirements, might perform some processing like aggregation.

Best regards,
Jing

On Wed, May 17, 2023 at 6:13 PM Danny Cranmer 
wrote:

> Thanks for the FLIP.
>
> I agree that there is a real usecase for metrics sink vs metric reporter.
> The metric reporters in Flink cover metrics about the Flink job, and a sink
> is used when the metrics are the _data_.
>
> +1 on the FLIP ID, can you please fix that?
>
> With regard to this FLIP.
> 1/ The fact we are using the remote write feature is not covered beyond the
> code example. Can we add details on this to make it clear? Additionally
> would this support _any_ Prometheus server or do we need to enable the
> remote endpoint feature on the server?
> 2/ Are there any concerns around Prometheus versioning or is the API
> backwards compatible? Which versions of Prometheus will we be supporting?
> 3/ With regard to the "AmazonPrometheusRequestSigner" the example has
> static creds. Can we integrate with the AWS Util to support all credential
> providers, static and dynamic?
>
> Thanks,
> Danny
>
> On Wed, May 17, 2023 at 4:34 PM Teoh, Hong 
> wrote:
>
> > Thanks Karthi for creating the FLIP!
> >
> > Re: Martijn,
> >
> > As I understand it, the motivation for the Prometheus Sink is for users
> > who want to write metrics to a Prometheus remote_write endpoint as an
> > output of their job graph, so it would be good to treat it as a
> first-class
> > citizen and do it as part of Flink’s data flow. This way, we would
> benefit
> > from at least once guarantees, scaling, state management.
> >
> > For example, a user might want to read in logs, perform some aggregations
> > and publish it into a metrics store for visualisation. This might be a
> > great use-case for reducing the cardinality of metrics!
> >
> > I think you might be referring to the metrics of the Flink job itself
> > (e.g. CPU / memory metrics). For this use case, I would agree that we
> > should not use this sink but instead use our metric reporters.
> >
> > Regards,
> > Hong
> >
> >
> >
> > > On 16 May 2023, at 03:37, Lijie Wang  wrote:
> > >
> > >
> > > Hi Karthi,
> > >
> > > I think you are using the wrong FLIP id; FLIP-229 has already been
> > used[1].
> > >
> > > [1]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-229%3A+Introduces+Join+Hint+for+Flink+SQL+Batch+Job
> > >
> > > Best,
> > > Lijie
> > >
> > > Martijn Visser  wrote on Tue, May 16, 2023 at 04:44:
> > >
> > >> Hi Karthi,
> > >>
> > >> Thanks for the FLIP and opening up the discussion. My main question
> is:
> > why
> > >> should we create a separate connector and not use and/or improve the
> > >> existing integrations with Prometheus? I would like to understand more
> > so
> > >> that it can be added to the motivation of the FLIP.
> > >>
> > >> Best regards,
> > >>
> > >> Martijn
> > >>
> > >> On Mon, May 15, 2023 at 6:03 PM Karthi Thyagarajan <
> > kar...@karthitect.com>
> > >> wrote:
> > >>
> > >>> Hello all,
> > >>>
> > >>> We would like to start a discussion thread on FLIP-229: Prometheus
> Sink
> > >>> Connector [1] where we propose to provide a sink connector for
> > Prometheus
> > >>> [2] based on the Async Sink [3]. Looking forward to comments and
> > >> feedback.
> > >>> Thank you.
> > >>>
> > >>> [1]
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-229%3A+Prometheus+Sink+Connector
> > >>> [2] https://prometheus.io/
> > >>> [3]
> > >>>
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink
> > >>>
> > >>
> >
> >
>


Re: [DISCUSS] Release Flink 1.17.1

2023-05-18 Thread Jing Ge
Hi Weijie,

just wondering how the process is going. Will the RC1 be released today?
Thanks for driving this!

Best regards,
Jing

On Wed, May 17, 2023 at 5:02 AM weijie guo 
wrote:

> Hi Kevin,
>
> If everything goes smoothly, I will release the RC1 version of 1.17.1 this
> Thursday or Friday.
>
> Best regards,
>
> Weijie
>
>
> Kevin Lam  wrote on Tue, May 16, 2023 at 22:53:
>
> > Hi! Thanks for doing this release. I'm looking forward to some of the bug
> > fixes, is there a date set for the release of 1.17.1?
> >
> > On Mon, May 15, 2023 at 6:10 AM Lijie Wang 
> > wrote:
> >
> > > +1 for the release.
> > >
> > > Best,
> > > Lijie
> > >
> > > Jing Ge  wrote on Mon, May 15, 2023 at 17:07:
> > >
> > > > +1 for releasing 1.17.1
> > > >
> > > > Best Regards,
> > > > Jing
> > > >
> > > > On Thu, May 11, 2023 at 10:03 AM Martijn Visser <
> > > martijnvis...@apache.org>
> > > > wrote:
> > > >
> > > > > +1, thanks for volunteering!
> > > > >
> > > > > On Thu, May 11, 2023 at 9:23 AM Xintong Song <
> tonysong...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > I'll help with the steps that require PMC privileges.
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Xintong
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, May 11, 2023 at 3:12 PM Jingsong Li <
> > jingsongl...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > +1 for releasing 1.17.1
> > > > > > >
> > > > > > > Best,
> > > > > > > Jingsong
> > > > > > >
> > > > > > > On Thu, May 11, 2023 at 1:29 PM Gyula Fóra <
> gyula.f...@gmail.com
> > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > +1 for the release
> > > > > > > >
> > > > > > > > Gyula
> > > > > > > >
> > > > > > > > On Thu, 11 May 2023 at 05:35, Yun Tang 
> > wrote:
> > > > > > > >
> > > > > > > > > +1 for release flink-1.17.1
> > > > > > > > >
> > > > > > > > > The blocker issue might cause silent incorrect data, it's
> > > better
> > > > to
> > > > > > > have a
> > > > > > > > > fix release ASAP.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Best
> > > > > > > > > Yun Tang
> > > > > > > > > 
> > > > > > > > > From: weijie guo 
> > > > > > > > > Sent: Thursday, May 11, 2023 11:08
> > > > > > > > > To: dev@flink.apache.org ;
> > > > > > tonysong...@gmail.com
> > > > > > > <
> > > > > > > > > tonysong...@gmail.com>
> > > > > > > > > Subject: [DISCUSS] Release Flink 1.17.1
> > > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I would like to discuss creating a new 1.17 patch release
> > > > (1.17.1).
> > > > > > The
> > > > > > > > > last 1.17 release is nearly two months old, and since then,
> > 66
> > > > > > tickets
> > > > > > > have
> > > > > > > > > been closed [1], of which 14 are blocker/critical [2].
> Some
> > of
> > > > > them
> > > > > > > are
> > > > > > > > > quite important, such as FLINK-31293 [3] and  FLINK-32027
> > [4].
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I am not aware of any unresolved blockers and there are no
> > > > > > in-progress
> > > > > > > > > tickets [5].
> > > > > > > > > Please let me know if there are any issues you'd like to be
> > > > > included
> > > > > > in
> > > > > > > > > this release but still not merged.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > If the community agrees to create this new patch release, I
> > > could
> > > > > > > > > volunteer as the release manager
> > > > > > > > >  and Xintong can help with actions that require a PMC role.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Weijie
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/FLINK-32027?jql=project%20%3D%20FLINK%20AND%20fixVersion%20%3D%201.17.1%20%20and%20resolution%20%20!%3D%20%20Unresolved%20order%20by%20priority%20DESC
> > > > > > > > >
> > > > > > > > > [2]
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/FLINK-31273?jql=project%20%3D%20FLINK%20AND%20fixVersion%20%3D%201.17.1%20and%20resolution%20%20!%3D%20Unresolved%20%20and%20priority%20in%20(Blocker%2C%20Critical)%20ORDER%20by%20priority%20%20DESC
> > > > > > > > >
> > > > > > > > > [3] https://issues.apache.org/jira/browse/FLINK-31293
> > > > > > > > >
> > > > > > > > > [4] https://issues.apache.org/jira/browse/FLINK-32027
> > > > > > > > >
> > > > > > > > > [5]
> > > > > https://issues.apache.org/jira/projects/FLINK/versions/12352886
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Call for help on the Web UI (In-Place Rescaling)

2023-05-19 Thread Jing Ge
Hi David,

Thanks for driving this. I'd like to second Emre, especially the second
suggestion. In practice, the parallelism number could be big. Users will be
frustrated if they have to click a hundred or more times in order to
reach the desired number. Whoever takes over the design task, please
take this into account while designing the new UX/UI. Thanks!

Best regards,
Jing


On Fri, May 19, 2023 at 1:07 PM Kartoglu, Emre 
wrote:

> Hi David,
>
> This looks awesome. I am no expert on UI/UX, but still have opinions 😊
>
> I normally use the Overview tab for monitoring Flink jobs, and having
> control inputs there breaks my assumption that Overview is “read-only” and
> for “watching”.
> Having said that for “educational purposes” that might actually be a good
> place - I am imagining there would be a “educationalMode: true” flag or
> something somewhere to enable these buttons (and other educational bits in
> future).
>
> The “educational purpose” bit makes me a lot more relaxed about having
> those buttons as they are in the video!
>
> Couple other things to consider:
>
>
>   *   Confirming new parallelism before actually doing it, e.g. having a
> “Deploy/Commit/Save” button
>   *   Allow users to enter parallelism without having to
> increment/decrement one by one
>
> Thanks,
> Emre
>
> On 2023/05/19 06:49:08 David Morávek wrote:
> > Hi Everyone,
> >
> > In FLINK-31471, we've introduced new "in-place rescaling features" to the
> > Web UI that show up when the scheduler supports FLIP-291 REST endpoints.
> >
> > I expect this to be a significant feature for user education (they have
> an
> > easy way to try out how rescaling behaves, especially in combination
> with a
> > backpressure monitor) and marketing (read as "we can do fancy demos").
> >
> > However, the current sketch is not optimal due to my lack of UI/UX
> skills.
> >
> > Are there any volunteers that could and would like to help polish this?
> >
> > Here is a short demo [2] of what the current implementation can do.
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-31471
> > [2] https://www.youtube.com/watch?v=B1NVDTazsZY
> >
> > Best,
> > D.
> >
>


Re: [DISCUSS] Release Flink 1.17.1

2023-05-19 Thread Jing Ge
Hi Weijie,

Thanks for your effort and sharing the status!

Best regards,
Jing

On Fri, May 19, 2023 at 11:50 AM weijie guo 
wrote:

> Hi Jing,
>
> Thank you for your attention!
>
> I have cut the release-1.17.1-rc1 code and started deploying it to
> the apache repository, while preparing the pull request for the blog post.
> The exact release time depends on how quickly these steps are
> completed. The VOTE thread will be launched no later than next Monday.
>
> Best regards,
>
> Weijie
>
>
> Jing Ge  wrote on Fri, May 19, 2023 at 14:48:
>
> > Hi Weijie,
> >
> > just wondering how the process is going. Will the RC1 be released today?
> > Thanks for driving this!
> >
> > Best regards,
> > Jing
> >
> > On Wed, May 17, 2023 at 5:02 AM weijie guo 
> > wrote:
> >
> > > Hi Kevin,
> > >
> > > If everything goes smoothly, I will release the RC1 version of 1.17.1
> > this
> > > Thursday or Friday.
> > >
> > > Best regards,
> > >
> > > Weijie
> > >
> > >
> > > Kevin Lam  wrote on Tue, May 16, 2023 at 22:53:
> > >
> > > > Hi! Thanks for doing this release. I'm looking forward to some of the
> > bug
> > > > fixes, is there a date set for the release of 1.17.1?
> > > >
> > > > On Mon, May 15, 2023 at 6:10 AM Lijie Wang  >
> > > > wrote:
> > > >
> > > > > +1 for the release.
> > > > >
> > > > > Best,
> > > > > Lijie
> > > > >
> > > > > Jing Ge  wrote on Mon, May 15, 2023 at 17:07:
> > > > >
> > > > > > +1 for releasing 1.17.1
> > > > > >
> > > > > > Best Regards,
> > > > > > Jing
> > > > > >
> > > > > > On Thu, May 11, 2023 at 10:03 AM Martijn Visser <
> > > > > martijnvis...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > +1, thanks for volunteering!
> > > > > > >
> > > > > > > On Thu, May 11, 2023 at 9:23 AM Xintong Song <
> > > tonysong...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > +1
> > > > > > > >
> > > > > > > > I'll help with the steps that require PMC privileges.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > >
> > > > > > > > Xintong
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, May 11, 2023 at 3:12 PM Jingsong Li <
> > > > jingsongl...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > +1 for releasing 1.17.1
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Jingsong
> > > > > > > > >
> > > > > > > > > On Thu, May 11, 2023 at 1:29 PM Gyula Fóra <
> > > gyula.f...@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > +1 for the release
> > > > > > > > > >
> > > > > > > > > > Gyula
> > > > > > > > > >
> > > > > > > > > > On Thu, 11 May 2023 at 05:35, Yun Tang  >
> > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > +1 for release flink-1.17.1
> > > > > > > > > > >
> > > > > > > > > > > The blocker issue might cause silent incorrect data,
> it's
> > > > > better
> > > > > > to
> > > > > > > > > have a
> > > > > > > > > > > fix release ASAP.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Best
> > > > > > > > > > > Yun Tang
> > > > > > > > > > > 
> > > > > > > > > > > From: weijie guo 
> > > > > > > >

Re: [VOTE] Release 1.16.2, release candidate #1

2023-05-20 Thread Jing Ge
+1(non-binding)

- reviewed Jira release notes
- built from source
- apache repos contain all necessary files
- verified signatures
- verified hashes
- verified tag
- reviewed PR

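For anyone repeating the hash-verification step above: the check that `sha512sum -c` performs for the published `.sha512` files boils down to recomputing the digest and comparing. A minimal illustrative sketch in Python (the file name below is a local stand-in, not a real release artifact):

```python
import hashlib

def sha512_of(path):
    """Compute the SHA-512 hex digest of a file, like `sha512sum <path>`."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path, expected_hex):
    """The comparison `sha512sum -c` performs for each listed artifact."""
    return sha512_of(path) == expected_hex

# Local demonstration with a stand-in file; for a real RC you would download
# the source tarball and its .sha512 file from dist.apache.org instead.
with open("artifact.bin", "wb") as f:
    f.write(b"release candidate bytes")

expected = sha512_of("artifact.bin")
print(verify("artifact.bin", expected))  # True
```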
Best regards,
Jing

On Sat, May 20, 2023 at 11:51 AM Yun Tang  wrote:

> +1 (non-binding)
>
>
>   *   Verified signatures
>   *   Reviewed the flink-web PR
>   *   Set up a standalone cluster from released binaries and check the git
> revision number.
>   *   Submit the statemachine example with RocksDB, and it works fine.
>
> Best,
> Yun Tang
> 
> From: Yuxin Tan 
> Sent: Friday, May 19, 2023 17:41
> To: dev@flink.apache.org 
> Subject: Re: [VOTE] Release 1.16.2, release candidate #1
>
> +1 (non-binding)
>
> - Verified signature
> - Verified hashes
> - Built from source on Mac
> - Verify that the source archives do not contain any binaries
> - Run streaming and batch job in sql-client successfully.
>
> Thanks weijie for driving this release candidate.
>
> Best,
> Yuxin
>
>
> weijie guo  于2023年5月19日周五 16:19写道:
>
> > Hi everyone,
> >
> >
> > Please review and vote on the release candidate #1 for the version
> 1.16.2,
> >
> > as follows:
> >
> >
> > [ ] +1, Approve the release
> >
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> >
> > The complete staging area is available for your review, which includes:
> >
> > * JIRA release notes [1],
> >
> > * the official Apache source release and binary convenience releases to
> be
> >
> > deployed to dist.apache.org [2], which are signed with the key with
> >
> > fingerprint 8D56AE6E7082699A4870750EA4E8C4C05EE6861F [3],
> >
> > * all artifacts to be deployed to the Maven Central Repository [4],
> >
> > * source code tag "release-1.16.2-rc1" [5],
> >
> > * website pull request listing the new release and adding announcement
> blog
> >
> > post [6].
> >
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> >
> > approval, with at least 3 PMC affirmative votes.
> >
> >
> > Thanks,
> >
> > Release Manager
> >
> >
> > [1]
> >
> >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352765
> >
> > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.16.2-rc1/
> >
> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> >
> > [4]
> > https://repository.apache.org/content/repositories/orgapacheflink-1634/
> >
> > [5] https://github.com/apache/flink/releases/tag/release-1.16.2-rc1
> >
> > [6] https://github.com/apache/flink-web/pull/649
> >
>


Re: Maven plugin to detect issues early on

2023-05-21 Thread Jing Ge
Hi Emre,

Thanks for your proposal. It looks very interesting! Please note
that most connectors have been externalized. Will your proposed plugin be
used for building Flink connectors or Flink itself? Furthermore, it would
be great if you could elaborate on the features wrt best practices so that
we could understand how the plugin will help us.

Afaik, a FLIP is recommended for improvement ideas that change public
APIs. I am not sure a new Maven plugin belongs there.

Best regards,
Jing

On Tue, May 16, 2023 at 11:29 AM Kartoglu, Emre 
wrote:

> Hello all,
>
> Myself and 2 colleagues developed a Maven plugin (no support for Gradle or
> other build tools yet) that we use internally to detect potential issues in
> Flink apps at compilation/packaging stage:
>
>
>   *   Known connector version incompatibilities – so far covering Kafka
> and Kinesis
>   *   Best practices e.g. setting operator IDs
>
> We’d like to make this open-source. Ideally with the Flink community’s
> support/mention of it on the Flink website, so more people use it.
>
> Going forward, I believe we have at least the following options:
>
>   *   Get community support: Create a FLIP to discuss where the plugin
> should live, what kind of problems it should detect etc.
>   *   We still open-source it but without the community support (if the
> community has objections to officially supporting it for instance).
>
> Just wanted to gauge the feeling/thoughts towards this tool from the
> community before going ahead.
>
> Thanks,
> Emre
>
>


Re: [VOTE] Release 1.17.1, release candidate #1

2023-05-22 Thread Jing Ge
+1(non-binding)

- Verified signatures
- Verified hashes
- Verified tag
- repos contain all necessary files
- built from source
- reviewed the PR

Best regards,
Jing

On Mon, May 22, 2023 at 1:32 PM Yun Tang  wrote:

> +1 (non-binding)
>
>
>   *   Verified signatures.
>   *   Start a local standalone cluster and checked the git revision number.
>   *   Submit stateful examples both in BATCH and STREAMING mode
> successfully.
>   *   Run a local sql-gateway.
>   *   Reviewed the release flink-web PR.
>
> Best
> Yun Tang
> 
> From: Xingbo Huang 
> Sent: Monday, May 22, 2023 19:10
> To: dev@flink.apache.org 
> Subject: Re: [VOTE] Release 1.17.1, release candidate #1
>
> +1 (binding)
>
> - verify signatures and checksums
> - verify python wheel package contents
> - pip install apache-flink-libraries and apache-flink wheel packages
> - run example flink/flink-python/pyflink/examples/table/basic_operations.py
> with Python 3.10
> - reviewed release blog post
>
> Best,
> Xingbo
>
> Yuxin Tan  于2023年5月22日周一 18:55写道:
>
> > +1 (non-binding)
> >
> > - verified the signatures
> > - verified the checksums
> > - built from source
> > - start a standalone cluster, run a simple batch and a streaming job
> > successfully
> > - review release announcement PR
> >
> > Best,
> > Yuxin
> >
> >
> > Xintong Song  于2023年5月22日周一 18:24写道:
> >
> > > +1 (binding)
> > >
> > > - verified signatures and checksums
> > > - built from source
> > > - tried example jobs with a standalone cluster, everything seems fine
> > > - review release announcement PR
> > >
> > > Best,
> > >
> > > Xintong
> > >
> > >
> > >
> > > On Mon, May 22, 2023 at 2:18 PM weijie guo 
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > >
> > > > Please review and vote on the release candidate #1 for the version
> > > 1.17.1,
> > > >
> > > > as follows:
> > > >
> > > > [ ] +1, Approve the release
> > > >
> > > > [ ] -1, Do not approve the release (please provide specific comments)
> > > >
> > > >
> > > > The complete staging area is available for your review, which
> includes:
> > > >
> > > >
> > > > * JIRA release notes [1],
> > > >
> > > > * the official Apache source release and binary convenience releases
> to
> > > be
> > > >
> > > > deployed to dist.apache.org [2], which are signed with the key with
> > > >
> > > > fingerprint 8D56AE6E7082699A4870750EA4E8C4C05EE6861F [3],
> > > >
> > > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > >
> > > > * source code tag "release-1.17.1-rc1" [5],
> > > >
> > > > * website pull request listing the new release and adding
> announcement
> > > blog
> > > >
> > > > post [6].
> > > >
> > > >
> > > > The vote will be open for at least 72 hours. It is adopted by
> majority
> > > >
> > > > approval, with at least 3 PMC affirmative votes.
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > Release Manager
> > > >
> > > >
> > > > [1]
> > > >
> > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352886
> > > >
> > > > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.17.1-rc1/
> > > >
> > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > >
> > > > [4]
> > > >
> > https://repository.apache.org/content/repositories/orgapacheflink-1635/
> > > >
> > > > [5] https://github.com/apache/flink/releases/tag/release-1.17.1-rc1
> > > >
> > > > [6] https://github.com/apache/flink-web/pull/650
> > > >
> > >
> >
>


Re: Maven plugin to detect issues early on

2023-05-22 Thread Jing Ge
Hi Emre,

Thanks for clarifying it. Afaiac, it is quite an interesting proposal,
especially for Flink job developers who are heavily using the DataStream
API. Publishing the plugin on your GitHub would be a feasible first
move. As I mentioned previously, in order to help the community
understand the plugin, you might want to describe all those attractive
features you mentioned in e.g. the readme.md with more technical details. I
was personally wondering how those connector compatibility rules will be
defined and maintained, given that almost all connectors have been
externalized.
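
Just to make that question concrete: such compatibility rules could in principle be maintained as a simple lookup table consulted at build time. A hypothetical sketch — the connector names and version pairs below are made up for illustration, not a real compatibility matrix, and this is not how Emre's plugin necessarily works:

```python
# Hypothetical rule table: (connector, version prefix) -> Flink versions
# known to be incompatible. All entries are illustrative, not real data.
KNOWN_INCOMPATIBILITIES = {
    ("flink-connector-kafka", "1.0"): {"1.16"},
    ("flink-connector-kinesis", "2.0"): {"1.15", "1.16"},
}

def check_dependency(connector, connector_version, flink_version):
    """Return a warning string if the pair is a known incompatibility, else None."""
    for (name, prefix), flink_versions in KNOWN_INCOMPATIBILITIES.items():
        if (connector == name
                and connector_version.startswith(prefix)
                and flink_version in flink_versions):
            return (f"{connector} {connector_version} is known to be "
                    f"incompatible with Flink {flink_version}")
    return None

print(check_dependency("flink-connector-kafka", "1.0.1", "1.16"))  # warning string
print(check_dependency("flink-connector-kafka", "1.0.1", "1.17"))  # None
```

A build-time plugin could evaluate such rules against the resolved dependency tree and fail the build (or warn) during compilation/packaging, which is the behaviour described in the thread.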

Best regards,
Jing

On Mon, May 22, 2023 at 11:24 AM Kartoglu, Emre 
wrote:

> Hi Jing,
>
> The proposed plugin would be used by Flink application developers, when
> they are writing their Flink job. It would trigger during
> compilation/packaging and would look for known incompatibilities, bad
> practices, or bugs.
> For instance one cause of frustration for our customers is connector
> incompatibilities (specifically Kafka and Kinesis) with certain Flink
> versions. This plugin would be a quick way to update a list of known
> incompatibilities, bugs, bad practices, so customers get errors during
> compilation/packaging and not after they've deployed their Flink job.
>
> From what you're saying, the FLIP route might not be the best way to go.
> We might publish this plugin in our own GitHub namespace/group first, and
> then get community acknowledgement/support for it. I believe working with
> the Flink community on this is key as we'd need their support/opinion to do
> this the right way and reach more Flink users.
>
> Thanks
> Emre
>
> On 21/05/2023, 16:48, "Jing Ge" <j...@ververica.com.INVALID> wrote:
>
>
> CAUTION: This email originated from outside of the organization. Do not
> click links or open attachments unless you can confirm the sender and know
> the content is safe.
>
>
>
>
>
>
> Hi Emre,
>
>
> Thanks for your proposal. It looks very interesting! Please pay attention
> that most connectors have been externalized. Will your proposed plug be
> used for building Flink Connectors or Flink itself? Furthermore, it would
> be great if you could elaborate features wrt best practices so that we
> could understand how the plugin will help us.
>
>
> Afaik, FLIP is recommended for improvement ideas that will change public
> APIs. I am not sure if a new maven plugin belongs to it.
>
>
> Best regards,
> Jing
>
>
> On Tue, May 16, 2023 at 11:29 AM Kartoglu, Emre <kar...@amazon.co.uk.invalid>
> wrote:
>
>
> > Hello all,
> >
> > Myself and 2 colleagues developed a Maven plugin (no support for Gradle
> or
> > other build tools yet) that we use internally to detect potential issues
> in
> > Flink apps at compilation/packaging stage:
> >
> >
> > * Known connector version incompatibilities – so far covering Kafka
> > and Kinesis
> > * Best practices e.g. setting operator IDs
> >
> > We’d like to make this open-source. Ideally with the Flink community’s
> > support/mention of it on the Flink website, so more people use it.
> >
> > Going forward, I believe we have at least the following options:
> >
> > * Get community support: Create a FLIP to discuss where the plugin
> > should live, what kind of problems it should detect etc.
> > * We still open-source it but without the community support (if the
> > community has objections to officially supporting it for instance).
> >
> > Just wanted to gauge the feeling/thoughts towards this tool from the
> > community before going ahead.
> >
> > Thanks,
> > Emre
> >
> >
>
>
>
>


Re: Maven plugin to detect issues early on

2023-05-22 Thread Jing Ge
cc user ML to get more attention, since the plugin will be used by Flink
application developers.

Best regards,
Jing

On Mon, May 22, 2023 at 3:32 PM Jing Ge  wrote:

> Hi Emre,
>
> Thanks for clarifying it. Afaiac, it is a quite interesting proposal,
> especially for Flink job developers who are heavily using the Datastream
> API. Publishing the plugin in your Github would be a feasible way for the
> first move. As I mentioned previously, in order to help the community
> understand the plugin, you might want to describe all those attractive
> features you mentioned in e.g. the readme.md with more technical details. I
> personally was wondering how those connector compatibility rules will be
> defined and maintained, given that almost all connectors have been
> externalized.
>
> Best regards,
> Jing
>
> On Mon, May 22, 2023 at 11:24 AM Kartoglu, Emre
>  wrote:
>
>> Hi Jing,
>>
>> The proposed plugin would be used by Flink application developers, when
>> they are writing their Flink job. It would trigger during
>> compilation/packaging and would look for known incompatibilities, bad
>> practices, or bugs.
>> For instance one cause of frustration for our customers is connector
>> incompatibilities (specifically Kafka and Kinesis) with certain Flink
>> versions. This plugin would be a quick way to update a list of known
>> incompatibilities, bugs, bad practices, so customers get errors during
>> compilation/packaging and not after they've deployed their Flink job.
>>
>> From what you're saying, the FLIP route might not be the best way to go.
>> We might publish this plugin in our own GitHub namespace/group first, and
>> then get community acknowledgement/support for it. I believe working with
>> the Flink community on this is key as we'd need their support/opinion to do
>> this the right way and reach more Flink users.
>>
>> Thanks
>> Emre
>>
>> On 21/05/2023, 16:48, "Jing Ge" <j...@ververica.com.INVALID> wrote:
>>
>>
>>
>> Hi Emre,
>>
>>
>> Thanks for your proposal. It looks very interesting! Please pay attention
>> that most connectors have been externalized. Will your proposed plug be
>> used for building Flink Connectors or Flink itself? Furthermore, it would
>> be great if you could elaborate features wrt best practices so that we
>> could understand how the plugin will help us.
>>
>>
>> Afaik, FLIP is recommended for improvement ideas that will change public
>> APIs. I am not sure if a new maven plugin belongs to it.
>>
>>
>> Best regards,
>> Jing
>>
>>
>> On Tue, May 16, 2023 at 11:29 AM Kartoglu, Emre <kar...@amazon.co.uk.invalid>
>> wrote:
>>
>>
>> > Hello all,
>> >
>> > Myself and 2 colleagues developed a Maven plugin (no support for Gradle
>> or
>> > other build tools yet) that we use internally to detect potential
>> issues in
>> > Flink apps at compilation/packaging stage:
>> >
>> >
>> > * Known connector version incompatibilities – so far covering Kafka
>> > and Kinesis
>> > * Best practices e.g. setting operator IDs
>> >
>> > We’d like to make this open-source. Ideally with the Flink community’s
>> > support/mention of it on the Flink website, so more people use it.
>> >
>> > Going forward, I believe we have at least the following options:
>> >
>> > * Get community support: Create a FLIP to discuss where the plugin
>> > should live, what kind of problems it should detect etc.
>> > * We still open-source it but without the community support (if the
>> > community has objections to officially supporting it for instance).
>> >
>> > Just wanted to gauge the feeling/thoughts towards this tool from the
>> > community before going ahead.
>> >
>> > Thanks,
>> > Emre
>> >
>> >
>>
>>
>>
>>


Re: [DISCUSS] FLIP-309: Enable operators to trigger checkpoints dynamically

2023-05-23 Thread Jing Ge
Hi Yunfeng, Hi Dong

Thanks for the informative discussion! It is a reasonable requirement to set
different checkpoint intervals for different sources in a HybridSource. The
tiny downside of this proposal, at least for me, is that I have to
understand the upper-bound definition of the interval and the built-in rule
for Flink to choose the minimum value between it and the default interval
setting. However, afaiac, the intention of this built-in rule is to
minimize changes in Flink to support the requested feature, which is a very
thoughtful move. Thanks for taking care of it. +1 for the proposal.

Another very rough idea came to mind while I was reading the FLIP.
I haven't done a deep dive into the related source code yet, so please
correct me if I am wrong. The use case shows that two different checkpoint
intervals should be set for bounded (historical) and unbounded (fresh
real-time) stream sources. It is a trade-off between throughput and
latency, i.e. a bounded stream with a large checkpoint interval for better
throughput, and an unbounded stream with a small checkpoint interval for
lower latency (in case of failover). As we can see, the different interval
settings depend on the boundedness of the streams. Since the Source API
already has its own boundedness flag[1], is it possible to define two
interval configurations and let Flink automatically apply the related one
to the source based on the known boundedness? The interval for bounded
streams could be something like execution.checkpoint.interval.bounded
(naming could be reconsidered), and for unbounded streams we could use the
existing execution.checkpoint.interval by default, or introduce a new one
like execution.checkpoint.interval.unbounded. In this way, no API change
is required.
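
For illustration, the minimum rule discussed in this thread — Flink triggering checkpoints at the minimum of the job-level interval and all operator-proposed upper bounds — can be modelled as below. This is only a sketch of the rule for discussion, not Flink code, and the concrete numbers are made up:

```python
def effective_checkpoint_interval(job_interval_ms, proposed_upper_bounds_ms):
    # Model of the built-in rule discussed in this thread (not Flink code):
    # the job checkpoints at the minimum of the job-level interval and all
    # upper bounds currently proposed by source operators.
    return min([job_interval_ms, *proposed_upper_bounds_ms])

# Hypothetical HybridSource example: a long job-level interval favours
# throughput while reading the bounded (historical) phase; once the unbounded
# (real-time) source proposes a small upper bound, the effective interval
# drops to favour low failover latency.
print(effective_checkpoint_interval(1_800_000, []))        # backlog phase: 1800000
print(effective_checkpoint_interval(1_800_000, [30_000]))  # real-time phase: 30000
```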

@Piotr
Just out of curiosity, do you know any real use cases where real-time data
is processed before the backlog? Semantically, the backlog contains
historical data that has to be processed before the real-time data is
allowed to be processed. Otherwise, up-to-date data will be overwritten by
out-of-date data which turns out to be unexpected results in real business
scenarios.


Best regards,
Jing

[1]
https://github.com/apache/flink/blob/fadde2a378aac4293676944dd513291919a481e3/flink-core/src/main/java/org/apache/flink/api/connector/source/Source.java#L41

On Tue, May 23, 2023 at 5:53 PM Dong Lin  wrote:

> Hi Piotr,
>
> Thanks for the comments. Let me try to understand your concerns and
> hopefully address the concerns.
>
> >> What would happen if there are two (or more) operator coordinators with
> conflicting desired checkpoint trigger behaviour
>
> With the proposed change, there won't exist any "*conflicting* desired
> checkpoint trigger" by definition. Both job-level config and the proposed
> API upperBoundCheckpointingInterval() means the upper-bound of the
> checkpointing interval. If there are different upper-bounds proposed by
> different source operators and the job-level config, Flink will try to
> periodically trigger checkpoints at the interval corresponding to the
> minimum of all these proposed upper-bounds.
>
> >> If one source is processing a backlog and the other is already
> processing real time data..
>
> Overall, I am not sure we always want to have a longer checkpointing
> interval. That really depends on the specific use-case and the job graph.
>
> The proposed API change mechanism for operators and users to specify
> different checkpoint intervals at different periods of the job. Users have
> the option to use the new API to get better performance in the use-case
> specified in the motivation section. I believe there can be use-case where
> the proposed API is not useful, in which case users can choose not to use
> the API without incurring any performance regression.
>
> >> it might be a bit confusing and not user friendly to have multiple
> places that can override the checkpointing behaviour in a different way
>
> Admittedly, adding more APIs always incur more complexity. But sometimes we
> have to incur this complexity to address new use-cases. Maybe we can see if
> there are more user-friendly way to address this use-case.
>
> >> already implemented and is simple from the perspective of Flink
>
> Do you mean that the HybridSource operator should invoke the rest API to
> trigger checkpoints? The downside of this approach is that it makes it hard
> for developers of source operators (e.g. MySQL CDC, HybridSource) to
> address the target use-case. AFAIK, there is no existing case where we
> require operator developers to use REST API to do their job.
>
> Can you help explain the benefit of using REST API over using the proposed
> API?
>
> Note that this approach also seems to have the same downside mentioned
> above: "multiple places that can override the checkpointing behaviour". I
> am not sure there can be a solution to address the target use-case without
> having multiple places that can affect the checkpointing behavior.
>
> >> check if `

Re: [RESULT][VOTE] Release 1.16.2, release candidate #1

2023-05-24 Thread Jing Ge
Thanks Weijie for your effort! Looking forward to it!

Best regards,
Jing

On Wed, May 24, 2023 at 11:08 AM weijie guo 
wrote:

> I'm happy to announce that we have unanimously approved this release.
>
>
>
> There are 8 approving votes, 3 of which are binding:
>
>
> * Yuxin Tan
>
> * Yun Tang
>
> * Jing Ge
>
> * Xintong Song(binding)
>
> * Xingbo Huang(binding)
>
> * Qingsheng Ren(binding)
>
> * Samrat Deb
>
> * Hang Ruan
>
>
> There are no disapproving votes.
>
>
> I'll work on the steps to finalize the release and will send out the
>
> announcement as soon as that has been completed.
>
>
> Thanks everyone!
>
>
> Best regards,
>
> Weijie
>


Re: Zero-Downtime Deployments with Flink Operator

2023-05-24 Thread Jing Ge
Hi Kevin,

Thanks for reaching out. This question is not new and has been asked
many times previously. I have the same feeling as Gyula. There are ways to
provide zero downtime for Flink jobs, but commonly, I doubt if it is really
necessary.

If there are special use cases that really need zero downtime for the
Flink app itself, we should also drill down into the zero-downtime
requirement. For me, there are two different categories of zero downtime.
The first one is zero downtime for app upgrades; traditional blue/green or
canary deployment for online services might be put on the table. The other
one is an HA service that guarantees zero downtime for Flink apps, i.e.
there will be zero delay (no impact at all) for any Flink job failover.
Note that the big difference between Flink apps and online services is
that Flink apps perform stateful data computation. We have to take care of
state and data consistency and will end up with a much more complicated
and expensive solution. Since people care about ROI, I would look at the
big picture and try to provide zero downtime on the "online" side.

Some users are not aware that Flink is an engine for "offline" apps, even
though Flink focuses on real-time stream processing. Users who ask for
zero downtime and blue/green deployment have a strong "online" apps
mindset. It sounds to me like asking whether an apple should taste like an
orange. Commonly, Flink apps, again as offline apps, are only part of the
whole service landscape. We should check our requirements more closely. In
most cases, zero downtime can be achieved with the online services that
consume the result data processed by Flink apps. No zero downtime on the
Flink side is required if the delay created by a Flink job's failover is
smaller than the business SLA. Flink 1.17 and 1.18 are working on this and
will reduce the delay significantly, to seconds[1]. This should be good
enough for most use cases (teamed up with online services) to provide the
requested zero downtime.

Just my two cents and thanks for any different thoughts that could help me
know what I didn't know.

Best regards,
Jing

[1] https://www.ververica.com/blog/generic-log-based-incremental-checkpoint

On Wed, May 24, 2023 at 7:39 PM Gyula Fóra  wrote:

> Hey Kevin!
>
> I am not aware of anyone currently working on this for the Flink Operator.
>
> Here are my current thoughts on the topic:
>
>1. It's not impossible to build this into the operator but it would
>require some considerable changes to the logic, both in terms of
> resource
>mapping and observer logic, however...
>2. It's a very niche use-case and in most cases this is not required
>3. Even if we implement it there are a lot of caveats for making this
>generally useful outside of some very specialized use-cases
>4. In most cases this is actually not a good way to perform upgrades and
>depending on the application it may lead to incorrect results etc.
>5. This is possible to build on top of the current operator logic
>externally
>
> So at the moment I am slightly against the idea in general, but of course I
> can be convinced otherwise if there is a general requirement / interest in
> the community. In any case we should have confidence that this will
> actually provide production value to many use-cases and it would require a
> FLIP for sure.
>
> Cheers,
> Gyula
>
>
>
> On Wed, May 24, 2023 at 5:24 PM Kevin Lam 
> wrote:
>
> > Hi,
> >
> > Is there any interest or ongoing work around supporting zero-downtime
> > deployments with Flink using the Flink Operator?
> >
> > I saw that https://issues.apache.org/jira/browse/FLINK-24257 existed,
> but
> > it looks a little stale. I'm interested in learning more about the
> current
> > state of things.
> >
> > There is also some pre-existing work done by Lyft:
> > https://www.youtube.com/watch?v=Hyt3YrtKQAM
> >
>


Re: [RESULT] [VOTE] Release 1.17.1, release candidate #1

2023-05-25 Thread Jing Ge
Hi Weijie,

Thanks again for driving it. I was wondering if you are able to share the
estimated date when the 1.16.2 and 1.17.1 releases will be officially
announced after the voting is closed? Thanks!

Best regards,
Jing

On Thu, May 25, 2023 at 9:46 AM weijie guo 
wrote:

> I'm happy to announce that we have unanimously approved this release.
>
>
>
> There are 7 approving votes, 3 of which are binding:
>
>
> * Xintong Song(binding)
>
> * Yuxin Tan
>
> * Xingbo Huang(binding)
>
> * Yun Tang
>
> * Jing Ge
>
> * Qingsheng Ren(binding)
>
> * Benchao Li
>
>
> There are no disapproving votes.
>
>
> I'll work on the steps to finalize the release and will send out the
>
> announcement as soon as that has been completed.
>
>
> Thanks everyone!
>
>
> Best regards,
>
> Weijie
>


Re: [DISCUSS] FLIP-309: Enable operators to trigger checkpoints dynamically

2023-05-25 Thread Jing Ge
cover pretty well most of the cases, what do you
> think?
> >
>
> Thank you for all the comments and this idea. I like this idea. We actually
> thought about this idea before proposing this FLIP.
>
> In order to make this idea work, we need to come-up with a good algorithm
> that can dynamically change the checkpointing interval based on the
> "backlog signal", without causing regression w.r.t. failover time and data
> freshness. I find it hard to come up with this algorithm due to
> insufficient "backlog signal".
>
> For the use-case mentioned in the motivation section, the data in the
> source does not have event timestamps to help determine the amount of
> backlog. So the only source-of-truth for determining backlog is the amount
> of data buffered in operators. But the buffer size is typically chosen to
> be proportional to round-trip-time and throughput. Having a full buffer
> does not necessarily mean that the data is lagging behind. And increasing
> the checkpointing interval with insufficient "backlog signal" can have a
> negative impact on data freshness and failover time.
>
> In order to make this idea work, we would need to *provide* that the
> algorithm would not negatively hurt data freshness and failover time when
> it decides to increase checkpointing intervals. For now I cold not come up
> with such an algorithm.
>
> If this backpressured based behaviour is still not enough, I would still
> > say that we should provide plugable checkpoint triggering controllers
> that
> > would work based on metrics.
>
>
> I am not sure how to address the use-case mentioned in the motivation
> section, with the pluggable checkpoint trigger + metrics. Can you help
> provide the definition of these APIs and kindly explain how that works to
> address the mentioned use-case.
>
> In the mentioned use-case, users want to have two different checkpointing
> intervals at different phases of the HybridSource. We should provide an API
> for users to express the extra checkpointing interval in addition to the
> existing execution.checkpointing.interval. What would be the definition of
> that API with this alternative approach?
>
> Best,
> Dong
>
>
> >
> > Best,
> > Piotrek
> >
> > czw., 25 maj 2023 o 07:47 Dong Lin  napisał(a):
> >
> > > Hi Jing,
> > >
> > > Thanks for your comments!
> > >
> > > Regarding the idea of using the existing "boundedness" attribute of
> > > sources, that is indeed something that we might find intuitive
> > initially. I
> > > have thought about this idea, but could not find a good way to make it
> > > work. I will try to explain my thoughts and see if we can find a better
> > > solution.
> > >
> > > Here is my understanding of the idea mentioned above: provide a job
> level
> > > config execution.checkpoint.interval.bounded. Flink will use this as
> the
> > > checkpointing interval whenever there exists at least one running
> source
> > > which claims it is under the "bounded" stage.
> > >
> > > Note that we can not simply re-use the existing "boundedness" attribute
> > of
> > > source operators. The reason is that for sources such as MySQL CDC, its
> > > boundedness can be "continuous_unbounded" because it can run
> > continuously.
> > > But MySQL CDC has two phases internally, where the source needs to
> first
> > > read a snapshot (with bounded amount of data) and then read a binlog
> > (with
> > > unbounded amount of data).
> > >
> > > As a result, in order to support optimization for souces like MySQL
> CDC,
> > we
> > > need to expose an API for the source operator to declare whether it is
> > > running at a bounded or continuous_unbounded stage. *This introduces
> the
> > > need to define a new concept named "bounded stage".*
> > >
> > > Then, we will need to *introduce a new contract between source
> operators
> > > and the Flink runtime*, saying that if there is a source that claims it
> > is
> > > running at the bounded stage, then Flink will use the "
> > > execution.checkpoint.interval.bounded" as the checkpointing interval.
> > >
> > > Here are the the concerns I have with this approach:
> > >
> > > - The execution.checkpoint.interval.bounded is a top-level config,
> > meaning
> > > that every Flink user needs to read about its semantics. In comparison,
> > the
> > > proposed approach only requires users of specific sour

Re: [RESULT] [VOTE] Release 1.17.1, release candidate #1

2023-05-25 Thread Jing Ge
Hi Weijie,

Thank you so much for sharing it!

Best regards,
Jing

On Fri, May 26, 2023 at 8:00 AM weijie guo 
wrote:

> Hi Jing,
>
> The release process for 1.16.2 and 1.17.1 is nearing its end, but I am
> waiting for Docker Hub to approval the update of manifest and publish the
> new image. I estimate that this may take 1-2 days and then I will
> officially announce the release of the new version immediately
>
> Best regards,
>
> Weijie
>
>
>
> Jing Ge  于2023年5月25日周四 19:42写道:
>
> > Hi Weijie,
> >
> > Thanks again for driving it. I was wondering if you are able to share the
> > estimated date when the 1.16.2 and 1.17.1 releases will be officially
> > announced after the voting is closed? Thanks!
> >
> > Best regards,
> > Jing
> >
> > On Thu, May 25, 2023 at 9:46 AM weijie guo 
> > wrote:
> >
> > > I'm happy to announce that we have unanimously approved this release.
> > >
> > >
> > >
> > > There are 7 approving votes, 3 of which are binding:
> > >
> > >
> > > * Xintong Song(binding)
> > >
> > > * Yuxin Tan
> > >
> > > * Xingbo Huang(binding)
> > >
> > > * Yun Tang
> > >
> > > * Jing Ge
> > >
> > > * Qingsheng Ren(binding)
> > >
> > > * Benchao Li
> > >
> > >
> > > There are no disapproving votes.
> > >
> > >
> > > I'll work on the steps to finalize the release and will send out the
> > >
> > > announcement as soon as that has been completed.
> > >
> > >
> > > Thanks everyone!
> > >
> > >
> > > Best regards,
> > >
> > > Weijie
> > >
> >
>


Re: [ANNOUNCE] Apache Flink 1.17.1 released

2023-05-26 Thread Jing Ge
Hi Weijie,

That is earlier than I expected! Thank you so much for your effort!

Best regards,
Jing

On Fri, May 26, 2023 at 4:44 PM Martijn Visser 
wrote:

> Same here as with Flink 1.16.2, thank you Weijie and those who helped with
> testing!
>
> On Fri, May 26, 2023 at 1:08 PM weijie guo 
> wrote:
>
>>
>> The Apache Flink community is very happy to announce the release of Apache 
>> Flink 1.17.1, which is the first bugfix release for the Apache Flink 1.17 
>> series.
>>
>>
>>
>>
>> Apache Flink® is an open-source stream processing framework for distributed, 
>> high-performing, always-available, and accurate data streaming applications.
>>
>>
>>
>> The release is available for download at:
>>
>> https://flink.apache.org/downloads.html
>>
>>
>>
>>
>> Please check out the release blog post for an overview of the improvements 
>> for this bugfix release:
>>
>> https://flink.apache.org/news/2023/05/25/release-1.17.1.html
>>
>>
>>
>> The full release notes are available in Jira:
>>
>>
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352886
>>
>>
>>
>>
>> We would like to thank all contributors of the Apache Flink community who 
>> made this release possible!
>>
>>
>>
>>
>> Feel free to reach out to the release managers (or respond to this thread) 
>> with feedback on the release process. Our goal is to constantly improve the 
>> release process. Feedback on what could be improved or things that didn't go 
>> so well are appreciated.
>>
>>
>>
>> Regards,
>>
>> Release Manager
>>
>


Re: [ANNOUNCE] Apache Flink 1.16.2 released

2023-05-26 Thread Jing Ge
Hi Weijie,

Thanks again for your effort. I was wondering if there were any obstacles
you had to overcome while releasing 1.16.2 and 1.17.1 that could lead us to
any improvement wrt the release process and management?

Best regards,
Jing

On Fri, May 26, 2023 at 4:41 PM Martijn Visser 
wrote:

> Thank you Weijie and those who helped with testing!
>
> On Fri, May 26, 2023 at 1:06 PM weijie guo 
> wrote:
>
> > The Apache Flink community is very happy to announce the release of
> > Apache Flink 1.16.2, which is the second bugfix release for the Apache
> > Flink 1.16 series.
> >
> >
> >
> > Apache Flink® is an open-source stream processing framework for
> > distributed, high-performing, always-available, and accurate data
> > streaming applications.
> >
> >
> >
> > The release is available for download at:
> >
> > https://flink.apache.org/downloads.html
> >
> >
> >
> > Please check out the release blog post for an overview of the
> > improvements for this bugfix release:
> >
> > https://flink.apache.org/news/2023/05/25/release-1.16.2.html
> >
> >
> >
> > The full release notes are available in Jira:
> >
> >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352765
> >
> >
> >
> > We would like to thank all contributors of the Apache Flink community
> > who made this release possible!
> >
> >
> >
> > Feel free to reach out to the release managers (or respond to this
> > thread) with feedback on the release process. Our goal is to
> > constantly improve the release process. Feedback on what could be
> > improved or things that didn't go so well are appreciated.
> >
> >
> >
> > Regards,
> >
> > Release Manager
> >
>


Re: Questions on checkpointing mechanism for FLIP-27 Source API

2023-05-26 Thread Jing Ge
Hi Hong,

Great question! Afaik, it depends on the implementation. Speaking of the
"loopback" event sent to the SplitEnumerator, I guess you meant [1]
(it would be good if you could point out the exact position in the source
code to help us understand the question better :)), which ends up
calling the SplitEnumerator[2]. There is only one implementation of the
method handleSourceEvent(int subtaskId, SourceEvent sourceEvent), in
HybridSourceSplitEnumerator[3].

The only call that sends an operator event to the SplitEnumerator that I
found in the current master branch is in the HybridSourceReader, when the
reader reaches the end of the input of the current source[4]. Since the
call is in SourceReader#pollNext(ReaderOutput output), it should follow the
exactly-once semantics defined by [5]. My understanding is that
OperatorEvent 1 will belong to the epoch after the checkpoint in this
case.

Best regards,
Jing

[1]
https://github.com/apache/flink/blob/678370b18e1b6c4a23e5ce08f8efd05675a0cc17/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/SourceOperator.java#L284
[2]
https://github.com/apache/flink/blob/678370b18e1b6c4a23e5ce08f8efd05675a0cc17/flink-core/src/main/java/org/apache/flink/api/connector/source/SplitEnumerator.java#L120
[3]
https://github.com/apache/flink/blob/678370b18e1b6c4a23e5ce08f8efd05675a0cc17/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/hybrid/HybridSourceSplitEnumerator.java#L195
[4]
https://github.com/apache/flink/blob/678370b18e1b6c4a23e5ce08f8efd05675a0cc17/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/hybrid/HybridSourceReader.java#LL95C13-L95C13
[5]
https://github.com/apache/flink/blob/678370b18e1b6c4a23e5ce08f8efd05675a0cc17/flink-runtime/src/main/java/org/apache/flink/runtime/operators/coordination/OperatorCoordinatorHolder.java#L66
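The epoch-based attribution described above can be illustrated with a minimal, Flink-free simulation. This is not the real Flink mechanism (which lives in the OperatorCoordinatorHolder referenced in [5]); all class and method names below are invented for illustration. The point is that once the coordinator has taken its snapshot for a checkpoint, a later operator event mutates only post-snapshot state and belongs to the next epoch.

```python
# Minimal toy model of epoch-based event attribution.
# This does NOT use real Flink APIs; all names are invented for illustration.

class Coordinator:
    def __init__(self):
        self.state = []          # events applied to coordinator state
        self.epoch = 0           # current checkpoint epoch
        self.snapshots = {}      # checkpoint id -> snapshot of state

    def take_snapshot(self, checkpoint_id):
        # The coordinator snapshots first; later events belong to the next epoch.
        self.snapshots[checkpoint_id] = list(self.state)
        self.epoch = checkpoint_id + 1

    def handle_event(self, event):
        # An event arriving after the snapshot mutates only post-snapshot state.
        self.state.append((self.epoch, event))

coord = Coordinator()
coord.handle_event("split-request-A")    # arrives in epoch 0
coord.take_snapshot(checkpoint_id=123)   # snapshot taken for checkpoint 123
coord.handle_event("source-finished")    # arrives after snapshot -> epoch 124

# The snapshot for checkpoint 123 contains only the pre-snapshot event,
# and the late event is attributed to the following epoch.
print(coord.snapshots[123])   # [(0, 'split-request-A')]
print(coord.state[-1])        # (124, 'source-finished')
```

Under this model, replaying from checkpoint 123 would restore a coordinator state that never saw "source-finished", matching the reasoning above that OperatorEvent 1 belongs to the epoch after the checkpoint.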

On Thu, May 25, 2023 at 4:27 AM Hongshun Wang 
wrote:

> Hi Hong,
>
> The checkpoint is triggered by the timer executor of CheckpointCoordinator.
> It triggers the checkpoint in SourceCoordinator (which is passed to
> SplitEnumerator) and then in SourceOperator. The checkpoint event is put in
> SplitEnumerator's event loop to be executed. You can see the details here.
>
> Yours
> Hongshun
>
> On Wed, May 17, 2023 at 11:39 PM Teoh, Hong 
> wrote:
>
> > Hi all,
> >
> > I’m writing a new source based on the FLIP-27 Source API, and I had some
> > questions on the checkpointing mechanisms and associated guarantees.
> Would
> > appreciate if someone more familiar with the API would be able to provide
> > insights here!
> >
> > In FLIP-27 Source, we now have a SplitEnumerator (running on JM) and a
> > SourceReader (running on TM). However, the SourceReader can send events
> to
> > the SplitEnumerator. Given this, we have introduced a “loopback”
> > communication mechanism from TM to JM, and I wonder if/how we handle this
> > during checkpoints.
> >
> >
> > Example of how data might be lost:
> > 1. Checkpoint 123 triggered
> > 2. SplitEnumerator takes checkpoint of state for checkpoint 123
> > 3. SourceReader sends OperatorEvent 1 and mutates state to reflect this
> > 4. SourceReader takes checkpoint of state for checkpoint 123
> > …
> > 5. Checkpoint 123 completes
> >
> > Let’s assume OperatorEvent 1 would mutate SplitEnumerator state once
> > processed, There is now inconsistent state between SourceReader state and
> > SplitEnumerator state. (SourceReader assumes OperatorEvent 1 is
> processed,
> > whereas SplitEnumerator has not processed OperatorEvent 1)
> >
> > Do we have any mechanisms for mitigating this issue? For example, does
> the
> > SplitEnumerator re-take the snapshot of state for a checkpoint if an
> > OperatorEvent is sent before the checkpoint is complete?
> >
> > Regards,
> > Hong
>


Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-05-26 Thread Jing Ge
Hi Aitozi,

Thanks for your proposal. I am not quite sure if I understood your thoughts
correctly. You described a special-case implementation of the
AsyncTableFunction with no public API changes. Would you please elaborate on
your purpose for writing a FLIP, according to the FLIP documentation[1]?
Thanks!

[1]
https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals

Best regards,
Jing

On Wed, May 24, 2023 at 1:07 PM Aitozi  wrote:

> May I ask for some feedback  :D
>
> Thanks,
> Aitozi
>
> Aitozi  于2023年5月23日周二 19:14写道:
> >
> > Just caught a user case report from Giannis Polyzos for this usage:
> >
> > https://lists.apache.org/thread/qljwd40v5ntz6733cwcdr8s4z97b343b
> >
> > Aitozi  于2023年5月23日周二 17:45写道:
> > >
> > > Hi guys,
> > > I want to bring up a discussion about adding support of User
> > > Defined AsyncTableFunction in Flink.
> > > Currently, async table functions are special functions for table sources
> > > to perform
> > > async lookups. However, it's worth supporting user-defined async
> > > table functions,
> > > because, in this way, the end SQL user can leverage them to perform
> > > async operations,
> > > which is useful to maximize the system throughput, especially for
> > > IO-bottleneck cases.
> > >
> > > You can find some more detail in [1].
> > >
> > > Looking forward to feedback
> > >
> > >
> > > [1]:
> https://cwiki.apache.org/confluence/display/FLINK/%5BFLIP-313%5D+Add+support+of+User+Defined+AsyncTableFunction
> > >
> > > Thanks,
> > > Aitozi.
>


Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-05-28 Thread Jing Ge
Hi Aitozi,

Thanks for the clarification. The naming "Lookup" might suggest using it
for table lookups. But conceptually, what the eval() method will do is
get a collection of results (Row, RowData) for the given keys. How it will
be done depends on the implementation, i.e. you can implement your own
Source[1][2]. The example in the FLIP should be able to be handled in this
way.

Do you mean to support the AsyncTableFunction beyond the LookupTableSource?
It would be better if you could elaborate on the proposed changes w.r.t. the
CorrelatedCodeGenerator with more details. Thanks!

Best regards,
Jing

[1]
https://github.com/apache/flink/blob/678370b18e1b6c4a23e5ce08f8efd05675a0cc17/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/LookupTableSource.java#L64
[2]
https://github.com/apache/flink/blob/678370b18e1b6c4a23e5ce08f8efd05675a0cc17/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/AsyncTableFunctionProvider.java#L49
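The throughput argument behind this FLIP (overlapping IO-bound lookups instead of waiting on them one by one) can be demonstrated with a small, language-agnostic sketch. This is not Flink's AsyncTableFunction API (which is Java-based); the function names and the thread-pool approach here are just an illustrative stand-in for asynchronous lookup execution.

```python
# Simplified model of why async lookups raise throughput for IO-bound calls.
# Not real Flink APIs; `lookup` stands in for an external (e.g. HTTP/DB) call.
import time
from concurrent.futures import ThreadPoolExecutor

def lookup(key):
    time.sleep(0.05)          # simulate one IO-bound external lookup
    return {"key": key, "value": key * 10}

keys = list(range(8))

start = time.monotonic()
sync_results = [lookup(k) for k in keys]          # one lookup at a time
sync_elapsed = time.monotonic() - start

start = time.monotonic()
with ThreadPoolExecutor(max_workers=8) as pool:
    async_results = list(pool.map(lookup, keys))  # lookups overlap in flight
async_elapsed = time.monotonic() - start

print(f"sync: {sync_elapsed:.2f}s, async: {async_elapsed:.2f}s")
assert async_results == sync_results              # same answers, less waiting
```

With 8 lookups of 50 ms each, the sequential path costs roughly 0.4 s while the overlapped path costs roughly one lookup's latency, which is the effect the FLIP wants to expose to SQL users.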

On Sat, May 27, 2023 at 9:48 AM Aitozi  wrote:

> Hi Jing,
> Thanks for your response. As stated in the FLIP, the purpose of this
> FLIP is to support
> user-defined async table functions. As described in the Flink documentation [1]:
>
> Async table functions are special functions for table sources that perform
> > a lookup.
> >
>
> So end users cannot directly define and use async table functions now. A
> user case is reported in [2].
>
> So, in conclusion, no new interface is introduced, but we extend the
> ability to support user-defined async table functions.
>
> [1]:
>
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/functions/udfs/
> [2]: https://lists.apache.org/thread/qljwd40v5ntz6733cwcdr8s4z97b343b
>
> Thanks.
> Aitozi.
>
>
> Jing Ge  于2023年5月27日周六 06:40写道:
>
> > Hi Aitozi,
> >
> > Thanks for your proposal. I am not quite sure if I understood your
> thoughts
> > correctly. You described a special-case implementation of the
> > AsyncTableFunction with no public API changes. Would you please elaborate
> > your purpose of writing a FLIP according to the FLIP documentation[1]?
> > Thanks!
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> >
> > Best regards,
> > Jing
> >
> > On Wed, May 24, 2023 at 1:07 PM Aitozi  wrote:
> >
> > > May I ask for some feedback  :D
> > >
> > > Thanks,
> > > Aitozi
> > >
> > > Aitozi  于2023年5月23日周二 19:14写道:
> > > >
> > > > Just catch an user case report from Giannis Polyzos for this usage:
> > > >
> > > > https://lists.apache.org/thread/qljwd40v5ntz6733cwcdr8s4z97b343b
> > > >
> > > > Aitozi  于2023年5月23日周二 17:45写道:
> > > > >
> > > > > Hi guys,
> > > > > I want to bring up a discussion about adding support of User
> > > > > Defined AsyncTableFunction in Flink.
> > > > > Currently, async table function are special functions for table
> > source
> > > > > to perform
> > > > > async lookup. However, it's worth to support the user defined async
> > > > > table function.
> > > > > Because, in this way, the end SQL user can leverage it to perform
> the
> > > > > async operation
> > > > > which is useful to maximum the system throughput especially for IO
> > > > > bottleneck case.
> > > > >
> > > > > You can find some more detail in [1].
> > > > >
> > > > > Looking forward to feedback
> > > > >
> > > > >
> > > > > [1]:
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/%5BFLIP-313%5D+Add+support+of+User+Defined+AsyncTableFunction
> > > > >
> > > > > Thanks,
> > > > > Aitozi.
> > >
> >
>


Re: [VOTE] Release flink-connector-jdbc v3.1.1, release candidate #1

2023-05-28 Thread Jing Ge
+1 (non-binding)

- checked sign
- checked hash
- checked repos
- checked tag
- compiled from source
- checked the web PR
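The "checked hash" step in checklists like the one above can be scripted. A minimal sketch, assuming the artifact and its published SHA-512 digest have already been downloaded (the file names below are illustrative only):

```python
# Recompute the SHA-512 of a downloaded release artifact and compare it with
# the published digest. File names here are illustrative stand-ins.
import hashlib

def sha512_of(path, chunk_size=1 << 20):
    # Hash the file incrementally so large artifacts don't need to fit in memory.
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(artifact_path, published_digest):
    return sha512_of(artifact_path) == published_digest.strip().lower()

# Example with a locally created file standing in for the artifact:
with open("artifact.tgz", "wb") as f:
    f.write(b"release bytes")
digest = hashlib.sha512(b"release bytes").hexdigest()
assert verify("artifact.tgz", digest)
assert not verify("artifact.tgz", "0" * 128)
```

Signature verification is the separate `gpg --verify` step against the KEYS file; the sketch above covers only the checksum check.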

Best regards,
Jing


On Sun, May 28, 2023 at 4:00 PM Benchao Li  wrote:

> Thanks Martijn,
>
> - checked signature/checksum [OK]
> - downloaded src, compiled from source [OK]
> - diffed src and tag, no binary files [OK]
> - gone through nexus staging area, looks good [OK]
> - run with Flink 1.17.1 [OK]
>
> One thing I spotted is that the version in `docs/data/jdbc.yml` is still
> 3.1.0; I'm not sure whether this should be a blocker.
>
>
> Martijn Visser  于2023年5月25日周四 02:55写道:
>
> > Hi everyone,
> > Please review and vote on the release candidate #1 for the version 3.1.1,
> > as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> >
> > The complete staging area is available for your review, which includes:
> > * JIRA release notes [1],
> > * the official Apache source release to be deployed to dist.apache.org
> > [2],
> > which are signed with the key with fingerprint
> > A5F3BCE4CBE993573EC5966A65321B8382B219AF [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag v3.1.1-rc1 [5],
> > * website pull request listing the new release [6].
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Thanks,
> > Release Manager
> >
> > [1]
> >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353281
> > [2]
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-jdbc-3.1.1-rc1
> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [4]
> > https://repository.apache.org/content/repositories/orgapacheflink-1636/
> > [5]
> https://github.com/apache/flink-connector-jdbc/releases/tag/v3.1.1-rc1
> > [6] https://github.com/apache/flink-web/pull/654
> >
>
>
> --
>
> Best,
> Benchao Li
>


Re: Kryo Upgrade: Request FLIP page create access

2023-05-28 Thread Jing Ge
Hi Qingsheng,

Could you grant Kurt rights for creating a new FLIP page? Thanks!

@Kurt

Thanks for reaching out. Please pay attention to the FLIP number you will
pick up and keep "Next FLIP number" on [1] up to date. Thanks!

Best regards,
Jing

[1]
https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals

On Sun, May 28, 2023 at 11:31 PM Kurt Ostfeld 
wrote:

> Chesnay Schepler asked me to create a FLIP for this pull request:
> https://github.com/apache/flink/pull/22660
>
> I created an account for the Flink Confluence site with username "kurto",
> but I don't have access to create pages, and therefore don't have access to
> create a FLIP. I see the FLIP docs say:
>
> > If you don't have the necessary permissions for creating a new page,
> please ask on the development mailing list.
>
>
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
>
> Can I request this access please? Thank you :)


Re: [ANNOUNCE] Apache Flink 1.16.2 released

2023-05-29 Thread Jing Ge
Hi Weijie,

Thanks for your contribution and feedback! In case there are reasons that
don't allow us to upgrade them, we can still leverage virtualenv or pipenv
to create a dedicated environment for the Flink release. WDYT?

cc Dian Fu

@Dian
I was wondering if you know the reason. Thanks!

Best regards,
Jing




On Mon, May 29, 2023 at 6:27 AM weijie guo 
wrote:

> Hi Jing,
>
> Thank you for caring about the releasing process. It has to be said that
> the entire process went smoothly. We have very comprehensive
> documentation[1] to guide my work, thanks to the contribution of previous
> release managers and the community.
>
> Regarding the obstacles, I actually only had one minor problem: we used an
> older twine (1.12.0) to deploy Python artifacts to PyPI, and its compatible
> dependencies (such as urllib3) are also older. When I tried twine upload,
> the process couldn't work as expected because the version of urllib3 installed
> on my machine was newer. In order to solve this, I had to
> proactively downgrade the versions of some dependencies. I added a notice to
> the cwiki page[1] to prevent future release managers from encountering the
> same problem. It seems that this is a known issue (see comments in [2])
> which has been resolved in newer versions of twine. I wonder if we can
> upgrade the version of twine? Does anyone remember the reason why we pinned
> such an old version (1.12.0)?
>
> Best regards,
>
> Weijie
>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release
>
> [2] https://github.com/pypa/twine/issues/997
>
>
> Jing Ge  于2023年5月27日周六 00:15写道:
>
> > Hi Weijie,
> >
> > Thanks again for your effort. I was wondering if there were any obstacles
> > you had to overcome while releasing 1.16.2 and 1.17.1 that could lead us
> to
> > any improvement wrt the release process and management?
> >
> > Best regards,
> > Jing
> >
> > On Fri, May 26, 2023 at 4:41 PM Martijn Visser  >
> > wrote:
> >
> > > Thank you Weijie and those who helped with testing!
> > >
> > > On Fri, May 26, 2023 at 1:06 PM weijie guo 
> > > wrote:
> > >
> > > > The Apache Flink community is very happy to announce the release of
> > > > Apache Flink 1.16.2, which is the second bugfix release for the
> Apache
> > > > Flink 1.16 series.
> > > >
> > > >
> > > >
> > > > Apache Flink® is an open-source stream processing framework for
> > > > distributed, high-performing, always-available, and accurate data
> > > > streaming applications.
> > > >
> > > >
> > > >
> > > > The release is available for download at:
> > > >
> > > > https://flink.apache.org/downloads.html
> > > >
> > > >
> > > >
> > > > Please check out the release blog post for an overview of the
> > > > improvements for this bugfix release:
> > > >
> > > > https://flink.apache.org/news/2023/05/25/release-1.16.2.html
> > > >
> > > >
> > > >
> > > > The full release notes are available in Jira:
> > > >
> > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352765
> > > >
> > > >
> > > >
> > > > We would like to thank all contributors of the Apache Flink community
> > > > who made this release possible!
> > > >
> > > >
> > > >
> > > > Feel free to reach out to the release managers (or respond to this
> > > > thread) with feedback on the release process. Our goal is to
> > > > constantly improve the release process. Feedback on what could be
> > > > improved or things that didn't go so well are appreciated.
> > > >
> > > >
> > > >
> > > > Regards,
> > > >
> > > > Release Manager
> > > >
> > >
> >
>


Re: Kryo Upgrade: Request FLIP page create access

2023-05-29 Thread Jing Ge
Thanks Qingsheng for taking care of this!

Best regards,
Jing

On Mon, May 29, 2023 at 4:24 AM Qingsheng Ren  wrote:

> Hi Kurt,
>
> The permission has been granted. Feel free to reach out if you have any
> questions.
>
> Looking forward to your FLIP!
>
> Best,
> Qingsheng
>
> On Mon, May 29, 2023 at 5:31 AM Kurt Ostfeld  >
> wrote:
>
> > Chesnay Schepler asked me to create a FLIP for this pull request:
> > https://github.com/apache/flink/pull/22660
> >
> > I created an account for the Flink Confluence site with username "kurto",
> > but I don't have access to create pages, and therefore don't have access
> to
> > create a FLIP. I see the FLIP docs say:
> >
> > > If you don't have the necessary permissions for creating a new page,
> > please ask on the development mailing list.
> >
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> >
> > Can I request this access please? Thank you :)
>


Re: [DISCUSS] FLIP 295: Support persistence of Catalog configuration and asynchronous registration

2023-05-29 Thread Jing Ge
Hi Feng,

Thanks for your effort! +1 for the proposal.

One of the major changes is that the current design will provide
Map<String, Catalog> catalogs as a snapshot instead of a cache, which means
that once it has been initialized, any changes made by other sessions will not
affect it. Point 6 describes follow-up options for further improvement.
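The store/snapshot split can be sketched in a few lines. This is a conceptual model only, not the Flink API: the real interfaces are Java (CatalogStore, CatalogDescriptor, CatalogManager), and the class and method names below are simplified stand-ins.

```python
# Sketch of the FLIP-295 design discussed in this thread: the store keeps
# catalog *configurations* only; initialized catalogs live in a per-session
# map that behaves like a snapshot. Names are illustrative, not Flink APIs.

class CatalogDescriptor:
    def __init__(self, name, options):
        self.name, self.options = name, dict(options)

class InMemoryCatalogStore:
    def __init__(self):
        self._configs = {}

    def store_catalog(self, name, descriptor):
        self._configs[name] = descriptor

    def get_catalog(self, name):
        return self._configs.get(name)

class CatalogManager:
    def __init__(self, store):
        self._store = store
        self._catalogs = {}  # initialized catalogs: session-local snapshot

    def get_catalog(self, name):
        if name not in self._catalogs:
            desc = self._store.get_catalog(name)
            if desc is None:
                raise KeyError(name)
            # Stand-in for lazily instantiating a real catalog from its config.
            self._catalogs[name] = f"catalog-instance({desc.options['type']})"
        return self._catalogs[name]

store = InMemoryCatalogStore()
store.store_catalog("hive", CatalogDescriptor("hive", {"type": "hive"}))

session_a = CatalogManager(store)
inst = session_a.get_catalog("hive")   # initialized lazily, then snapshotted

# Simulate another session dropping the configuration from the store:
store._configs.pop("hive")
# Session A's snapshot is unaffected, illustrating point 5 above.
assert session_a.get_catalog("hive") == inst
```

A new session created after the deletion would fail to resolve "hive", which is exactly the cross-session divergence that point 6's configurable caching options aim to address.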

Best regards,
Jing

On Mon, May 29, 2023 at 5:31 AM Feng Jin  wrote:

> Hi all, I would like to update you on the latest progress of the FLIP.
>
>
> Last week, Leonard Xu, HangRuan, Jing Ge, Shammon FY, ShengKai Fang and I
> had an offline discussion regarding the overall solution for Flink
> CatalogStore. We have reached a consensus and I have updated the final
> solution in FLIP.
>
> Next, let me briefly describe the entire design:
>
> 1. Introduce CatalogDescriptor to store catalog configuration, similar to
> TableDescriptor.
>
> 2. The two key functions of CatalogStore - void storeCatalog(String
> catalogName, CatalogDescriptor) and CatalogDescriptor getCatalog(String) -
> will both use CatalogDescriptor instead of a Catalog instance. This way,
> CatalogStore will only be responsible for saving and retrieving catalog
> configurations, without having to initialize catalogs.
>
> 3. The default registerCatalog(String catalogName, Catalog catalog)
> function in CatalogManager will be marked as deprecated.
>
> 4. A new function registerCatalog(String catalogName, CatalogDescriptor
> catalog) will be added to serve as the default registration function for
> catalogs in CatalogManager.
>
> 5. Map<String, Catalog> catalogs in CatalogManager will remain unchanged
> and will save initialized catalogs. This means that deletion operations
> from one session won't synchronize with other sessions.
>
> 6. To support multi-session synchronization scenarios for deletions later
> on, we should make Map<String, Catalog> catalogs configurable. There may be
> three possible situations:
>
>    a. Default caching of all initialized catalogs.
>
>    b. Introduction of LRU cache logic, which can automatically clear
>    long-unused catalogs.
>
>    c. No caching of any instances; each call to getCatalog creates a new
>    instance.
>
>
> This is the document for discussion:
>
> https://docs.google.com/document/d/1HRJNd4_id7i6cUxGnAybmYZIwl5g1SmZCOzGdUz-6lU/edit
>
> This is the final proposal document:
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-295%3A+Support+lazy+initialization+of+catalogs+and+persistence+of+catalog+configurations
>
>
> Thank you very much for your attention and suggestions on this FLIP.  A
> special thanks to Hang Ruan for his suggestions on the entire design and
> organizing offline discussions.
>
> If you have any further suggestions or feedback about this FLIP please feel
> free to share.
>
>
> Best,
>
> Feng
>
> On Sat, May 6, 2023 at 8:32 PM Jing Ge  wrote:
>
> > Hi Feng,
> >
> > Thanks for improving the FLIP. It looks good to me. We could still
> > reconsider in the future how to provide more common built-in cache
> > functionality in CatalogManager, so that not every CatalogStore
> > implementation has to take care of it.
> >
> > Best regards,
> > Jing
> >
> > On Thu, May 4, 2023 at 1:47 PM Feng Jin  wrote:
> >
> > > Hi Jing,
> > >
> > > Thanks for your reply.
> > >
> > > >  There might be more such issues. I would suggest you completely walk
> > > through the FLIP again and fix those issues
> > >
> > > I am very sorry for my carelessness and at the same time, I greatly
> > > appreciate your careful review. I have thoroughly checked the entire
> FLIP
> > > and made corrections to these issues.
> > >
> > >
> > > >  If I am not mistaken, with the  current FLIP design, CatalogManager
> > > could work without Optional  CatalogStore being configured.
> > >
> > > Yes, in the original design, CatalogStore was not necessary because
> > > CatalogManager used Map<String, Catalog> catalogs to store catalog
> > > instances.
> > > However, this caused inconsistency issues. Therefore, I modified this part
> > > of the design and removed Map<String, Catalog> catalogs from
> > > CatalogManager.
> > > At the same time, InMemoryCatalog will serve as the default CatalogStore
> > > to save catalogs in memory and replace the functionality of
> > > Map<String, Catalog> catalogs.
> > > The previous plan that kept Map<String, Catalog> catalogs has been moved to
> > > Rejected Alternatives.
> > >
> > >
> > >
> > > Best,
> > > Feng
> &

Re: [SUMMARY] Flink 1.18 Release Sync 05/30/2023

2023-05-30 Thread Jing Ge
Thanks Qingsheng for driving it!

@Devs
As you might already be aware of, there are many externalizations and new
releases of Flink connectors. Once a connector has been externalized
successfully, i.e. the related module has been removed in the Flink repo,
we will not set a priority higher than major to tasks related to those
connectors.

Best regards,
Jing

On Tue, May 30, 2023 at 11:48 AM Qingsheng Ren  wrote:

> Hi devs and users,
>
> I'd like to share some highlights from the release sync of 1.18 on May 30.
>
> 1. @developers please update the progress of your features on 1.18 release
> wiki page [1] ! That will help us a lot to have an overview of the entire
> release cycle.
>
> 2. We found a JIRA issue (FLINK-18356) [2] that doesn't have an assignee,
> which is a CI instability of the flink-table-planner module. It'll be nice
> if someone in the community could pick it up and make some investigations
> :-)
>
> There are 6 weeks before the feature freeze date (Jul 11). The next release
> sync will be on Jun 13, 2023. Welcome to join us [3]!
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/1.18+Release
> [2] https://issues.apache.org/jira/browse/FLINK-18356
> [3] Zoom meeting:
> https://us04web.zoom.us/j/79158702091?pwd=8CXPqxMzbabWkma5b0qFXI1IcLbxBh.1
>
> Best regards,
> Jing, Konstantin, Sergey and Qingsheng
>


Re: [DISCUSS] FLIP-309: Enable operators to trigger checkpoints dynamically

2023-05-30 Thread Jing Ge
e could switch between fast and
> > slow checkpointing intervals based on the information if the job is
> backpressured or not. My thinking is as follows:
> >
> > As a user, I would like to have my regular fast checkpointing interval
> for low latency, but the moment my system is not keeping up, if the
> backpressure
> > builds up, or simply we have a huge backlog to reprocess, latency doesn't
> matter anymore. Only throughput matters. So I would like the checkpointing
> to slow down.
> >
> > I think this should cover pretty well most of the cases, what do you
> think? If this backpressured based behaviour is still not enough, I would
> still say
> > that we should provide plugable checkpoint triggering controllers that
> would work based on metrics.
>
> > change the checkpointing interval based on the "backlog signal",
>
> What's wrong with the job being backpressured? If the job is backpressured,
> we don't care about individual records' latency, only about increasing
> the throughput to get out of the backpressure situation ASAP.
>
> > In the mentioned use-case, users want to have two different checkpointing
> > intervals at different phases of the HybridSource. We should provide an
> API
> > for users to express the extra checkpointing interval in addition to the
> > existing execution.checkpointing.interval. What would be the definition
> of
> > that API with this alternative approach?
>
> I think my proposal with `BacklogDynamicCheckpointTrigger` or
> `BackpressureDetectingCheckpointTrigger` would solve your motivating use
> case
> just as well.
>
> 1. In the catch up phase (reading the bounded source):
>   a) if we are under backpressure (common case), system would fallback to
> the less frequent checkpointing interval
>   b) if there is no backpressure (I hope a rare case, there is a backlog,
> but the source is too slow), Flink cluster has spare resources to actually
> run more
>   frequent checkpointing interval. No harm should be done. But arguably
> using a less frequent checkpointing interval here should be more desirable.
>
> 2. In the continuous processing phase (unbounded source)
>   a) if we are under backpressure, as I mentioned above, no one cares about
> checkpointing interval and the frequency of committing records to the
>   output, as e2e latency is already high due to the backlog in the
> sources
>   b) if there is no backpressure, that's the only case where the user
> actually cares about the frequency of committing records to the output, we
> are
>   using the more frequent checkpointing interval.
>
> 1b) I think is mostly harmless, and I think could be solved with some extra
> effort
> 2a) and 2b) are not solved by your proposal
> 2a) and 2b) are applicable to any source, not just HybridSource, which is
> also not covered by your proposal.
>
> Best,
> Piotrek
>
>
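The backpressure-based switching proposed in the quoted message reduces to a one-line rule: pick the slow (less frequent) interval whenever the job is backpressured, otherwise the fast one. A toy sketch of that rule follows; the interval values, function name, and the source of the backpressure signal are all illustrative, not real Flink options or APIs.

```python
# Toy model of a backpressure-detecting checkpoint trigger: when the job is
# backpressured (catch-up phase, huge backlog), throughput matters and the
# less frequent interval is used; otherwise the low-latency interval applies.
# Values and names are illustrative, not actual Flink configuration keys.

FAST_INTERVAL_MS = 5_000      # regular interval: frequent commits, low latency
SLOW_INTERVAL_MS = 300_000    # "catch-up" interval: throughput over latency

def next_checkpoint_interval(is_backpressured: bool) -> int:
    return SLOW_INTERVAL_MS if is_backpressured else FAST_INTERVAL_MS

# Cases 1a/2a above: backpressured -> slow down checkpointing.
assert next_checkpoint_interval(True) == SLOW_INTERVAL_MS
# Case 2b above: keeping up with the unbounded source -> fast checkpointing.
assert next_checkpoint_interval(False) == FAST_INTERVAL_MS
```

Note that this rule keys off a runtime metric (backpressure) rather than source boundedness, which is why it also covers case 2a/2b for any source, not only a HybridSource.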
> czw., 25 maj 2023 o 17:29 Jing Ge  napisał(a):
>
> > Hi Dong, Hi Piotr,
> >
> > Thanks for the clarification.
> >
> > @Dong
> >
> > According to the code examples in the FLIP, I thought we are focusing on
> > the HybridSource scenario. With the current HybridSource implementation,
> we
> > don't even need to know the boundedness of sources in the HybridSource,
> > since all sources except the last one must be bounded[1], i.e. only the
> > last source is unbounded. This makes it much easier to set different
> > intervals to sources with different boundedness.
> >
> > Boundedness in Flink is a top level concept. I think it should be ok to
> > introduce a top level config for the top level concept. I am not familiar
> > with MySQL CDC. For those specific cases, you are right, your proposal
> can
> > provide the feature with minimal changes, like I mentioned previously, it
> > is a thoughtful design.  +1
> >
> > @Piotr
> >
> > > For example join (windowed/temporal) of two tables backed by a hybrid
> > > source? I could easily see a scenario where one table with little data
> > > catches up much more quickly.
> >
> > I am confused. I thought we were talking about HybridSource which "solves
> > the problem of sequentially reading input from heterogeneous sources to
> > produce a single input stream."[2]
> > I could not find any join within a HybridSource. So, your might mean
> > something else the join example and it should be out of the scope, if I
> am
> > not mistaken.
> >
> > > About the (un)boundness of the input stream. I'm not sure if that
> should
> > > actually matter. Actually the same issue, with

Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode

2023-05-31 Thread Jing Ge
Hi Feng,

Thanks for the proposal! Very interesting feature. Would you like to add
the thoughts described in your previous email, about why SupportsTimeTravel
has been rejected, to the FLIP? This will help readers understand the
context (in the future).

Since we always directly add overloaded methods to Catalog according to new
requirements, the interface becomes bloated. Just out of curiosity,
does it make sense to introduce some DSL design? Like
Catalog.getTable(tablePath).on(timestamp),
Catalog.getTable(tablePath).current() for the most current version, and
more room for further extensions like timestamp ranges, etc. I haven't read
all the source code yet and I'm not sure if it is possible. But a
design like this would keep the Catalog API lean, and the API/DSL would be
self-described and easier to use.
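The fluent style suggested above could look like the following sketch. This is a purely hypothetical API modeled in Python for brevity; it is not the actual Flink Catalog interface, and all names are invented for illustration.

```python
# Hypothetical fluent time-travel lookup: getTable(path).on(ts) / .current().
# Not the real Flink Catalog API; names and structure are illustrative only.

class TableLookup:
    def __init__(self, versions):
        # versions: timestamp -> schema snapshot for that version
        self._versions = dict(sorted(versions.items()))

    def on(self, timestamp):
        """Return the latest schema at or before `timestamp`."""
        chosen = None
        for ts, schema in self._versions.items():
            if ts <= timestamp:
                chosen = schema
        if chosen is None:
            raise KeyError(f"no version at or before {timestamp}")
        return chosen

    def current(self):
        """Return the most recent schema."""
        return list(self._versions.values())[-1]

class Catalog:
    def __init__(self, tables):
        self._tables = tables

    def get_table(self, path):
        return TableLookup(self._tables[path])

catalog = Catalog({"db.orders": {100: "schema-v1", 200: "schema-v2"}})
assert catalog.get_table("db.orders").on(150) == "schema-v1"
assert catalog.get_table("db.orders").current() == "schema-v2"
```

Extensions such as timestamp ranges would become additional methods on the lookup object rather than further overloads on Catalog itself, which is the leanness argument made above.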

Best regards,
Jing


On Wed, May 31, 2023 at 12:08 PM Krzysztof Chmielewski <
krzysiek.chmielew...@gmail.com> wrote:

> Ok, after second thought, I'm retracting my previous statement about the
> Catalog changes you proposed.
> I do see a benefit for the Delta connector with this change, and I see
> why this could be coupled with the Catalog.
>
> Delta Connector SQL support, also ships a Delta Catalog implementation for
> Flink.
> For Delta Catalog, table schema information is fetched from the underlying
> _delta_log and not stored in the metastore. For time travel we actually
> had a problem: if we wanted to time travel back to some old version whose
> schema was slightly different, we would have a conflict, since the Catalog
> would return the current schema and not how it was for version X.
>
> With your change, our Delta Catalog can actually fetch the schema for
> version X and send it to DeltaTableFactory. Currently, the Catalog can
> fetch only the current version. What we would also need, however, is the
> version (number/timestamp) for this table passed to DynamicTableFactory so
> we could properly set up the Delta standalone library.
>
> Regards,
> Krzysztof
>
> śr., 31 maj 2023 o 10:37 Krzysztof Chmielewski <
> krzysiek.chmielew...@gmail.com> napisał(a):
>
> > Hi,
> > happy to see such a feature.
> > Small note from my end regarding Catalog changes.
> >
> > TL;DR
> > I don't think it is necessary to delegate this feature to the catalog. I
> > think that since "time travel" is a per-job/query property, it should not
> > be coupled with the Catalog or table definition. In my opinion this is
> > something that only the DynamicTableFactory has to know about. I would
> > rather see this feature as it is - a SQL syntax enhancement - but
> > delegate it clearly to DynamicTableFactory.
> >
> > I've implemented the time travel feature for the Delta connector [1]
> > using the current Flink API.
> > Docs are pending code review, but you can find them here [2] and examples
> > are available here [3]
> >
> > The time travel feature that I've implemented is based on Flink query
> > hints.
> > "SELECT * FROM sourceTable /*+ OPTIONS('versionAsOf' = '1') */"
> >
> > The " versionAsOf" (we also have 'timestampAsOf') parameter is handled
> not
> > by Catalog but by DyntamicTableFactory implementation for Delta
> connector.
> > The value of this property is passed to Delta standalone lib API that
> > returns table view for given version.
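For illustration, a simplified stand-in for how a factory implementation can pick up such hint options. The option names `versionAsOf`/`timestampAsOf` come from the Delta connector docs linked above; the helper class itself is hypothetical and not part of any real API. (In Flink, query hints like `/*+ OPTIONS('versionAsOf' = '1') */` arrive merged into the table options that the DynamicTableFactory sees.)

```java
import java.util.Map;

// Hypothetical parser for the time-travel hint options a factory receives.
class TimeTravelOptions {
    final Long versionAsOf;   // exact snapshot version, or null
    final Long timestampAsOf; // point-in-time lookup, or null

    TimeTravelOptions(Map<String, String> options) {
        String v = options.get("versionAsOf");
        String t = options.get("timestampAsOf");
        // The two ways of addressing a snapshot are mutually exclusive.
        if (v != null && t != null) {
            throw new IllegalArgumentException(
                "versionAsOf and timestampAsOf are mutually exclusive");
        }
        this.versionAsOf = v == null ? null : Long.parseLong(v);
        this.timestampAsOf = t == null ? null : Long.parseLong(t);
    }
}
```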
> >
> > I'm not sure how/if the proposed change could benefit the Delta
> > connector implementation for this feature.
> >
> > Thanks,
> > Krzysztof
> >
> > [1]
> >
> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/flink
> > [2] https://github.com/kristoffSC/connectors/tree/FlinkSQL_PR_15-docs
> > [3]
> >
> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/examples/flink-example/src/main/java/org/example/sql
> >
> > śr., 31 maj 2023 o 06:03 liu ron  napisał(a):
> >
> >> Hi, Feng
> >>
> >> Thanks for driving this FLIP. Time travel is very useful for Flink
> >> integration with data lake systems. I have one question: why is the
> >> implementation of TimeTravel delegated to the Catalog? Assuming that we
> >> use Flink to query a Hudi table with the time travel syntax, but we
> >> don't use the HudiCatalog and instead register the Hudi table in an
> >> InMemoryCatalog, can we support time travel for the Hudi table in this
> >> case?
> >> In contrast, I think time travel should be bound to the connector
> >> instead of the Catalog, so the rejected alternative should be considered.
> >>
> >> Best,
> >> Ron
> >>
> >> yuxia  于2023年5月30日周二 09:40写道:
> >>
> >> > Hi, Feng.
> >> > Notice this FLIP only supports batch mode for time travel. Would it
> >> > also make sense to support stream mode to read a snapshot of the
> >> > table as a bounded stream?
> >> >
> >> > Best regards,
> >> > Yuxia
> >> >
> >> > - 原始邮件 -
> >> > 发件人: "Benchao Li" 
> >> > 收件人: "dev" 
> >> > 发送时间: 星期一, 2023年 5 月 29日 下午 6:04:53
> >> > 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
> >> >
> >> > # Can Calcite support this syntax ` VERSION AS OF`  ?
> >> >
> >> > This also depends on whether this is defined in standar

Re: [VOTE] Release flink-connector-pulsar 3.0.1, release candidate #1

2023-05-31 Thread Jing Ge
+1(non-binding)

- verified signature
- verified hash
- checked repos
- checked tag. NIT: the tag link should be:
https://github.com/apache/flink-connector-pulsar/releases/tag/v3.0.1-rc1
- reviewed PR. NIT: left a comment.

Best regards,
Jing

On Wed, May 31, 2023 at 11:16 PM Neng Lu  wrote:

> +1
>
> I verified
>
> + the release now can communicate with Pulsar using OAuth2 auth plugin
> + build from source and run unit tests with JDK 17 on macOS M1Max
>
>
> On Wed, May 31, 2023 at 4:24 AM Zili Chen  wrote:
>
> > +1
> >
> > I verified
> >
> > + LICENSE and NOTICE present
> > + Checksum and GPG sign matches
> > + No unexpected binaries in the source release
> > + Build from source and run unit tests with JDK 17 on macOS M1
> >
> > On 2023/05/25 16:18:51 Leonard Xu wrote:
> > > Hey all,
> > >
> > > Please review and vote on the release candidate #1 for the version
> 3.0.1
> > of the
> > > Apache Flink Pulsar Connector as follows:
> > >
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > > The complete staging area is available for your review, which includes:
> > > JIRA release notes [1],
> > > The official Apache source release to be deployed to dist.apache.org
> > [2], which are signed with the key with
> > fingerprint 5B2F6608732389AEB67331F5B197E1F1108998AD [3],
> > > All artifacts to be deployed to the Maven Central Repository [4],
> > > Source code tag v3.0.1-rc1 [5],
> > > Website pull request listing the new release [6].
> > > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> > >
> > >
> > > Best,
> > > Leonard
> > >
> > > [1]
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352640
> > > [2]
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-pulsar-3.0.1-rc1/
> > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > [4]
> > https://repository.apache.org/content/repositories/orgapacheflink-1641/
> > > [5] https://github.com/apache/flink-connector-pulsar/tree/v3.0.1-rc1
> > > [6] https://github.com/apache/flink-web/pull/655
> > >
> > >
> >
>


Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-01 Thread Jing Ge
Hi Aitozi,

Sorry for the late reply. Would you like to update the FLIP with more
details on the proposed changes too?
I got your point. It looks like a rational idea. However, since lookup has
its clear async call requirement, are there any real use cases that
need this change? This will help us understand the motivation. After all,
lateral join and temporal lookup join[1] are quite different.

Best regards,
Jing


[1]
https://github.com/apache/flink/blob/d90a72da2fd601ca4e2a46700e91ec5b348de2ad/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/AsyncTableFunction.java#L54

On Wed, May 31, 2023 at 8:53 AM Aitozi  wrote:

> Hi Jing,
> What do you think about it? Can we move forward this feature?
>
> Thanks,
> Aitozi.
>
> Aitozi  于2023年5月29日周一 09:56写道:
>
> > Hi Jing,
> > > "Do you mean to support the AsyncTableFunction beyond the
> > LookupTableSource?"
> > Yes, I mean to support the AsyncTableFunction beyond the
> > LookupTableSource.
> >
> > The "AsyncTableFunction" is a function with the ability to be executed
> > async (with AsyncWaitOperator).
> > The async lookup join is one usage of this. So, we don't have to bind
> > the AsyncTableFunction with LookupTableSource.
> > If user-defined AsyncTableFunction is supported, users can directly use
> > lateral table syntax to perform async operations.
> >
> > > "It would be better if you could elaborate the proposed changes wrt the
> > CorrelatedCodeGenerator with more details"
> >
> > In the proposal, we use lateral table syntax to support the async table
> > function. So the planner will also treat this statement to a
> > CommonExecCorrelate node. So the runtime code should be generated in
> > CorrelatedCodeGenerator.
> > In CorrelatedCodeGenerator, we will know the TableFunction's Kind of
> > `FunctionKind.Table` or `FunctionKind.ASYNC_TABLE`
> > For  `FunctionKind.ASYNC_TABLE` we can generate a AsyncWaitOperator to
> > execute the async table function.
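To make the discussed calling convention concrete, here is a self-contained sketch of the `eval()` shape an async table function follows: the generated operator hands in a future that the UDF completes, typically from an async client callback. The class and its lookup logic are hypothetical (it does not extend Flink's real AsyncTableFunction base class), and a JDK CompletableFuture stands in for the RPC client from the use case.

```java
import java.util.Collection;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical user-defined async function: eval() receives a future and
// returns immediately; results are delivered when the future completes.
class RemoteLookupFunction {
    public void eval(CompletableFuture<Collection<String>> result, String key) {
        // In the FLIP's scenario this body would issue an RPC; here we
        // complete asynchronously on the common pool to stay self-contained.
        CompletableFuture.runAsync(() ->
            result.complete(List.of(key + "-enriched")));
    }
}
```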
> >
> >
> > Thanks,
> > Aitozi.
> >
> >
> > Jing Ge  于2023年5月29日周一 03:22写道:
> >
> >> Hi Aitozi,
> >>
> >> Thanks for the clarification. The naming "Lookup" might suggest using it
> >> for table lookup. But conceptually what the eval() method will do is to
> >> get a collection of results (Row, RowData) from the given keys. How it
> will
> >> be done depends on the implementation, i.e. you can implement your own
> >> Source[1][2]. The example in the FLIP should be able to be handled in
> this
> >> way.
> >>
> >> Do you mean to support the AsyncTableFunction beyond the
> >> LookupTableSource?
> >> It would be better if you could elaborate the proposed changes wrt the
> >> CorrelatedCodeGenerator with more details. Thanks!
> >>
> >> Best regards,
> >> Jing
> >>
> >> [1]
> >>
> >>
> https://github.com/apache/flink/blob/678370b18e1b6c4a23e5ce08f8efd05675a0cc17/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/LookupTableSource.java#L64
> >> [2]
> >>
> >>
> https://github.com/apache/flink/blob/678370b18e1b6c4a23e5ce08f8efd05675a0cc17/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/AsyncTableFunctionProvider.java#L49
> >>
> >> On Sat, May 27, 2023 at 9:48 AM Aitozi  wrote:
> >>
> >> > Hi Jing,
> >> > Thanks for your response. As stated in the FLIP, the purpose of this
> >> > FLIP is to support user-defined async table functions. As described
> >> > in the Flink documentation [1]:
> >> >
> >> > > Async table functions are special functions for table sources that
> >> > > perform a lookup.
> >> >
> >> > So end users cannot directly define and use async table functions now.
> >> > A use case is reported in [2].
> >> >
> >> > So, in conclusion, no new interface is introduced, but we extend the
> >> > ability to support user-defined async table functions.
> >> >
> >> > [1]:
> >> >
> >> >
> >>
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/functions/udfs/
> >> > [2]: https://lists.apache.org/thread/qljwd40v5ntz6733cwcdr8s4z97b343b
> >> >
> >> > Thanks.
> >> > Aitozi.
> >> >
> >> >
> >> > Jing Ge  于2023年5月27日周六 06:40写道:
> >> >
>

Re: [DISCUSS] Update Flink Roadmap

2023-06-01 Thread Jing Ge
Hi Jark,

Thanks for driving it! For point 2, since we are developing 1.18 now,
does it make sense to update the roadmap this time while we are releasing
1.18? This discussion thread would then focus on the Flink 2.0 roadmap, as
you mentioned previously. WDYT?

Best regards,
Jing

On Thu, Jun 1, 2023 at 3:31 PM Jark Wu  wrote:

> Hi all,
>
> Martijn and I would like to initiate a discussion on the Flink roadmap,
> which should cover the project's long-term roadmap and the regular update
> mechanism.
>
> Xintong has already started a discussion about Flink 2.0 planning. One of
> the points raised in that discussion is that we should have a high-level
> discussion of the roadmap to present where the project is heading (which
> doesn't necessarily need to block the Flink 2.0 planning). Moreover, the
> roadmap on the Flink website [1] hasn't been updated for half a year, and
> the last update was for the feature radar for the 1.15 release. It has been
> 2 years since the community discussed Flink's overall roadmap.
>
> I would like to raise two topics for discussion:
>
> 1. The new roadmap. This should be an updated version of the current
> roadmap[1].
> 2. A mechanism to regularly discuss and update the roadmap.
>
> To make the first topic discussion more efficient, Martijn and I volunteer
> to summarize the ongoing big things of different components and present a
> roadmap draft to the community in the next few weeks. This should be a good
> starting point for a more detailed discussion.
>
> Regarding the regular update mechanism, there was a proposal in a thread
> [2] three years ago to make the release manager responsible for updating
> the roadmap. However, it appears that this was not documented as a release
> management task [3], and the roadmap update wasn't performed for releases
> 1.16 and 1.17.
>
> In my opinion, making release managers responsible for keeping the roadmap
> up to date is a good idea. Specifically, release managers of release X can
> kick off the roadmap update at the beginning of release X, which can be a
> joint task with collecting a feature list [4]. Additionally, release
> managers of release X-1 can help verify and remove the accomplished items
> from the roadmap and update the feature radar.
>
> What do you think? Do you have other ideas?
>
> Best,
> Jark & Martijn
>
> [1]: https://flink.apache.org/roadmap.html
> [2]: https://lists.apache.org/thread/o0l3cg6yphxwrww0k7215jgtw3yfoybv
> [3]:
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Release+Management
> [4]: https://cwiki.apache.org/confluence/display/FLINK/1.18+Release
>


Re: [DISCUSS] Update Flink Roadmap

2023-06-01 Thread Jing Ge
Hi Jark,

Fair enough. Let's do it like you suggested. Thanks!

Best regards,
Jing

On Thu, Jun 1, 2023 at 6:00 PM Jark Wu  wrote:

> Hi Jing,
>
> This thread is for discussing the roadmap for versions 1.18, 2.0, and even
> more.
> One of the outcomes of this discussion will be an updated version of the
> current roadmap.
> Let's work together on refining the roadmap in this thread.
>
> Best,
> Jark
>
> On Thu, 1 Jun 2023 at 23:25, Jing Ge  wrote:
>
> > Hi Jark,
> >
> > Thanks for driving it! For point 2, since we are developing 1.18 now,
> > does it make sense to update the roadmap this time while we are releasing
> > 1.18? This discussion thread will be focusing on the Flink 2.0 roadmap,
> as
> > you mentioned previously. WDYT?
> >
> > Best regards,
> > Jing
> >
> > On Thu, Jun 1, 2023 at 3:31 PM Jark Wu  wrote:
> >
> > > Hi all,
> > >
> > > Martijn and I would like to initiate a discussion on the Flink roadmap,
> > > which should cover the project's long-term roadmap and the regular
> update
> > > mechanism.
> > >
> > > Xintong has already started a discussion about Flink 2.0 planning. One
> of
> > > the points raised in that discussion is that we should have a
> high-level
> > > discussion of the roadmap to present where the project is heading
> (which
> > > doesn't necessarily need to block the Flink 2.0 planning). Moreover,
> the
> > > roadmap on the Flink website [1] hasn't been updated for half a year,
> and
> > > the last update was for the feature radar for the 1.15 release. It has
> > been
> > > 2 years since the community discussed Flink's overall roadmap.
> > >
> > > I would like to raise two topics for discussion:
> > >
> > > 1. The new roadmap. This should be an updated version of the current
> > > roadmap[1].
> > > 2. A mechanism to regularly discuss and update the roadmap.
> > >
> > > To make the first topic discussion more efficient, Martijn and I
> > volunteer
> > > to summarize the ongoing big things of different components and
> present a
> > > roadmap draft to the community in the next few weeks. This should be a
> > good
> > > starting point for a more detailed discussion.
> > >
> > > Regarding the regular update mechanism, there was a proposal in a
> thread
> > > [2] three years ago to make the release manager responsible for
> updating
> > > the roadmap. However, it appears that this was not documented as a
> > release
> > > management task [3], and the roadmap update wasn't performed for
> releases
> > > 1.16 and 1.17.
> > >
> > > In my opinion, making release managers responsible for keeping the
> > roadmap
> > > up to date is a good idea. Specifically, release managers of release X
> > can
> > > kick off the roadmap update at the beginning of release X, which can
> be a
> > > joint task with collecting a feature list [4]. Additionally, release
> > > managers of release X-1 can help verify and remove the accomplished
> items
> > > from the roadmap and update the feature radar.
> > >
> > > What do you think? Do you have other ideas?
> > >
> > > Best,
> > > Jark & Martijn
> > >
> > > [1]: https://flink.apache.org/roadmap.html
> > > [2]: https://lists.apache.org/thread/o0l3cg6yphxwrww0k7215jgtw3yfoybv
> > > [3]:
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Release+Management
> > > [4]: https://cwiki.apache.org/confluence/display/FLINK/1.18+Release
> > >
> >
>


Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-02 Thread Jing Ge
Hi Aitozi,

Thanks for the update. Just out of curiosity, what is the difference
between the RPC call or query you mentioned and a lookup in the most
general sense? Since lateral join is used in the FLIP, is there any special
thought behind that? Sorry for asking so many questions. The FLIP contains
limited information to understand the motivation.

Best regards,
Jing

On Fri, Jun 2, 2023 at 3:48 AM Aitozi  wrote:

> Hi Jing,
> I have updated the proposed changes in the FLIP. IMO, lookup has a clear
> async call requirement because it is an IO-heavy operation. In our usage,
> SQL users have logic that does RPC calls or queries third-party services,
> which is also IO intensive. In these cases, we'd like to leverage the
> async function to improve throughput.
>
> Thanks,
> Aitozi.
>
> Jing Ge  于2023年6月1日周四 22:55写道:
>
> > Hi Aitozi,
> >
> > Sorry for the late reply. Would you like to update the proposed changes
> > with more details into the FLIP too?
> > I got your point. It looks like a rational idea. However, since lookup
> has
> > its clear async call requirement, are there any real use cases that
> > need this change? This will help us understand the motivation. After all,
> > lateral join and temporal lookup join[1] are quite different.
> >
> > Best regards,
> > Jing
> >
> >
> > [1]
> >
> >
> https://github.com/apache/flink/blob/d90a72da2fd601ca4e2a46700e91ec5b348de2ad/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/AsyncTableFunction.java#L54
> >
> > On Wed, May 31, 2023 at 8:53 AM Aitozi  wrote:
> >
> > > Hi Jing,
> > > What do you think about it? Can we move forward this feature?
> > >
> > > Thanks,
> > > Aitozi.
> > >
> > > Aitozi  于2023年5月29日周一 09:56写道:
> > >
> > > > Hi Jing,
> > > > > "Do you mean to support the AyncTableFunction beyond the
> > > > LookupTableSource?"
> > > > Yes, I mean to support the AyncTableFunction beyond the
> > > LookupTableSource.
> > > >
> > > > The "AsyncTableFunction" is the function with ability to be executed
> > > async
> > > > (with AsyncWaitOperator).
> > > > The async lookup join is a one of usage of this. So, we don't have to
> > > bind
> > > > the AyncTableFunction with LookupTableSource.
> > > > If User-defined AsyncTableFunction is supported, user can directly
> use
> > > > lateral table syntax to perform async operation.
> > > >
> > > > > "It would be better if you could elaborate the proposed changes wrt
> > the
> > > > CorrelatedCodeGenerator with more details"
> > > >
> > > > In the proposal, we use lateral table syntax to support the async
> table
> > > > function. So the planner will also treat this statement to a
> > > > CommonExecCorrelate node. So the runtime code should be generated in
> > > > CorrelatedCodeGenerator.
> > > > In CorrelatedCodeGenerator, we will know the TableFunction's Kind of
> > > > `FunctionKind.Table` or `FunctionKind.ASYNC_TABLE`
> > > > For  `FunctionKind.ASYNC_TABLE` we can generate a AsyncWaitOperator
> to
> > > > execute the async table function.
> > > >
> > > >
> > > > Thanks,
> > > > Aitozi.
> > > >
> > > >
> > > > Jing Ge  于2023年5月29日周一 03:22写道:
> > > >
> > > >> Hi Aitozi,
> > > >>
> > > >> Thanks for the clarification. The naming "Lookup" might suggest
> using
> > it
> > > >> for table look up. But conceptually what the eval() method will do
> is
> > to
> > > >> get a collection of results(Row, RowData) from the given keys. How
> it
> > > will
> > > >> be done depends on the implementation, i.e. you can implement your
> own
> > > >> Source[1][2]. The example in the FLIP should be able to be handled
> in
> > > this
> > > >> way.
> > > >>
> > > >> Do you mean to support the AyncTableFunction beyond the
> > > LookupTableSource?
> > > >> It would be better if you could elaborate the proposed changes wrt
> the
> > > >> CorrelatedCodeGenerator with more details. Thanks!
> > > >>
> > > >> Best regards,
> > > >> Jing
> > > >>
> > > >> [1]
> > > >>
> >

Re: [DISCUSS] FLIP-307: Flink connector Redshift

2023-06-02 Thread Jing Ge
Hi Samrat,

Excited to see your proposal. Supporting data warehouses is one of the
major tracks for Flink. Thanks for driving it! Happy to see that we reached
consensus to prioritize the Sink over Source in the previous discussion. Do
you already have any prototype? I'd like to join the reviews.

Just out of curiosity, speaking of JDBC mode, according to the FLIP, it
should be doable to directly use the JDBC connector with Redshift, if I am
not mistaken. Will the Redshift connector provide additional features
beyond being a mediator/wrapper around the JDBC connector?

Best regards,
Jing

On Thu, Jun 1, 2023 at 8:22 PM Ahmed Hamdy  wrote:

> Hi Samrat
>
> Thanks for putting up this FLIP. I agree regarding the importance of the
> use case.
> please let me know If you need any collaboration regarding integration with
> AWS connectors credential providers or regarding FLIP-171 I would be more
> than happy to assist.
> I also like Leonard's proposal for starting with DataStreamSink and
> TableSink, It would be great to have some milestones delivered as soon as
> ready.
> best regards
> Ahmed Hamdy
>
>
> On Wed, 31 May 2023 at 11:15, Samrat Deb  wrote:
>
> > Hi Liu Ron,
> >
> > > 1. Regarding the  `read.mode` and `write.mode`, you say here provides
> two
> > modes, respectively, jdbc and `unload or copy`, What is the default value
> > for `read.mode` and `write.mode?
> >
> > I have made an effort to make the configuration options `read.mode` and
> > `write.mode` mandatory for the "flink-connector-redshift" according to
> > FLIP[1]. The rationale behind this decision is to empower users who are
> > familiar with their Redshift setup and have specific expectations for the
> > sink. By making these configurations mandatory, users can have more
> control
> > and flexibility in configuring the connector to meet their requirements.
> >
> > However, I am open to receiving feedback on whether it would be
> beneficial
> > to make the configuration options non-mandatory and set default values
> for
> > them. If you believe there are advantages to having default values or any
> > other suggestions, please share your thoughts. Your feedback is highly
> > appreciated.
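As an illustration of the mandatory-option behavior described above, a tiny validation helper. The option keys `read.mode`/`write.mode` come from the FLIP; the helper class itself is made up and not part of the connector.

```java
import java.util.Map;

// Hypothetical sketch: fail fast when a mandatory connector option is absent.
class RedshiftOptionsValidator {
    static String requireOption(Map<String, String> options, String key) {
        String value = options.get(key);
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException(
                "Missing required option '" + key + "' (e.g. 'jdbc' or 'unload')");
        }
        return value;
    }
}
```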
> >
> > >  2. For Source, does it both support batch read and streaming read?
> >
> > Redshift currently does not provide native support for streaming reads,
> > although it does support streaming writes[2]. As part of the plan, I
> intend
> > to conduct a proof of concept and benchmarking to explore the
> possibilities
> > of implementing streaming reads using the Flink JDBC connector, as
> Redshift
> > is JDBC compatible.
> > However, it is important to note that, in the initial phase of
> > implementation, the focus will primarily be on supporting batch reads
> > rather than streaming reads. This approach will allow us to deliver a
> > robust and reliable solution for batch processing in phase 2 of the
> > implementation.
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-307%3A++Flink+Connector+Redshift
> > [2]
> >
> >
> https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-streaming-ingestion.html
> >
> > Bests,
> > Samrat
> >
> > On Wed, May 31, 2023 at 8:03 AM liu ron  wrote:
> >
> > > Hi, Samrat
> > >
> > > Thanks for driving this FLIP. It looks like supporting
> > > flink-connector-redshift is very useful to Flink. I have two question:
> > > 1. Regarding the  `read.mode` and `write.mode`, you say here provides
> two
> > > modes, respectively, jdbc and `unload or copy`, What is the default
> value
> > > for `read.mode` and `write.mode?
> > > 2. For Source, does it both support batch read and streaming read?
> > >
> > >
> > > Best,
> > > Ron
> > >
> > > Samrat Deb  于2023年5月30日周二 17:15写道:
> > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-307%3A++Flink+Connector+Redshift
> > > >
> > > > [note] Missed the trailing link for previous mail
> > > >
> > > >
> > > >
> > > > On Tue, May 30, 2023 at 2:43 PM Samrat Deb 
> > > wrote:
> > > >
> > > > > Hi Leonard,
> > > > >
> > > > > > and I’m glad to help review the design as well as the code
> review.
> > > > > Thank you so much. It would be really great and helpful to bring
> > > > > flink-connector-redshift for flink users :) .
> > > > >
> > > > > I have divided the implementation in 3 phases in the `Scope`
> > > Section[1].
> > > > > 1st phase is to
> > > > >
> > > > >- Integrate with Flink Sink API (*FLIP-171*
> > > > ><
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink
> > > >
> > > > >)
> > > > >
> > > > >
> > > > > > About the implementation phases, How about prioritizing support
> for
> > > the
> > > > > Datastream Sink API and TableSink API in the first phase?
> > > > > I can completely resonate with you to prioritize support for
> > Datastream
> > > > > Sink API and TableSink API in the first phase.
> > > > > I will update the FLIP[1] as you have suggested.
> > > > >
> > > > > > It

Re: [DISCUSS] FLIP-307: Flink connector Redshift

2023-06-05 Thread Jing Ge
Hi Samrat,

Thanks for the feedback. I would suggest adding that information into the
FLIP.

+1 Looking forward to your PR :-)

Best regards,
Jing

On Sat, Jun 3, 2023 at 9:19 PM Samrat Deb  wrote:

> Hi Jing Ge,
>
> >>> Do you already have any prototype? I'd like to join the reviews.
> The prototype is in progress. I will raise the dedicated PR for review
> soon and notify this thread as well.
>
> >>> Will the Redshift connector provide additional features
> beyond the mediator/wrapper of the jdbc connector?
>
> Here are the additional features that the Flink connector for AWS Redshift
> can provide on top of using JDBC:
>
> 1. Integration with AWS Redshift Workload Management (WLM): AWS Redshift
> allows you to configure WLM[1] to manage query prioritization and resource
> allocation. The Flink connector for Redshift will be agnostic to the
> configured WLM and utilize it for scaling in and out for the sink. This
> means that the connector can leverage the WLM capabilities of Redshift to
> optimize the execution of queries and allocate resources efficiently based
> on your defined workload priorities.
>
> 2. Abstraction of AWS Redshift Quotas and Limits: AWS Redshift imposes
> certain quotas and limits[2] on various aspects such as the number of
> clusters, concurrent connections, queries per second, etc. The Flink
> connector for Redshift will provide an abstraction layer for users,
> allowing them to work with Redshift without having to worry about these
> specific limits. The connector will handle the management of connections
> and queries within the defined quotas and limits, abstracting away the
> complexity and ensuring compliance with Redshift's restrictions.
>
> These features aim to simplify the integration of Flink with AWS Redshift,
> providing optimized resource utilization and transparent handling of
> Redshift-specific limitations.
>
> Bests,
> Samrat
>
> [1]
>
> https://docs.aws.amazon.com/redshift/latest/dg/cm-c-implementing-workload-management.html
> [2]
>
> https://docs.aws.amazon.com/redshift/latest/mgmt/amazon-redshift-limits.html
>
> On Sat, Jun 3, 2023 at 11:40 PM Samrat Deb  wrote:
>
> > Hi Ahmed,
> >
> > >>> please let me know If you need any collaboration regarding
> integration
> > with
> > AWS connectors credential providers or regarding FLIP-171 I would be more
> > than happy to assist.
> >
> > Sure, I will reach out in case any hands are required.
> >
> >
> >
> > On Fri, Jun 2, 2023 at 6:12 PM Jing Ge 
> wrote:
> >
> >> Hi Samrat,
> >>
> >> Excited to see your proposal. Supporting data warehouses is one of the
> >> major tracks for Flink. Thanks for driving it! Happy to see that we
> >> reached
> >> consensus to prioritize the Sink over Source in the previous discussion.
> >> Do
> >> you already have any prototype? I'd like to join the reviews.
> >>
> >> Just out of curiosity, speaking of JDBC mode, according to the FLIP, it
> >> should be doable to directly use the jdbc connector with Redshift, if I
> am
> >> not mistaken. Will the Redshift connector provide additional features
> >> beyond the mediator/wrapper of the jdbc connector?
> >>
> >> Best regards,
> >> Jing
> >>
> >> On Thu, Jun 1, 2023 at 8:22 PM Ahmed Hamdy 
> wrote:
> >>
> >> > Hi Samrat
> >> >
> >> > Thanks for putting up this FLIP. I agree regarding the importance of
> the
> >> > use case.
> >> > please let me know If you need any collaboration regarding integration
> >> with
> >> > AWS connectors credential providers or regarding FLIP-171 I would be
> >> more
> >> > than happy to assist.
> >> > I also like Leonard's proposal for starting with DataStreamSink and
> >> > TableSink, It would be great to have some milestones delivered as soon
> >> as
> >> > ready.
> >> > best regards
> >> > Ahmed Hamdy
> >> >
> >> >
> >> > On Wed, 31 May 2023 at 11:15, Samrat Deb 
> wrote:
> >> >
> >> > > Hi Liu Ron,
> >> > >
> >> > > > 1. Regarding the  `read.mode` and `write.mode`, you say here
> >> provides
> >> > two
> >> > > modes, respectively, jdbc and `unload or copy`, What is the default
> >> value
> >> > > for `read.mode` and `write.mode?
> >> > >
> >> > > I have made an effort to make the configuration options `read.m

Re: [DISCUSS] FLIP-294: Support Customized Job Meta Data Listener

2023-06-05 Thread Jing Ge
Hi Shammon,

Thanks for driving it! It is a really interesting proposal. Looking forward
to the follow-up FLIP for the lineage feature, users will love it :-)

There are some inconsistencies in the content. In the example at the very
bottom, listener.onEvent(CatalogModificationEvent) is called, while in the
CatalogModificationListener interface definition, only
onEvent(CatalogModificationEvent, CatalogModificationContext) has been
defined. I was wondering (NIT):

1. should there be another overloading method
onEvent(CatalogModificationEvent) alongside
onEvent(CatalogModificationEvent, CatalogModificationContext) ?
2. Since onEvent(CatalogModificationEvent) could be used, do we really need
CatalogModificationContext? API design example as reference: [1]

Best regards,
Jing


[1]
http://www.java2s.com/example/java-src/pkg/java/awt/event/actionlistener-add27.html

On Tue, Jun 6, 2023 at 7:43 AM Shammon FY  wrote:

> Hi devs:
>
> Thanks for all the feedback, and if there are no more comments, I will
> start a vote on FLIP-294 [1] later. Thanks again.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-294%3A+Support+Customized+Job+Meta+Data+Listener
>
> Best,
> Shammon FY
>
> On Tue, Jun 6, 2023 at 1:40 PM Shammon FY  wrote:
>
> > Hi Martijn,
> >
> > Thanks for your attention, I will soon initiate a discussion about
> > FLIP-314.
> >
> > Best,
> > Shammon FY
> >
> >
> > On Fri, Jun 2, 2023 at 2:55 AM Martijn Visser 
> > wrote:
> >
> >> Hi Shammon,
> >>
> >> Just wanted to chip-in that I like the overall FLIP. Will be interesting
> >> to
> >> see the follow-up discussion on FLIP-314.
> >>
> >> Best regards,
> >>
> >> Martijn
> >>
> >> On Thu, Jun 1, 2023 at 5:45 AM yuxia 
> wrote:
> >>
> >> > Thanks for explanation. Make sense to me.
> >> >
> >> > Best regards,
> >> > Yuxia
> >> >
> >> > - 原始邮件 -
> >> > 发件人: "Shammon FY" 
> >> > 收件人: "dev" 
> >> > 发送时间: 星期四, 2023年 6 月 01日 上午 10:45:12
> >> > 主题: Re: [DISCUSS] FLIP-294: Support Customized Job Meta Data Listener
> >> >
> >> > Thanks yuxia, you're right and I'll add the new database to
> >> > AlterDatabaseEvent.
> >> >
> >> > I added `ignoreIfNotExists` for AlterDatabaseEvent because it is a
> >> > parameter in the `Catalog.alterDatabase` method. Although this value
> is
> >> > currently always false in `AlterDatabaseOperation`, I think it's
> better
> >> > to stay consistent with `Catalog.alterDatabase`. What do you think?
> >> >
> >> > Best,
> >> > Shammon FY
> >> >
> >> > On Thu, Jun 1, 2023 at 10:25 AM yuxia 
> >> wrote:
> >> >
> >> > > Hi, Shammon.
> >> > > I mean, do we need to include the new database after the alter in
> >> > > AlterDatabaseEvent, so that the listener can know what has been
> >> > > modified in the database? Or does the listener not need to care about
> >> > > the actual modification?
> >> > > Also, I'm wondering whether AlterDatabaseEvent needs to include an
> >> > > ignoreIfNotExists method, since the alter database operation doesn't
> >> > > have syntax like 'alter database if exists xxx'.
> >> > >
> >> > > Best regards,
> >> > > Yuxia
> >> > >
> >> > > - 原始邮件 -
> >> > > 发件人: "Shammon FY" 
> >> > > 收件人: "dev" 
> >> > > 发送时间: 星期三, 2023年 5 月 31日 下午 2:55:26
> >> > > 主题: Re: [DISCUSS] FLIP-294: Support Customized Job Meta Data
> Listener
> >> > >
> >> > > Hi yuxia
> >> > >
> >> > > Thanks for your input. The `AlterDatabaseEvent` extends
> >> > > `DatabaseModificationEvent` which has the original database.
> >> > >
> >> > > Best,
> >> > > Shammon FY
> >> > >
> >> > > On Wed, May 31, 2023 at 2:24 PM yuxia 
> >> > wrote:
> >> > >
> >> > > > Thanks Shammon for driving it.
> >> > > > The FLIP generally looks good to me. I only have one question.
> >> > > > WRT AlterDatabaseEvent, IIUC, it'll contain the origin database name
> >> > > > and the new CatalogDatabase after modification. Is it enough to only
> >> > > > pass the origin database name? Would it be better to contain the
> >> > > > origin CatalogDatabase so that the listener has a way to know what
> >> > > > changed?
> >> > > >
> >> > > > Best regards,
> >> > > > Yuxia
> >> > > >
> >> > > > - 原始邮件 -
> >> > > > 发件人: "ron9 liu" 
> >> > > > 收件人: "dev" 
> >> > > > 发送时间: 星期三, 2023年 5 月 31日 上午 11:36:04
> >> > > > 主题: Re: [DISCUSS] FLIP-294: Support Customized Job Meta Data
> >> Listener
> >> > > >
> >> > > > Hi, Shammon
> >> > > >
> >> > > > Thanks for driving this FLIP, It will enforce the Flink metadata
> >> > > capability
> >> > > > from the platform produce perspective. The overall design looks
> >> good to
> >> > > me,
> >> > > > I just have some small question:
> >> > > > 1. Regarding CatalogModificationListenerFactory#createListener
> >> method,
> >> > I
> >> > > > think it would be better to pass Context as its parameter instead
> of
> >> > two
> >> > > > specific Object. In this way, we can easily extend it in the
> future
> >> and
> >> > > > there will be no compatibility problems. Refer to
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> https://github.co

Re: [DISCUSS] FLIP-294: Support Customized Job Meta Data Listener

2023-06-06 Thread Jing Ge
Hi Shammon,

Thanks for the clarification. Just out of curiosity, if the context is not
part of the event, why should it be the input parameter of each onEvent
call?

Best regards,
Jing

On Tue, Jun 6, 2023 at 11:58 AM Leonard Xu  wrote:

> Thanks Shammon for the timely update, the updated FLIP looks good to me.
>
> Hope to see the vote thread and following FLIP-314 discussion thread.
>
> Best,
> Leonard
>
> > On Jun 6, 2023, at 5:04 PM, Shammon FY  wrote:
> >
> > Hi,
> >
> > Thanks for all the feedback.
> >
> > For @Jing Ge,
> > I forget to update the demo code in the FLIP, the method is
> > `onEvent(CatalogModificationEvent, CatalogModificationContext)` and there
> > is no `onEvent(CatalogModificationEvent)`. I have updated the code.
> Context
> > contains some additional information that is not part of an Event, but
> > needs to be used in the listener, so we separate it from the event.
> >
> > For @Panagiotis,
> > `ioExecutor` makes sense to me and I have added it to
> > `CatalogModificationContext`, thanks.
> >
> > For @Leonard,
> > Thanks for your input.
> > 1. I have updated `CatalogModificationContext` as an interface, as well
> as
> > Context in CatalogModificationListenerFactory
> > 2. Configuration sounds good to me, I have updated the method name and
> > getConfiguration in Context
> >
> > For @David,
> > Yes, you're right. The listener will only be used on the client side and
> > won't introduce a new code path for running per-job/per-session jobs. The
> > listener will be created in `TableEnvironment` and `SqlGateway`, which
> > create a `CatalogManager` with the listener.
> >
> >
> > Best,
> > Shammon FY
> >
> >
> > On Tue, Jun 6, 2023 at 3:33 PM David Morávek 
> > wrote:
> >
> >> Hi,
> >>
> >> Thanks for the FLIP! Data lineage is an important problem to tackle.
> >>
> >> Can you please expand on how this is planned to be wired into the
> >> JobManager? As I understand, the listeners will be configured globally
> (per
> >> cluster), so this won't introduce a new code path for running per-job /
> >> per-session user code. Is that correct?
> >>
> >> Best,
> >> D
> >>
> >> On Tue, Jun 6, 2023 at 9:17 AM Leonard Xu  wrote:
> >>
> >>> Thanks Shammon for driving this FLIP forward, I’ve several comments
> about
> >>> the updated FLIP.
> >>>
> >>> 1. CatalogModificationContext is introduced as a class instead of an
> >>> interface, is it a typo?
> >>>
> >>> 2. The FLIP defines multiple `Map config();` methods in some Context
> >>> classes. Could we use `Configuration getConfiguration();` instead? The
> >>> class org.apache.flink.configuration.Configuration is recommended as
> >>> it's a public API and offers more useful methods as well.
> >>>
> >>> 3. The Context of CatalogModificationListenerFactory should be an
> >>> interface too, and getUserClassLoader() would be more aligned with
> >>> Flink's naming style.
> >>>
> >>>
> >>> Best,
> >>> Leonard
> >>>
> >>>> On May 26, 2023, at 4:08 PM, Shammon FY  wrote:
> >>>>
> >>>> Hi devs,
> >>>>
> >>>> We would like to bring up a discussion about FLIP-294: Support
> >> Customized
> >>>> Job Meta Data Listener[1]. We have had several discussions with Jark
> >> Wu,
> >>>> Leonard Xu, Dong Lin, Qingsheng Ren and Poorvank about the functions
> >> and
> >>>> interfaces, and thanks for their valuable advice.
> >>>> The overall job and connector information is divided into metadata and
> >>>> lineage, this FLIP focuses on metadata and lineage will be discussed
> in
> >>>> another FLIP in the future. In this FLIP we want to add a customized
> >>>> listener in Flink to report catalog modifications to external metadata
> >>>> systems such as datahub[2] or atlas[3]. Users can view the specific
> >>>> information of connectors such as source and sink for Flink jobs in
> >> these
> >>>> systems, including fields, watermarks, partitions, etc.
> >>>>
> >>>> Looking forward to hearing from you, thanks.
> >>>>
> >>>>
> >>>> [1]
> >>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-294%3A+Support+Customized+Job+Meta+Data+Listener
> >>>> [2] https://datahub.io/
> >>>> [3] https://atlas.apache.org/#/
> >>>
> >>>
> >>
>
>


Re: [DISCUSS] FLIP-246: Multi Cluster Kafka Source

2023-06-06 Thread Jing Ge
Hi Mason,

It is a very practical feature that many users are keen to use. Thanks to
the previous discussion, the FLIP now looks informative. Thanks for your
proposal. One small suggestion: the attached images are quite hard to read
unless we click and enlarge them. Besides that, the text on the current
sequence diagram is difficult to read because it has a transparent
background. Would you like to replace it with a white background?

Exactly-once is one of the key features of the Kafka connector. I have the
same concern as Qingsheng. Since you have answered questions about it
previously, would you like to create an extra section in your FLIP that
explicitly describes the scenarios where exactly-once is supported and
where it is not?

Best regards,
Jing

On Mon, Jun 5, 2023 at 11:41 PM Mason Chen  wrote:

> Hi all,
>
> I'm working on FLIP-246 again, for the Multi Cluster Kafka Source
> contribution. The document has been updated with some more context about
> how it can solve the Kafka topic removal scenario and a sequence diagram to
> illustrate how the components interact.
>
> Looking forward to any feedback!
>
> Best,
> Mason
>
> On Wed, Oct 12, 2022 at 11:12 PM Mason Chen 
> wrote:
>
> > Hi Ryan,
> >
> > Thanks for the additional context! Yes, the offset initializer would need
> > to take a cluster as a parameter and the MultiClusterKafkaSourceSplit can
> > be exposed in an initializer.
> >
> > Best,
> > Mason
> >
> > On Thu, Oct 6, 2022 at 11:00 AM Ryan van Huuksloot <
> > ryan.vanhuuksl...@shopify.com> wrote:
> >
> >> Hi Mason,
> >>
> >> Thanks for the clarification! In regards to the addition to the
> >> OffsetInitializer of this API - this would be an awesome addition and I
> >> think this entire FLIP would be a great addition to the Flink.
> >>
> >> To provide more context as to why we need particular offsets, we use
> >> Hybrid Source to currently backfill from buckets prior to reading from
> >> Kafka. We have a service that will tell us what offset has last been
> loaded
> >> into said bucket which we will use to initialize the KafkaSource
> >> OffsetsInitializer. We couldn't use a timestamp here and the offset
> would
> >> be different for each Cluster.
> >>
> >> In pseudocode, we'd want the ability to do something like this with
> >> HybridSources - if this is possible.
> >>
> >> ```scala
> >> val offsetsMetadata: Map[TopicPartition, Long] = // Get current offsets
> >> from OffsetReaderService
> >> val multiClusterArchiveSource: MultiBucketFileSource[T] = // Data is
> read
> >> from different buckets (multiple topics)
> >> val multiClusterKafkaSource: MultiClusterKafkaSource[T] =
> >> MultiClusterKafkaSource.builder()
> >>   .setKafkaMetadataService(new KafkaMetadataServiceImpl())
> >>   .setStreamIds(List.of("my-stream-1", "my-stream-2"))
> >>   .setGroupId("myConsumerGroup")
> >>
> >>
> .setDeserializer(KafkaRecordDeserializationSchema.valueOnly(StringDeserializer.class))
> >>   .setStartingOffsets(offsetsMetadata)
> >>   .setProperties(properties)
> >>   .build()
> >> val source =
> >>
> HybridSource.builder(multiClusterArchiveSource).addSource(multiClusterKafkaSource).build()
> >> ```
> >>
> >> Few notes:
> >> - TopicPartition won't work because the topic may be the same name as
> >> this is something that is supported IIRC
> >> - I chose to pass a map into starting offsets just for demonstrative
> >> purposes, I would be fine with whatever data structure would work best
> >>
> >> Ryan van Huuksloot
> >> Data Developer | Production Engineering | Streaming Capabilities
> >> [image: Shopify]
> >> <
> https://www.shopify.com/?utm_medium=salessignatures&utm_source=hs_email>
> >>
> >>
> >> On Mon, Oct 3, 2022 at 11:29 PM Mason Chen 
> >> wrote:
> >>
> >>> Hi Ryan,
> >>>
> >>> Just copying your message over to the email chain.
> >>>
> >>> Hi Mason,
>  First off, thanks for putting this FLIP together! Sorry for the delay.
>  Full disclosure Mason and I chatted a little bit at Flink Forward
> 2022 but
>  I have tried to capture the questions I had for him then.
>  I'll start the conversation with a few questions:
>  1. The concept of streamIds is not clear to me in the proposal and
>  could use some more information. If I understand correctly, they will
> be
>  used in the MetadataService to link KafkaClusters to ones you want to
> use?
>  If you assign stream ids using `setStreamIds`, how can you dynamically
>  increase the number of clusters you consume if the list of StreamIds
> is
>  static? I am basing this off of your example .setStreamIds(List.of(
>  "my-stream-1", "my-stream-2")) so I could be off base with my
>  assumption. If you don't mind clearing up the intention, that would be
>  great!
>  2. How would offsets work if you wanted to use this
>  MultiClusterKafkaSource with a file based backfill? In the case I am
>  thinking of, you have a bucket backed archive of Kafka data per
> cluster.
>  and you want to pick up from

Re: [DISCUSS] FLIP-315: Support Operator Fusion Codegen for Flink SQL

2023-06-07 Thread Jing Ge
Hi Ron,

Thanks for raising the proposal. It is a very attractive idea! Since the
FLIP is a relatively complex one that contains three papers and a design
doc, it deserves more time for discussion to make sure everyone is on the
same page. I have a NIT question which will not block your voting process.
Previously, it took the community a lot of effort to make Flink more or
less Scala-free [1]. Since the code base of the table module is too big,
instead of porting it to Java, all Scala code has been hidden. Furthermore,
there are ongoing efforts to remove Scala code from Flink. As you can see,
the community tries to limit (i.e. get rid of) Scala code as much as
possible. I was wondering if it is possible for you to implement the FLIP
with Scala-free code?

Best regards,
Jing

[1] https://flink.apache.org/2022/02/22/scala-free-in-one-fifteen/

On Wed, Jun 7, 2023 at 5:33 PM Aitozi  wrote:

> Hi Ron:
> Sorry for the late reply after the voting process. I just want to ask
>
> > Traverse the ExecNode DAG and create a FusionExecNode  for physical
> operators that can be fused together.
> which kind of operators can be fused together ? are the operators in an
> operator chain? Is this optimization aligned to spark's whole stage codegen
> ?
>
> > If any member operator does not support codegen, generate a
> Transformation DAG based on the topological relationship of member ExecNode
>  and jump to step 8.
> step8: Generate a FusionTransformation, setting the parallelism and managed
> memory for the fused operator.
>
> does the "support codegen" means fusion codegen? but why we generate a
> FusionTransformation when the member operator does not support codegen, IMO
> it should
> fallback to the current behavior.
>
> In the end, I share the same idea with Lincoln about performance benchmark.
> Currently flink community's flink-benchmark only covers like schedule,
> state, datastream operator's performance.
> A good benchmark harness for sql operator will benefit the sql optimizer
> topic and observation
>
> Thanks,
> Atiozi.
>
>
> liu ron  于2023年6月6日周二 19:30写道:
>
> > Hi dev
> >
> > Thanks for all the feedback, it seems that here are no more comments, I
> > will
> > start a vote on FLIP-315 [1] later. Thanks again.
> >
> > [1]:
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-315+Support+Operator+Fusion+Codegen+for+Flink+SQL
> >
> > Best,
> > Ron
> >
> > liu ron  于2023年6月5日周一 16:01写道:
> >
> > > Hi, Yun, Jinsong, Benchao
> > >
> > > Thanks for your valuable input about this FLIP.
> > >
> > > First of all, let me emphasize that from the technical implementation
> > > point of view, this design is feasible in both stream and batch
> > scenarios,
> > > so I consider both stream and batch mode in FLIP. In the stream
> scenario,
> > > for stateful operator, according to our business experience, basically
> > the
> > > bottleneck is on the state access, so the optimization effect of OFCG
> for
> > > the stream will not be particularly obvious, so we will not give
> priority
> > > to support it currently. On the contrary, in the batch scenario, where
> > CPU
> > > is the bottleneck, this optimization is gainful.
> > >
> > > Taking the above into account, we are able to support both stream and
> > > batch mode optimization in this design, but we will give priority to
> > > supporting batch operators. As benchao said, when we find a suitable
> > > streaming business scenario in the future, we can consider doing this
> > > optimization. Back to Yun issue, the design will break state
> > compatibility
> > > in stream mode as[1] and the version upgrade will not support this
> OFCG.
> > As
> > > mentioned earlier, we will not support this feature in stream mode in
> the
> > > short term.
> > >
> > > Also thanks to Benchao's suggestion, I will state the current goal of
> > that
> > > optimization in the FLIP, scoped to batch mode.
> > >
> > > Best,
> > > Ron
> > >
> > > liu ron  于2023年6月5日周一 15:04写道:
> > >
> > >> Hi, Lincoln
> > >>
> > >> Thanks for your appreciation of this design. Regarding your question:
> > >>
> > >> > do we consider adding a benchmark for the operators to intuitively
> > >> understand the improvement brought by each improvement?
> > >>
> > >> I think it makes sense to add a benchmark, Spark also has this
> benchmark
> > >> framework. But I think it is another story to introduce a benchmark
> > >> framework in Flink, we need to start a new discussion to this work.
> > >>
> > >> > for the implementation plan, mentioned in the FLIP that 1.18 will
> > >> support Calc, HashJoin and HashAgg, then what will be the next step?
> and
> > >> which operators do we ultimately expect to cover (all or specific
> ones)?
> > >>
> > >> Our ultimate goal is to support all operators in batch mode, but we
> > >> prioritize them according to their usage. Operators like Calc,
> HashJoin,
> > >> HashAgg, etc. are more commonly used, so we will support them first.
> > Later
> > >> we support the rest of the operators step

Re: [VOTE] FLIP-315: Support Operator Fusion Codegen for Flink SQL

2023-06-07 Thread Jing Ge
+1

Best Regards,
Jing

On Wed, Jun 7, 2023 at 10:52 AM weijie guo 
wrote:

> +1 (binding)
>
> Best regards,
>
> Weijie
>
>
> Jingsong Li  于2023年6月7日周三 15:59写道:
>
> > +1
> >
> > On Wed, Jun 7, 2023 at 3:03 PM Benchao Li  wrote:
> > >
> > > +1, binding
> > >
> > > Jark Wu  于2023年6月7日周三 14:44写道:
> > >
> > > > +1 (binding)
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > > 2023年6月7日 14:20,liu ron  写道:
> > > > >
> > > > > Hi everyone,
> > > > >
> > > > > Thanks for all the feedback about FLIP-315: Support Operator Fusion
> > > > Codegen
> > > > > for Flink SQL[1].
> > > > > [2] is the discussion thread.
> > > > >
> > > > > I'd like to start a vote for it. The vote will be open for at least
> > 72
> > > > > hours (until June 12th, 12:00AM GMT) unless there is an objection
> or
> > an
> > > > > insufficient number of votes.
> > > > >
> > > > > [1]:
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-315+Support+Operator+Fusion+Codegen+for+Flink+SQL
> > > > > [2]:
> > https://lists.apache.org/thread/9cnqhsld4nzdr77s2fwf00o9cb2g9fmw
> > > > >
> > > > > Best,
> > > > > Ron
> > > >
> > > >
> > >
> > > --
> > >
> > > Best,
> > > Benchao Li
> >
>


Re: [DISCUSS] FLIP-294: Support Customized Job Meta Data Listener

2023-06-08 Thread Jing Ge
Hi Shammon,

If we take a look at the JDK event design as a reference [1], we can even
add an Object into the event. Back to the CatalogModificationEvent:
everything related to the event could be defined in the Event itself. If we
want to group some information into the Context, we could also consider
adding the CatalogModificationContext into the Event and making the
onEvent() method cleaner with only one input parameter,
CatalogModificationEvent, since CatalogModificationListener is the
interface users will work with most often. Just my two cents.
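
A hedged sketch of that alternative: embedding the context in the event,
java.util.EventObject-style, so onEvent() needs a single parameter. Class
names follow the FLIP; the getter-based shape and field layout are
assumptions for illustration only:

```java
// Illustrative-only sketch: context carried inside the event, modeled on
// java.util.EventObject. Not the FLIP's actual design.
import java.util.EventObject;

public class EventSketch {
    static class CatalogModificationContext {}

    static class CatalogModificationEvent extends EventObject {
        private final CatalogModificationContext context;

        CatalogModificationEvent(Object source, CatalogModificationContext context) {
            super(source); // EventObject keeps the event source
            this.context = context;
        }

        CatalogModificationContext getContext() {
            return context;
        }
    }

    interface CatalogModificationListener {
        // Single-parameter variant: the context travels with the event.
        void onEvent(CatalogModificationEvent event);
    }

    public static void main(String[] args) {
        CatalogModificationListener listener =
            e -> System.out.println("has context: " + (e.getContext() != null));
        listener.onEvent(
            new CatalogModificationEvent("catalog", new CatalogModificationContext()));
        // prints "has context: true"
    }
}
```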

Best regards,
Jing

[1]
http://www.java2s.com/example/java-src/pkg/java/util/eventobject-85298.html

On Thu, Jun 8, 2023 at 7:50 AM Shammon FY  wrote:

> Hi,
>
> To @Jing Ge
> > Thanks for the clarification. Just out of curiosity, if the context is
> not part of the event, why should it be the input parameter of each onEvent
> call?
>
> I think it's quite strange to put some information in an Event, such as a
> factory identifier for catalog, but they will be used by the listener.  I
> place it in the context class and I think it is more suitable than directly
> placing it in the event class.
>
> To @Mason
> > 1. I'm also curious about default implementations. Would atlas/datahub be
> supported by default?
>
> We won't do that and external systems such as atlas/datahub need to
> implement the listener themselves.
>
> > 2. The FLIP title is confusing to me, especially in distinguishing it
> from FLIP-314. Would a better FLIP title be "Support Catalog Metadata
> Listener" or something alike?
>
> Thanks, I think  "Support Catalog Modification Listener" will be
> more suitable, I'll update the title to it.
>
>
> Best,
> Shammon FY
>
>
> On Thu, Jun 8, 2023 at 12:25 PM Mason Chen  wrote:
>
> > Hi Shammon,
> >
> > FLIP generally looks good and I'm excited to see this feature.
> >
> > 1. I'm also curious about default implementations. Would atlas/datahub be
> > supported by default?
> > 2. The FLIP title is confusing to me, especially in distinguishing it
> from
> > FLIP-314. Would a better FLIP title be "Support Catalog Metadata
> Listener"
> > or something alike?
> >
> > Best,
> > Mason
> >
> > On Tue, Jun 6, 2023 at 3:33 AM Jing Ge 
> wrote:
> >
> > > Hi Shammon,
> > >
> > > Thanks for the clarification. Just out of curiosity, if the context is
> > not
> > > part of the event, why should it be the input parameter of each onEvent
> > > call?
> > >
> > > Best regards,
> > > Jing
> > >
> > > On Tue, Jun 6, 2023 at 11:58 AM Leonard Xu  wrote:
> > >
> > > > Thanks Shammon for the timely update, the updated FLIP looks good to
> > me.
> > > >
> > > > Hope to see the vote thread and following FLIP-314 discussion thread.
> > > >
> > > > Best,
> > > > Leonard
> > > >
> > > > > On Jun 6, 2023, at 5:04 PM, Shammon FY  wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > Thanks for all the feedback.
> > > > >
> > > > > For @Jing Ge,
> > > > > I forget to update the demo code in the FLIP, the method is
> > > > > `onEvent(CatalogModificationEvent, CatalogModificationContext)` and
> > > there
> > > > > is no `onEvent(CatalogModificationEvent)`. I have updated the code.
> > > > Context
> > > > > contains some additional information that is not part of an Event,
> > but
> > > > > needs to be used in the listener, so we separate it from the event.
> > > > >
> > > > > For @Panagiotis,
> > > > > I think `ioExecutor` make sense to me and I have added it in
> > > > > `ContextModificationContext`, thanks
> > > > >
> > > > > For @Leonard,
> > > > > Thanks for your input.
> > > > > 1. I have updated `CatalogModificationContext` as an interface, as
> > well
> > > > as
> > > > > Context in CatalogModificationListenerFactory
> > > > > 2. Configuration sounds good to me, I have updated the method name
> > and
> > > > > getConfiguration in Context
> > > > >
> > > > > For @David,
> > > > > Yes, you're right. The listener will only be used on the client
> side
> > > and
> > > > > won't introduce a new code path for running per-job/per-session
> jobs.
> > > The
>

Re: [DISCUSS] FLIP-315: Support Operator Fusion Codegen for Flink SQL

2023-06-08 Thread Jing Ge
Hi Ron,

Thanks for sharing the insight. I agree that it is not feasible to rewrite
the entire planner module in Java; that was the reason it was hidden
instead of replaced. I thought that, since the community decided to walk
away from Scala, we should at least not add any more new Scala code.
According to your suggestion, that is not the case. I think the community
should reconsider how to handle Scala, since the more features we develop
in those areas, the more Scala code we will have, which makes it even
harder (if not impossible) to migrate to Java. This is beyond the scope of
this discussion; I will start a new thread to address it.

Best regards,
Jing


On Thu, Jun 8, 2023 at 5:20 AM liu ron  wrote:

> Hi, Ging
>
> Thanks for your valuable input about scala free.
>
> Firstly, reply to your question, using java to implement codegen is
> possible,  but we need to utilize some tools. I think the first alternative
> is to update our jdk version to 13, which provides text block feature[1]
> makes string format easier, and improves the multiple-line String
> readability and writability. However, we don't update the JDK version to 13
> in the short term future. The second alternative is to use a third library
> such as Freemarker and StringTemplate, but this is not easy work, we need
> to introduce extra dependency in table planner, and makes our
> implementation more complicated.
>
> We use a lot of scala code in the planner module, one of the main purposes
> is that codegen is more friendly, and many of the operators are also
> implemented through codegen. In the foreseeable future, we do not have the
> time and manpower to remove the scala code from the planner module, so
> scala-free is unlikely. From the point of view of development friendliness
> and development cost, scala is currently a relatively better solution for
> codegen. Suppose we need to completely rewrite the planner module in java
> in the future, I think it is better to consider what tools are used to
> support codegen in a unified way at that time, and I can't give a suitable
> tool at the moment.
>
> In summary, I don't think it is feasible to implement my FLIP with
> scala-free at this time.
>
> [1]: https://openjdk.org/jeps/378
>
> Best,
> Ron
>
>
> liu ron  于2023年6月8日周四 10:51写道:
>
> > Hi, Atiozi
> >
> > Thanks for your feedback.
> >
> > > Traverse the ExecNode DAG and create a FusionExecNode  for physical
> > operators that can be fused together.
> > which kind of operators can be fused together ? are the operators in an
> > operator chain? Is this optimization aligned to spark's whole stage
> codegen
> > ?
> > In theory, all kinds of operators can be fused together, our final goal
> is
> > to support all operators in batch mode, OperatorChain is just one case.
> Due
> > to this work effort is relatively large, so we need to complete it step
> by
> > step. Our OFCG not only achieves the ability of spark's whole stage
> > codegen, but also do more better than them.
> >
> > > does the "support codegen" means fusion codegen? but why we generate a
> > FusionTransformation when the member operator does not support codegen,
> IMO
> > it should
> > fallback to the current behavior.
> >
> > yes, it means the fusion codegen. In FLIP, I propose two operator fusion
> > mechanisms, one is like OperatorChain for single input operator, another
> is
> > MultipleInput fusion. For the former, our design mechanism is to fuse all
> > operators together at the ExecNode layer only if they all support fusion
> > codegen, or else go over the default OperatorChain. For the latter, in
> > order not to break the existing MultipleInput optimization purpose, so
> when
> > there are member operators that do not support fusion codegen,  we will
> > fall back to the current behavior[1], which means that a
> > FusionTransformation is created. here FusionTransformation is just a
> > surrogate for MultipleInput case, it actually means
> > MultipleInputTransformation, which fuses multiple physical operators.
> > Sorry, the description in the flow is not very clear and caused your
> > confusion.
> >
> > > In the end, I share the same idea with Lincoln about performance
> > benchmark.
> > Currently flink community's flink-benchmark only covers like schedule,
> > state, datastream operator's performance.
> > A good benchmark harness for sql operator will benefit the sql optimizer
> > topic and observation
> >
> > For the performance benchmark, I agree with you. As I stated earlier, I
> > think this is a new scope of work, we sh

Call for Presentations: Flink Forward Seattle 2023

2023-06-08 Thread Jing Ge
Dear Flink developers & users,

We hope this email finds you well. We are excited to announce the Call for
Presentations for the upcoming Flink Forward Seattle 2023, the premier
event dedicated to Apache Flink and stream processing technologies. As a
prominent figure in the field, you are invited to submit your innovative
research, insightful experiences, and cutting-edge use cases for
consideration as a speaker at the conference.

Flink Forward Conference 2023 Details:
Date: November 6-7 (training), November 8 (conference)
Location: Seattle, United States

Flink Forward is a conference dedicated to the Apache Flink® community. In
2023 we shall have a full conference day following a two-day training
session. The conference gathers an international audience of CTOs/CIOs,
developers, data architects, data scientists, Apache Flink® core
committers, and the stream processing community, to share experiences,
exchange ideas and knowledge, and receive hands-on training sessions led by
Flink experts. We are seeking compelling presentations and
thought-provoking talks that cover a broad range of topics related to
Apache Flink, including but not limited to:

Flink architecture and internals
Flink performance optimization
Advanced Flink features and enhancements
Real-world use cases and success stories
Flink ecosystem and integrations
Stream processing at scale
Best practices for Flink application development

If you have an inspiring story, valuable insights, real-world application,
research breakthroughs, use case, best practice, or compelling vision of
the future for Flink, we encourage you to present it to a highly skilled
and enthusiastic community. We welcome submissions from both industry
professionals and academic researchers.

To submit your proposal, please visit the Flink Forward Conference website
at https://www.flink-forward.org/seattle-2023/call-for-presentations. The
submission form will require you to provide an abstract of your talk, along
with a brief biography and any supporting materials. The deadline for
submissions is July 12th 11:59 pm PDT.

We believe your contribution will greatly enrich the Flink Forward
Conference and provide invaluable insights to our attendees. This is an
excellent opportunity to connect with a diverse community of Flink
enthusiasts, network with industry experts, and gain recognition for your
expertise. We look forward to receiving your submission and welcoming you
as a speaker at the Flink Forward Conference.

Thank you for your time and consideration.

Best regards,

-- 

Jing Ge | Head of Engineering

j...@ververica.com

<https://www.ververica.com/>

Follow us @VervericaData

--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference - Tickets on SALE now!
<https://eu.eventscloud.com/ereg/newreg.php?eventid=200259741&#>

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Chausseestrasse 20, 10115 Berlin, Germany

--

Ververica GmbH

Registered at Amtsgericht Charlottenburg: HRB 158244 B

Managing Directors: Alexander Walden, Karl Anton Wehner, Yip Park Tung
Jason, Jinwei (Kevin) Zhang


Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-08 Thread Jing Ge
 process as an async RPC call process.
> > > >
> > > > Let's see how we can perform an async RPC call with lateral join:
> > > >
> > > > (1) Implement an AsyncTableFunction with RPC call logic.
> > > > (2) Run query with
> > > >
> > > > Create function f1 as '...' ;
> > > >
> > > > SELECT o.order_id, o.total, c.country, c.zip FROM Orders  lateral
> table
> > > > (f1(order_id)) as T(...);
> > > >
> > > > As you can see, the lateral join version is more simple and intuitive
> > to
> > > > users. Users do not have to wrap a
> > > > LookupTableSource for the purpose of using async udtf.
> > > >
> > > > In the end, We can also see the user defined async table function is
> an
> > > > enhancement of the current lateral table join
> > > > which only supports sync lateral join now.
> > > >
> > > > Best,
> > > > Aitozi.
> > > >
> > > >
> > > > Jing Ge  于2023年6月2日周五 19:37写道:
> > > >
> > > >> Hi Aitozi,
> > > >>
> > > >> Thanks for the update. Just out of curiosity, what is the difference
> > > >> between the RPC call or query you mentioned and the lookup in a very
> > > >> general way? Since Lateral join is used in the FLIP. Is there any
> > > special
> > > >> thought for that? Sorry for asking so many questions. The FLIP
> > contains
> > > >> limited information to understand the motivation.
> > > >>
> > > >> Best regards,
> > > >> Jing
> > > >>
> > > >> On Fri, Jun 2, 2023 at 3:48 AM Aitozi  wrote:
> > > >>
> > > >> > Hi Jing,
> > > >> > I have updated the proposed changes to the FLIP. IMO, lookup has
> > > >> > its clear async-call requirement because it is an IO-heavy operator.
> > > >> > In our usage, SQL users have
> > > >> > logic that performs RPC calls or queries a third-party service, which
> > > >> > is also IO
> > > >> > intensive.
> > > >> > In these cases, we'd like to leverage the async function to improve
> > > >> > the throughput.
> > > >> >
> > > >> > Thanks,
> > > >> > Aitozi.
> > > >> >
> > > >> > Jing Ge  wrote on Thu, Jun 1, 2023 at 22:55:
> > > >> >
> > > >> > > Hi Aitozi,
> > > >> > >
> > > >> > > Sorry for the late reply. Would you like to update the proposed
> > > >> changes
> > > >> > > with more details into the FLIP too?
> > > >> > > I got your point. It looks like a rational idea. However, since
> > > lookup
> > > >> > has
> > > >> > > its clear async call requirement, are there any real use cases
> > that
> > > >> > > need this change? This will help us understand the motivation.
> > After
> > > >> all,
> > > >> > > lateral join and temporal lookup join[1] are quite different.
> > > >> > >
> > > >> > > Best regards,
> > > >> > > Jing
> > > >> > >
> > > >> > >
> > > >> > > [1]
> > > >> > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://github.com/apache/flink/blob/d90a72da2fd601ca4e2a46700e91ec5b348de2ad/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/AsyncTableFunction.java#L54
> > > >> > >
> > > >> > > On Wed, May 31, 2023 at 8:53 AM Aitozi 
> > > wrote:
> > > >> > >
> > > >> > > > Hi Jing,
> > > What do you think about it? Can we move forward with this
> > feature?
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > > Aitozi.
> > > >> > > >
> > > >> > > > Aitozi  wrote on Mon, May 29, 2023 at 09:56:
> > > >> > > >
> > > >> > > > > Hi Jing,
> > > >> > > > > > "Do you mean to support the AyncTableFunction beyon

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-09 Thread Jing Ge
Hi Aitozi,

Thanks for the feedback. Looking forward to the performance tests.

Afaik, lookup returns one row for each key [1] [2]. Conceptually, the
lookup function is used to enrich column(s) from the dimension table. If,
for the given key, there will be more than one row, there will be no way to
know which row will be used to enrich the key.

[1]
https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L49
[2]
https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/TableFunction.java#L196
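To illustrate the point outside of Flink (the dimension table, keys, and class name below are invented for illustration, not Flink APIs): with one dimension row per key, enrichment is unambiguous; with several rows for the same key, the join must either pick one row or emit one joined row per match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java sketch (not Flink code) of lookup-style enrichment.
// The hypothetical dimension table maps a key to one or more (country, zip) rows.
public class LookupEnrichment {

    // Key 1 has a single dimension row; key 2 has two rows.
    static final Map<Integer, List<String[]>> DIM = Map.of(
            1, List.of(new String[] {"DE", "10115"}),
            2, List.of(new String[] {"US", "94105"}, new String[] {"US", "10001"}));

    // Enrich one fact row: emit one joined row per matching dimension row.
    public static List<String> enrich(String orderId, int key) {
        List<String> joined = new ArrayList<>();
        for (String[] row : DIM.getOrDefault(key, List.of())) {
            joined.add(orderId + "," + row[0] + "," + row[1]);
        }
        return joined;
    }

    public static void main(String[] args) {
        // One row per key: the enrichment is unambiguous.
        System.out.println(enrich("o-100", 1));
        // Two rows for the same key: which one should enrich the fact row?
        System.out.println(enrich("o-101", 2));
    }
}
```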

Best regards,
Jing

On Fri, Jun 9, 2023 at 5:18 AM Aitozi  wrote:

> Hi Jing
> Thanks for your good questions. I have updated the example to the FLIP.
>
> > Only one row for each lookup
> lookup can also return multiple rows, based on the query result. [1]
>
> [1]:
>
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L56
>
> > If we use async calls with lateral join, my gut feeling is
> that we might have many more async calls than lookup join. I am not really
> sure if we will be facing potential issues in this case or not.
>
> IMO, the work pattern is similar to the lookup function, for each row from
> the left table,
> it will evaluate the eval method once, so the async call numbers will not
> change.
> and the maximum number of calls in flight is limited by the async operator's
> buffer capacity,
> which is controlled by the option.
>
> BTW, for the naming of these options, I updated the FLIP; you can
> refer to
> the sections "ConfigOption" and "Rejected Alternatives".
>
> In the end, for the performance evaluation, I'd like to do some tests and
> will update it to the FLIP doc
>
> Thanks,
> Aitozi.
>
>
> Jing Ge  wrote on Fri, Jun 9, 2023 at 07:23:
>
> > Hi Aitozi,
> >
> > Thanks for the clarification. The code example looks interesting. I would
> > suggest adding them into the FLIP. The description with code examples
> will
> > help readers understand the motivation and how to use it. Afaiac, it is a
> > valid feature for Flink users.
> >
> > As we know, lookup join is based on temporal join, i.e. FOR SYSTEM_TIME AS
> > OF, which is also used in your code example. Temporal join performs the
> > lookup based on the processing-time match. Only one row for each
> > lookup (afaiu, I need to check the source code to double confirm) will
> > return for further enrichment. On the other hand, lateral join will have
> > sub-queries correlated with every individual value of the reference table
> > from the preceding part of the query, and each sub-query will return
> > multiple rows. If we use async calls with lateral join, my gut feeling is
> > that we might have many more async calls than lookup join. I am not really
> > sure if we will be facing potential issues in this case or not. Possible
> > issues I can think of now: e.g. too many RPC calls, too many async calls
> > processing, the sub-query will return a table which might be (too) big and
> > might cause performance issues. I would suggest preparing some use cases
> > and running some performance tests to check it. These are my concerns
> about
> > using async calls with lateral join and I'd like to share with you, happy
> > to discuss with you and hear different opinions, hopefully the
> > discussion could help me understand it more deeply. Please correct me if
> I
> > am wrong.
> >
> > Best regards,
> > Jing
> >
> >
> > On Thu, Jun 8, 2023 at 7:22 AM Aitozi  wrote:
> >
> > > Hi Mason,
> > > Thanks for your input. I think if we support the user-defined async
> > > table function,
> > > users will be able to use it to hold a batch of data and then handle it
> > time
> > > all at once in the customized function.
> > >
> > > AsyncSink is meant for the sink operator. I have not figured out how to
> > > integrate it in this case.
> > >
> > > Thanks,
> > > Aitozi.
> > >
> > >
> > > Mason Chen  wrote on Thu, Jun 8, 2023 at 12:40:
> > >
> > > > Hi Aitozi,
> > > >
> > > > I think it makes sense to make it easier for SQL users to make RPCs.
> Do
> > > you
> > > > think your proposal can extend to the ability to batch data for the
> > RPC?
> > > > This is also another common strategy to increase throughp

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-09 Thread Jing Ge
Hi Aitozi,

The keyRow used in this case contains all keys[1].

Best regards,
Jing

[1]
https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L49


On Fri, Jun 9, 2023 at 3:42 PM Aitozi  wrote:

> Hi Jing,
>
>  The performance test is added to the FLIP.
>
> As I know, the lookup join can return multiple rows; it depends on
> whether the join key
> is the primary key of the external database or not. The `lookup` [1] will
> return a collection of
> joined results, and each of them will be collected.
>
>
> [1]:
>
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L52
>
>
> Thanks,
> Aitozi.
>
> Jing Ge  wrote on Fri, Jun 9, 2023 at 17:05:
>
> > Hi Aitozi,
> >
> > Thanks for the feedback. Looking forward to the performance tests.
> >
> > Afaik, lookup returns one row for each key [1] [2]. Conceptually, the
> > lookup function is used to enrich column(s) from the dimension table. If,
> > for the given key, there will be more than one row, there will be no way
> to
> > know which row will be used to enrich the key.
> >
> > [1]
> >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L49
> > [2]
> >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/TableFunction.java#L196
> >
> > Best regards,
> > Jing
> >
> > On Fri, Jun 9, 2023 at 5:18 AM Aitozi  wrote:
> >
> > > Hi Jing
> > > Thanks for your good questions. I have updated the example to the
> > FLIP.
> > >
> > > > Only one row for each lookup
> > > lookup can also return multiple rows, based on the query result. [1]
> > >
> > > [1]:
> > >
> > >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L56
> > >
> > > > If we use async calls with lateral join, my gut feeling is
> > > that we might have many more async calls than lookup join. I am not
> > really
> > > sure if we will be facing potential issues in this case or not.
> > >
> > > IMO, the work pattern is similar to the lookup function, for each row
> > from
> > > the left table,
> > > it will evaluate the eval method once, so the async call numbers will
> not
> > > change.
> > > and the maximum calls in flight is limited by the Async operators
> buffer
> > > capacity
> > > which will be controlled by the option.
> > >
> > > BTW, for the naming of these option, I updated the FLIP about this you
> > can
> > > refer to
> > > the section of "ConfigOption" and "Rejected Alternatives"
> > >
> > > In the end, for the performance evaluation, I'd like to do some tests
> and
> > > will update it to the FLIP doc
> > >
> > > Thanks,
> > > Aitozi.
> > >
> > >
> > > Jing Ge  wrote on Fri, Jun 9, 2023 at 07:23:
> > >
> > > > Hi Aitozi,
> > > >
> > > > Thanks for the clarification. The code example looks interesting. I
> > would
> > > > suggest adding them into the FLIP. The description with code examples
> > > will
> > > > help readers understand the motivation and how to use it. Afaiac, it
> > is a
> > > > valid feature for Flink users.
> > > >
> > > > As we knew, lookup join is based on temporal join, i.e. FOR
> SYSTEM_TIME
> > > AS
> > > > OF which is also used in your code example. Temporal join performs
> the
> > > > lookup based on the processing time match. Only one row for each
> > > > lookup(afaiu, I need to check the source code to double confirm) will
> > > > return for further enrichment. On the other hand, lateral join will
> > have
> > > > sub-queries correlated with every individual value of the reference
> > table
> > > > from the preceding part of the query and each sub query will return
> > > > multiple rows. If we use async calls with lateral join, my gut
> feeling
> > is
> > 

Re: [DISCUSS] FLIP-307: Flink connector Redshift

2023-06-09 Thread Jing Ge
Hi Samrat,

The FLIP looks good, thanks!

Best regards,
Jing


On Tue, Jun 6, 2023 at 8:16 PM Samrat Deb  wrote:

> Hi Jing,
>
> >  I would suggest adding that information into the
> FLIP.
>
> Updated now, please review the new version of the FLIP whenever you have time.
>
> > +1 Looking forward to your PR :-)
> I will request your review once I'm ready with the PR :-)
>
> Bests,
> Samrat
>
> On Tue, Jun 6, 2023 at 11:43 PM Samrat Deb  wrote:
>
> > Hi Martijn,
> >
> > > If I understand this correctly, the Redshift sink
> > would not be able to support exactly-once, is that correct?
> >
> > As I delve deeper into the study of Redshift's capabilities, I have
> > discovered that it does support "merge into" operations [1] and some
> > merge into examples [2].
> > This opens up the possibility of implementing exactly-once semantics with
> > the connector.
> > However, I believe it would be prudent to start with a more focused scope
> > for the initial phase of implementation and defer the exactly-once support
> > to subsequent iterations.
> >
> > Before finalizing the approach, I would greatly appreciate your thoughts
> > and suggestions on this matter.
> > Should we prioritize the initial implementation without exactly-once
> > support, or would you advise incorporating it right from the start?
> > Your insights and experiences would be immensely valuable in making this
> > decision.
> >
> >
> > [1]
> >
> https://docs.aws.amazon.com/redshift/latest/dg/t_updating-inserting-using-staging-tables-.html
> > [2] https://docs.aws.amazon.com/redshift/latest/dg/merge-examples.html
> >
> > Bests,
> > Samrat
> >
> > On Mon, Jun 5, 2023 at 7:09 PM Jing Ge 
> wrote:
> >
> >> Hi Samrat,
> >>
> >> Thanks for the feedback. I would suggest adding that information into
> the
> >> FLIP.
> >>
> >> +1 Looking forward to your PR :-)
> >>
> >> Best regards,
> >> Jing
> >>
> >> On Sat, Jun 3, 2023 at 9:19 PM Samrat Deb 
> wrote:
> >>
> >> > Hi Jing Ge,
> >> >
> >> > >>> Do you already have any prototype? I'd like to join the reviews.
> >> > The prototype is in progress. I will raise the dedicated PR for review
> >> soon
> >> > and notify this thread as well.
> >> >
> >> > >>> Will the Redshift connector provide additional features
> >> > beyond the mediator/wrapper of the jdbc connector?
> >> >
> >> > Here are the additional features that the Flink connector for AWS
> >> Redshift
> >> > can provide on top of using JDBC:
> >> >
> >> > 1. Integration with AWS Redshift Workload Management (WLM): AWS
> Redshift
> >> > allows you to configure WLM[1] to manage query prioritization and
> >> resource
> >> > allocation. The Flink connector for Redshift will be agnostic to the
> >> > configured WLM and utilize it for scaling in and out for the sink.
> This
> >> > means that the connector can leverage the WLM capabilities of Redshift
> >> to
> >> > optimize the execution of queries and allocate resources efficiently
> >> based
> >> > on your defined workload priorities.
> >> >
> >> > 2. Abstraction of AWS Redshift Quotas and Limits: AWS Redshift imposes
> >> > certain quotas and limits[2] on various aspects such as the number of
> >> > clusters, concurrent connections, queries per second, etc. The Flink
> >> > connector for Redshift will provide an abstraction layer for users,
> >> > allowing them to work with Redshift without having to worry about
> these
> >> > specific limits. The connector will handle the management of
> connections
> >> > and queries within the defined quotas and limits, abstracting away the
> >> > complexity and ensuring compliance with Redshift's restrictions.
> >> >
> >> > These features aim to simplify the integration of Flink with AWS
> >> Redshift,
> >> > providing optimized resource utilization and transparent handling of
> >> > Redshift-specific limitations.
> >> >
> >> > Bests,
> >> > Samrat
> >> >
> >> > [1]
> >> >
> >> >
> >>
> https://docs.aws.amazon.com/redshift/latest/dg/cm-c-implementing-workload-management.html
> >> > [2]
> >> >
> >> >
> >>

Re: [VOTE] FLIP-305: Support atomic for CREATE TABLE AS SELECT(CTAS) statement

2023-06-12 Thread Jing Ge
+1(binding) Thanks!

Best regards,
Jing

On Mon, Jun 12, 2023 at 12:01 PM yuxia  wrote:

> +1 (binding)
> Thanks Mang for driving it.
>
> Best regards,
> Yuxia
>
> - Original Message -
> From: "zhangmang1" 
> To: "dev" 
> Sent: Monday, June 12, 2023, 5:31:10 PM
> Subject: [VOTE] FLIP-305: Support atomic for CREATE TABLE AS SELECT(CTAS)
> statement
>
> Hi everyone,
>
> Thanks for all the feedback about FLIP-305: Support atomic for CREATE
> TABLE AS SELECT(CTAS) statement[1].
> [2] is the discussion thread.
>
> I'd like to start a vote for it. The vote will be open for at least 72
> hours (until June 15th, 10:00AM GMT) unless there is an objection or an
> insufficient number of votes.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-305%3A+Support+atomic+for+CREATE+TABLE+AS+SELECT%28CTAS%29+statement
> [2]
> https://lists.apache.org/thread/n6nsvbwhs5kwlj5kjgv24by2tk5mh9xd
>
>
>
>
>
>
>
> --
>
> Best regards,
> Mang Zhang
>


Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-12 Thread Jing Ge
Hi Aitozi,

Which key will be used for lookup is not an issue, only one row will be
required for each key in order to enrich it. True, it depends on the
implementation whether multiple rows or single row for each key will be
returned. However, for the lookup & enrichment scenario, one row/key is
recommended, otherwise, like I mentioned previously, enrichment won't work.

I am a little bit concerned about returning a big table for each key, since
it will take the async call longer to return and need more memory. The
performance tests should cover this scenario. This is not a blocking issue
for this FLIP.
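As a side note, the capped in-flight behavior discussed in this thread (the async operator's buffer-capacity option) can be sketched generically outside of Flink with a semaphore capping concurrent async calls. The class name, the simulated "RPC", and the numbers below are invented for illustration; this is not Flink code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

// Generic sketch: async calls whose maximum in-flight number is capped by a
// fixed "buffer capacity", similar in spirit to the async operator's option.
public class BoundedAsyncCalls {

    // Fire n simulated async RPCs, at most `permits` in flight at once,
    // and return the results in input order ("ordered" mode).
    public static List<String> runAll(int n, int permits) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        Semaphore capacity = new Semaphore(permits);
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (int key = 0; key < n; key++) {
            capacity.acquire(); // blocks the caller once the buffer is full
            final int k = key;
            futures.add(
                    CompletableFuture.supplyAsync(() -> "row-" + k, pool)
                            .whenComplete((r, t) -> capacity.release()));
        }
        List<String> out = new ArrayList<>();
        for (CompletableFuture<String> f : futures) {
            out.add(f.get()); // ordered mode: emit in input order
        }
        pool.shutdown();
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAll(5, 2));
    }
}
```

Note the ordered emission: the head element's future must complete before anything behind it is emitted, which is exactly why the head element can dominate buffer memory in ordered mode.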

Best regards,
Jing

On Sat, Jun 10, 2023 at 4:11 AM Aitozi  wrote:

> Hi Jing,
> I mean the join key is not necessarily the primary key or a unique
> index of the database.
> In this situation, we may query out multiple rows for one join key. I think
> that's why
> LookupFunction#lookup returns a collection of RowData.
>
> BTW, I think the behavior of lookup join will not affect the semantics of
> the async UDTF.
> We use the async TableFunction here, and the table function can collect
> multiple rows.
>
> Thanks,
> Aitozi.
>
>
>
> Jing Ge  wrote on Sat, Jun 10, 2023 at 00:15:
>
> > Hi Aitozi,
> >
> > The keyRow used in this case contains all keys[1].
> >
> > Best regards,
> > Jing
> >
> > [1]
> >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L49
> >
> >
> > On Fri, Jun 9, 2023 at 3:42 PM Aitozi  wrote:
> >
> > > Hi Jing,
> > >
> > >  The performance test is added to the FLIP.
> > >
> > >  As I know, the lookup join can return multiple rows; it depends on
> > > whether the join key
> > > is the primary key of the external database or not. The `lookup` [1]
> will
> > > return a collection of
> > > joined results, and each of them will be collected.
> > >
> > >
> > > [1]:
> > >
> > >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L52
> > >
> > >
> > > Thanks,
> > > Aitozi.
> > >
> > > Jing Ge  wrote on Fri, Jun 9, 2023 at 17:05:
> > >
> > > > Hi Aitozi,
> > > >
> > > > Thanks for the feedback. Looking forward to the performance tests.
> > > >
> > > > Afaik, lookup returns one row for each key [1] [2]. Conceptually, the
> > > > lookup function is used to enrich column(s) from the dimension table.
> > If,
> > > > for the given key, there will be more than one row, there will be no
> > way
> > > to
> > > > know which row will be used to enrich the key.
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L49
> > > > [2]
> > > >
> > > >
> > >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/TableFunction.java#L196
> > > >
> > > > Best regards,
> > > > Jing
> > > >
> > > > On Fri, Jun 9, 2023 at 5:18 AM Aitozi  wrote:
> > > >
> > > > > Hi Jing
> > > > > Thanks for your good questions. I have updated the example to
> the
> > > > FLIP.
> > > > >
> > > > > > Only one row for each lookup
> > > > > lookup can also return multi rows, based on the query result. [1]
> > > > >
> > > > > [1]:
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L56
> > > > >
> > > > > > If we use async calls with lateral join, my gut feeling is
> > > > > that we might have many more async calls than lookup join. I am not
> > > > really
> > > > > sure if we will be facing potential issues in this case or not.
> > > > >
> > > > > IMO, the work pattern is similar to the lookup function, for each
> row

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-12 Thread Jing Ge
Hi Aitozi,

Thanks for taking care of that part. I have no other concern.

Best regards,
Jing


On Mon, Jun 12, 2023 at 5:38 PM Aitozi  wrote:

> BTW, If there are no other more blocking issue / comments, I would like to
> start a VOTE in another thread this wednesday 6.14
>
> Thanks,
> Aitozi.
>
> Aitozi  wrote on Mon, Jun 12, 2023 at 23:34:
>
> > Hi, Jing,
> > Thanks for your explanation. I get your point now.
> >
> > For the performance part, I think it's a good idea to run the case of
> > returning a
> > big table; the memory consumption
> > should be a point to take care of, because in ordered mode
> the
> > head element in the buffer may affect the
> > total memory consumption.
> >
> >
> > Thanks,
> > Aitozi.
> >
> >
> >
> > Jing Ge  wrote on Mon, Jun 12, 2023 at 20:28:
> >
> >> Hi Aitozi,
> >>
> >> Which key will be used for lookup is not an issue, only one row will be
> >> required for each key in order to enrich it. True, it depends on the
> >> implementation whether multiple rows or single row for each key will be
> >> returned. However, for the lookup & enrichment scenario, one row/key is
> >> recommended, otherwise, like I mentioned previously, enrichment won't
> >> work.
> >>
> >> I am a little bit concerned about returning a big table for each key,
> >> since
> >> it will take the async call longer to return and need more memory. The
> >> performance tests should cover this scenario. This is not a blocking
> issue
> >> for this FLIP.
> >>
> >> Best regards,
> >> Jing
> >>
> >> On Sat, Jun 10, 2023 at 4:11 AM Aitozi  wrote:
> >>
> >> > Hi Jing,
> >> > I mean the join key is not necessarily the primary key or a
> >> unique
> >> > index of the database.
> >> > In this situation, we may query out multiple rows for one join key. I
> >> think
> >> > that's why
> >> > LookupFunction#lookup returns a collection of RowData.
> >> >
> >> > BTW, I think the behavior of lookup join will not affect the semantics
> of
> >> > the async UDTF.
> >> > We use the async TableFunction here, and the table function can collect
> >> > multiple rows.
> >> >
> >> > Thanks,
> >> > Aitozi.
> >> >
> >> >
> >> >
> >> > Jing Ge  wrote on Sat, Jun 10, 2023 at 00:15:
> >> >
> >> > > Hi Aitozi,
> >> > >
> >> > > The keyRow used in this case contains all keys[1].
> >> > >
> >> > > Best regards,
> >> > > Jing
> >> > >
> >> > > [1]
> >> > >
> >> > >
> >> >
> >>
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L49
> >> > >
> >> > >
> >> > > On Fri, Jun 9, 2023 at 3:42 PM Aitozi  wrote:
> >> > >
> >> > > > Hi Jing,
> >> > > >
> >> > > >  The performance test is added to the FLIP.
> >> > > >
> >> > > >  As I know, The lookup join can return multi rows, it depends
> on
> >> > > > whether  the join key
> >> > > > is the primary key of the external database or not. The `lookup`
> [1]
> >> > will
> >> > > > return a collection of
> >> > > > joined result, and each of them will be collected
> >> > > >
> >> > > >
> >> > > > [1]:
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> https://github.com/apache/flink/blob/191ec6ca3943d7119f14837efe112e074d815c47/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/LookupFunction.java#L52
> >> > > >
> >> > > >
> >> > > > Thanks,
> >> > > > Aitozi.
> >> > > >
> >> > > > Jing Ge  wrote on Fri, Jun 9, 2023 at 17:05:
> >> > > >
> >> > > > > Hi Aitozi,
> >> > > > >
> >> > > > > Thanks for the feedback. Looking forward to the performance
> tests.
> >> > > > >
> >> > > > > Afaik, lookup returns one row for each key [1] [2].
> Conceptually,

Re: [DISCUSS] FLIP-321: Introduce an API deprecation process

2023-06-13 Thread Jing Ge
Hi Becket,

Thanks for driving this important topic! There were many discussions
previously that ended up with waiting up for a clear API deprecation
process definition. This FLIP will help a lot.

I'd like to ask some questions to understand your thoughts.

Speaking of the FLIP,

*"Always add a "Since X.X.X" comment to indicate when was a class /
interface / method marked as deprecated."*
 Could you describe it with a code example? Do you mean Java comments?
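For instance, something like the following Java sketch would make the rule concrete (the method names and the release number 1.15.0 are invented here, not taken from the FLIP):

```java
import java.lang.reflect.Method;

// Sketch of the "Since X.X.X" convention: the @Deprecated annotation plus a
// Javadoc @deprecated tag recording the release in which deprecation happened.
public class DeprecationExample {

    /**
     * @deprecated Since 1.15.0, use {@link #newApi()} instead.
     */
    @Deprecated
    public static String oldApi() {
        return newApi();
    }

    public static String newApi() {
        return "result";
    }

    // Helper: check via reflection whether a method carries @Deprecated.
    public static boolean isDeprecated(String methodName) throws Exception {
        Method m = DeprecationExample.class.getMethod(methodName);
        return m.isAnnotationPresent(Deprecated.class);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("oldApi deprecated: " + isDeprecated("oldApi"));
        System.out.println("newApi deprecated: " + isDeprecated("newApi"));
    }
}
```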

*"At least 1 patch release for the affected minor release for
Experimental APIs"*
The rule is absolutely right. However, afaiac, deprecation is different from
modification. As a user/dev, I would appreciate it if I did not need to do any
migration work for deprecated APIs between patch-release upgrades. BTW,
if experimental APIs are allowed to change between patches, could we just
change them instead of marking them as deprecated and creating new ones to
replace them?

One major issue we have, afaiu, is caused by the lack of housekeeping/house
cleaning: there are many APIs that were marked as deprecated a few years
ago and still haven't been removed. Some APIs should be easy to remove, and
others will need clearer rules, like the issue discussed at [1].

Some common questions could be:

1. How to make sure the new APIs cover all functionality, i.e. backward
compatible, before removing the deprecated APIs? Since the
functionalities could only be built with the new APIs iteratively, there
will be a while (might be longer than the migration period) that the new
APIs are not backward compatible with the deprecated ones.
2. Is it allowed to remove the deprecated APIs after the defined migration
period expires while the new APIs are still not backward compatible?
3. For the case of core API upgrade with downstream implementations, e.g.
connectors, What is the feasible deprecation strategy? Option1 bottom-up:
make sure the downstream implementation is backward compatible before
removing the deprecated core APIs. Option2 top-down: once the downstream
implementation of new APIs works fine, we can remove the deprecated core
APIs after the migration period expires. The implementation of the
deprecated APIs will not get any further updates in upcoming releases (they
have been removed). There might be some missing features in the downstream
implementation of new APIs compared to the old implementation. Both options
have their own pros and cons.


Best regards,
Jing


[1] https://lists.apache.org/thread/m3o48c2d8j9g5t9s89hqs6qvr924s71o


On Mon, Jun 12, 2023 at 6:31 PM Stefan Richter
 wrote:

> Hi,
>
> Thanks a lot for bringing up this topic and for the initial proposal. As
> more and more people are looking into running Flink as a continuous service
> this discussion is becoming very relevant.
>
> What I would like to see is a clearer definition for what we understand by
> stability and compatibility. Our current policy only talks about being able
> to “compile” and “run” with a different version. As far as I can see, there
> is no guarantee about the stability of observable behavior. I believe it’s
> important for the community to include this important aspect in the
> guarantees that we give as our policy.
>
> For all changes that we do to the stable parts of the API, we should also
> consider how easy or difficult different types of changes would be for
> running Flink as a service with continuous delivery. For example,
> introducing a new interface to evolve the methods would make it easier to
> write adapter code than changing method signatures in-place on the existing
> interface. Those concerns should be considered in our process for evolving
> interfaces.
>
> Best,
> Stefan
>
>
>
> Stefan Richter
> Principal Engineer II
>
> > On 11. Jun 2023, at 14:30, Becket Qin  wrote:
> >
> > Hi folks,
> >
> > As one of the release 2.0 efforts, the release managers were discussing
> our
> > API lifecycle policies. There have been FLIP-196[1] and FLIP-197[2] that
> > are relevant to this topic. These two FLIPs defined the stability
> guarantee
> > of the programming APIs with various different stability annotations, and
> > the promotion process. A recap of the conclusion is following:
> >
> > Stability:
> > @Internal API: can change between major/minor/patch releases.
> > @Experimental API: can change between major/minor/patch releases.
> > @PublicEvolving API: can change between major/minor releases.
> > @Public API: can only change between major releases.
> >
> > Promotion:
> > An @Experimental API should be promoted to @PublicEvolving after two
> > releases, and a @PublicEvolving API should be promoted to @Public API
> after
> > two releases, unless there is a documented reason not to do so.
> >
> > One thing not mentioned in these two FLIPs is the API dep

Re: [DISCUSS] FLIP-321: Introduce an API deprecation process

2023-06-13 Thread Jing Ge
> This is by design. Most of these are @Public APIs that we had to carry
> around until Flink 2.0, because that was the initial guarantee that we
> gave people.
>

True, I knew @Public APIs could not be removed before the next major
release. I meant house cleaning without violating these annotations'
design concept, i.e. especially cleaning up @PublicEvolving APIs since
they are customer-facing. Regular cleaning up of all other @Experimental
and @Internal APIs would be even better, in case there are some APIs marked
as @deprecated.

Best regards,
Jing



On Tue, Jun 13, 2023 at 4:25 PM Chesnay Schepler  wrote:

> On 13/06/2023 12:50, Jing Ge wrote:
> > One major issue we have, afaiu, is caused by the lack of
> housekeeping/house
> > cleaning, there are many APIs that were marked as deprecated a few years
> > ago and still don't get removed. Some APIs should be easy to remove and
> > others will need some more clear rules, like the issue discussed at [1].
>
> This is by design. Most of these are @Public APIs that we had to carry
> around until Flink 2.0, because that was the initial guarantee that we
> gave people.


>
> As for the FLIP, I like the idea of explicitly writing down a
> deprecation period for APIs, particularly PublicEvolving ones.
> For Experimental I don't think it'd be a problem if we could change them
> right away,
> but looking back a bit I don't think it hurts us to also enforce some
> deprecation period.
> 1 release for both of these sounds fine to me.
>
>
> My major point of contention is the removal of Public APIs between minor
> versions.
> This to me would be a major setback towards a simpler upgrade path for users.
> If these can be removed in minor versions, then what even is a major
> release?
> The very definition we have for Public APIs is that they stick around
> until the next major one.
> Any rule that theoretically allows for breaking changes in Public API in
> every minor release is in my opinion not a viable option.
>
>
> The "carry recent Public APIs forward into the next major release" thing
> seems to presume a linear release history (aka, if 2.0 is released after
> 1.20, then there will be no 1.21), which I doubt will be the case. The
> idea behind it is good, but I'd say the right conclusion would be to not
> make that API public if we know a new major release hits in 3 months and
> is about to modify it. With a regular schedule for major releases this
> wouldn't be difficult to do.
>


Re: [DISCUSS] FLIP-311: Support Call Stored Procedure

2023-06-13 Thread Jing Ge
Hi yuxia,

Thanks for your proposal and sorry for the late reply. The FLIP is in good
shape. If I am not mistaken, everything that a stored procedure could do
could also be done by a Flink job. The current stored procedure design is
to empower catalogs to provide users some commonly used logic/functions
centrally and out-of-the-box, i.e. DRY. Is that correct?
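To make sure I understood the design, here is a plain-Java sketch of that catalog-provided-procedure idea (the class name, procedure names, and behaviors below are invented for illustration; this is not the proposed Catalog API):

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Sketch: a catalog exposes named, built-in procedures that users invoke by
// name (analogous to Catalog.getProcedure(...) followed by a CALL statement),
// instead of each user rewriting the same logic as a Flink job.
public class ProcedureCatalog {

    // Hypothetical built-in procedures, keyed by name.
    static final Map<String, Function<List<String>, String>> PROCEDURES = Map.of(
            "compact", args -> "compacted " + args.get(0),
            "expire_snapshots", args -> "expired snapshots of " + args.get(0));

    // Look up a procedure by name and invoke it with the given arguments.
    public static String call(String name, List<String> args) {
        Function<List<String>, String> p = PROCEDURES.get(name);
        if (p == null) {
            throw new IllegalArgumentException("unknown procedure: " + name);
        }
        return p.apply(args);
    }

    public static void main(String[] args) {
        System.out.println(call("compact", List.of("db.tbl")));
    }
}
```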

Best regards,
Jing

On Thu, Jun 8, 2023 at 10:32 AM Jark Wu  wrote:

> Thank you for the proposal, yuxia! The FLIP looks good to me.
>
> Best,
> Jark
>
> > On Jun 8, 2023, at 11:39, yuxia  wrote:
> >
> > Hi, all.
> > Thanks everyone for the valuable input. If there are are no further
> concerns about this FLIP[1], I would like to start voting next monday
> (6/12).
> >
> > [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-311%3A+Support+Call+Stored+Procedure
> >
> >
> > Best regards,
> > Yuxia
> >
> > - Original Message -
> > From: "Martijn Visser" 
> > To: "dev" 
> > Sent: Tuesday, June 6, 2023, 3:57:56 PM
> > Subject: Re: [DISCUSS] FLIP-311: Support Call Stored Procedure
> >
> > Hi Yuxia,
> >
> > Thanks for the clarification. I would be +0 overall, because I think
> > without actually allowing creation/customization of stored procedures,
> the
> > value for the majority of Flink users will be minimal.
> >
> > Best regards,
> >
> > Martijn
> >
> > On Tue, Jun 6, 2023 at 3:52 AM yuxia 
> wrote:
> >
> >> Hi, Martijn.
> >> Thanks for your feedback.
> >> 1: In this FLIP we don't intend to allow users to customize their own
> >> stored procedure for we don't want to expose too much to users too
> early as
> >> the FLIP said.
> >> The procedures are supposed to be provided only by Catalog. Catalog devs
> >> can write their built-in procedures, and return the procedure in method
> >> Catalog.getProcedure(ObjectPath procedurePath);
> >> So, there won't be SQL syntax to create/save a stored procedure in this
> >> FLIP. If we find we do need it, we can propose the SQL syntax to create a
> >> stored procedure in another dedicated FLIP.
> >>
> >> 2: The syntax `Call procedure_name(xx)` proposed in this FLIP is the
> >> default syntax in Calcite for calling stored procedures. Actually, we don't
> >> need to do any modification in the flink-sql-parser module for syntax of
> calling
> >> a procedure. MySQL[1], Postgres[2], Oracle[3] also use the syntax to
> call a
> >> stored procedure.
> >>
> >>
> >> [1] https://dev.mysql.com/doc/refman/8.0/en/call.html
> >> [2] https://www.postgresql.org/docs/15/sql-call.html
> >> [3] https://docs.oracle.com/javadb/10.8.3.0/ref/rrefcallprocedure.html
> >>
> >> Best regards,
> >> Yuxia
> >>
> >> - Original Message -
> >> From: "Martijn Visser" 
> >> To: "dev" 
> >> Sent: Monday, June 5, 2023, 8:35:44 PM
> >> Subject: Re: [DISCUSS] FLIP-311: Support Call Stored Procedure
> >>
> >> Hi Yuxia,
> >>
> >> Thanks for the FLIP. I have a couple of questions:
> >>
> >> 1. The syntax talks about how to CALL or SHOW the available stored
> >> procedures, but not on how to create one. Will there not be a SQL
> syntax to
> >> create/save a stored procedure?
> >> 2. Is there a default syntax in Calcite for stored procedures? What do
> >> other databases do, do they use CALL/SHOW or something like EXEC, USE?
> >>
> >> Best regards,
> >>
> >> Martijn
> >>
> >> On Mon, Jun 5, 2023 at 3:23 AM yuxia 
> wrote:
> >>
> >>> Hi, Jane.
> >>> Thanks for your input. I think we can add the auxiliary command show
> >>> procedures in this FLIP.
> >>> Following the syntax for show functions proposed in FLIP-297.
> >>> The syntax will be
> >>> SHOW PROCEDURES [ ( FROM | IN ) [catalog_name.]database_name ] [ [NOT]
> >>> (LIKE | ILIKE)  ].
> >>> I have updated to this FLIP.
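As an illustration of how the proposed `[NOT] (LIKE | ILIKE) <pattern>` clause could filter procedure names — a plain-Java sketch only; `likeToRegex` and `filter` are invented helper names, not FLIP-297/311 code:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ShowProceduresLike {
    // Translate a SQL LIKE pattern (% = any run of chars, _ = any single char)
    // into a regex. ILIKE is the case-insensitive variant.
    static Pattern likeToRegex(String like, boolean caseInsensitive) {
        StringBuilder sb = new StringBuilder();
        for (char c : like.toCharArray()) {
            switch (c) {
                case '%': sb.append(".*"); break;
                case '_': sb.append('.'); break;
                default:  sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString(), caseInsensitive ? Pattern.CASE_INSENSITIVE : 0);
    }

    static List<String> filter(List<String> procedures, String like, boolean ilike) {
        Pattern p = likeToRegex(like, ilike);
        return procedures.stream().filter(n -> p.matcher(n).matches()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> procs = List.of("compact", "clone_snapshot", "rollback");
        // Analogous to: SHOW PROCEDURES LIKE 'c%'
        System.out.println(filter(procs, "c%", false)); // prints "[compact, clone_snapshot]"
    }
}
```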
> >>>
> >>> The other auxiliary commands may not be suitable currently or need a
> >>> further/dedicated discussion. Let's keep this FLIP focused.
> >>>
> >>> [1]
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-297%3A+Improve+Auxiliary+Sql+Statements
> >>>
> >>> Best regards,
> >>> Yuxia
> >>>
> >>> - 原始邮件 -
> >>> 发件人: "Jane Chan" 
> >>> 收件人: "dev" 
> >>> 发送时间: 星期六, 2023年 6 月 03日 下午 7:04:39
> >>> 主题: Re: [DISCUSS] FLIP-311: Support Call Stored Procedure
> >>>
> >>> Hi Yuxia,
> >>>
> >>> Thanks for bringing this to the discussion. The call procedure is a
> >> widely
> >>> used feature and will be very useful for users.
> >>>
> >>> I just have one question regarding the usage. The FLIP mentioned that
> >>>
> >>> Flink will allow connector developers to develop their own built-in
> >> stored
>  procedures, and then enables users to call these predefined stored
>  procedures.
> 
> >>> In this FLIP, we don't intend to allow users to customize their own
> >> stored
>  procedure  for we don't want to expose too much to users too early.
> >>>
> >>>
> >>> If I understand correctly, we might need to provide some auxiliary
> >> commands
> >>> to inform users what built-in procedures are provided and how to use
> >> them.
> >>> For example, Snowflake provides commands like [1] [2], and MySQL
> provides

Re: Re: [VOTE] FLIP-311: Support Call Stored Procedure

2023-06-13 Thread Jing Ge
+1(binding)

Best Regards,
Jing

On Tue, Jun 13, 2023 at 9:03 AM Mang Zhang  wrote:

> +1 (no-binding)
>
>
>
>
> --
>
> Best regards,
> Mang Zhang
>
>
>
>
>
> On 2023-06-13 13:19:31, "Lincoln Lee" wrote:
> >+1 (binding)
> >
> >Best,
> >Lincoln Lee
> >
> >
> >On Tue, Jun 13, 2023 at 10:07, Jingsong Li wrote:
> >
> >> +1
> >>
> >> On Mon, Jun 12, 2023 at 10:32 PM Rui Fan <1996fan...@gmail.com> wrote:
> >> >
> >> > +1 (binding)
> >> >
> >> > Best,
> >> > Rui Fan
> >> >
> >> > On Mon, Jun 12, 2023 at 22:20 Benchao Li 
> wrote:
> >> >
> >> > > +1 (binding)
> >> > >
> >> > > > On Mon, Jun 12, 2023 at 17:58, yuxia wrote:
> >> > >
> >> > > > Hi everyone,
> >> > > > Thanks for all the feedback about FLIP-311: Support Call Stored
> >> > > > Procedure[1]. Based on the discussion [2], we have come to a
> >> consensus,
> >> > > so
> >> > > > I would like to start a vote.
> >> > > > The vote will be open for at least 72 hours (until June 15th,
> 10:00AM
> >> > > GMT)
> >> > > > unless there is an objection or an insufficient number of votes.
> >> > > >
> >> > > >
> >> > > > [1]
> >> > > >
> >> > >
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-311%3A+Support+Call+Stored+Procedure
> >> > > > [2]
> https://lists.apache.org/thread/k6s50gcgznon9v1oylyh396gb5kgrwmd
> >> > > >
> >> > > > Best regards,
> >> > > > Yuxia
> >> > > >
> >> > >
> >> > >
> >> > > --
> >> > >
> >> > > Best,
> >> > > Benchao Li
> >> > >
> >>
>


Re: [VOTE] FLIP-294: Support Customized Catalog Modification Listener

2023-06-14 Thread Jing Ge
+1 (binding)

Best Regards,
Jing

On Wed, Jun 14, 2023 at 4:07 PM Benchao Li  wrote:

> +1 (binding)
>
> > On Wed, Jun 14, 2023 at 19:52, Shammon FY wrote:
>
> > Hi all:
> >
> > Thanks for all the feedback for FLIP-294: Support Customized Catalog
> > Modification Listener [1]. I would like to start a vote for it according
> to
> > the discussion in thread [2].
> >
> > The vote will be open for at least 72 hours (excluding weekends, until
> June
> > 19, 19:00 PM GMT) unless there is an objection or an insufficient number
> of
> > votes.
> >
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-294%3A+Support+Customized+Catalog+Modification+Listener
> > [2] https://lists.apache.org/thread/185mbcwnpokfop4xcb22r9bgfp2m68fx
> >
> >
> > Best,
> > Shammon FY
> >
>
>
> --
>
> Best,
> Benchao Li
>


Re: [VOTE] FLIP-295: Support lazy initialization of catalogs and persistence of catalog configurations

2023-06-14 Thread Jing Ge
+1 (binding)

Best Regards,
Jing

On Wed, Jun 14, 2023 at 3:28 PM Rui Fan <1996fan...@gmail.com> wrote:

> +1(binding)
>
> Best,
> Rui Fan
>
> On Wed, Jun 14, 2023 at 16:24 Hang Ruan  wrote:
>
> > +1 (non-binding)
> >
> > Thanks for Feng driving it.
> >
> > Best,
> > Hang
> >
> > > On Wed, Jun 14, 2023 at 10:36, Feng Jin wrote:
> >
> > > Hi everyone
> > >
> > > Thanks for all the feedback about the FLIP-295: Support lazy
> > initialization
> > > of catalogs and persistence of catalog configurations[1].
> > > [2] is the discussion thread.
> > >
> > >
> > > I'd like to start a vote for it. The vote will be open for at least 72
> > > hours (excluding weekends, until June 19, 10:00AM GMT) unless there is an
> > > objection or an insufficient number of votes.
> > >
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-295%3A+Support+lazy+initialization+of+catalogs+and+persistence+of+catalog+configurations
> > > [2]https://lists.apache.org/thread/dcwgv0gmngqt40fl3694km53pykocn5s
> > >
> > >
> > > Best,
> > > Feng
> > >
> >
>


Re: [VOTE] FLIP-246: Dynamic Kafka Source (originally Multi Cluster Kafka Source)

2023-06-16 Thread Jing Ge
+1 (binding)

Best regards,
Jing

On Thu, Jun 15, 2023 at 7:55 PM Mason Chen  wrote:

> Hi all,
>
> Thank you to everyone for the feedback on FLIP-246 [1]. Based on the
> discussion thread [2], we have come to a consensus on the design and are
> ready to take a vote to contribute this to Flink.
>
> This voting thread will be open for at least 72 hours (excluding weekends,
> until June 20th 10:00AM PST) unless there is an objection or an
> insufficient number of votes.
>
> (Optional) If you have an opinion on the naming of the connector, please
> include it in your vote:
> 1. DynamicKafkaSource
> 2. MultiClusterKafkaSource
> 3. DiscoveringKafkaSource
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217389320
> [2] https://lists.apache.org/thread/vz7nw5qzvmxwnpktnofc9p13s1dzqm6z
>
> Best,
> Mason
>


Re: [VOTE] FLIP-287: Extend Sink#InitContext to expose TypeSerializer, ObjectReuse and JobID

2023-06-16 Thread Jing Ge
+1(binding)

Best Regards,
Jing

On Fri, Jun 16, 2023 at 10:10 AM Lijie Wang 
wrote:

> +1 (binding)
>
> Thanks for driving it, Joao.
>
> Best,
> Lijie
>
> > On Fri, Jun 16, 2023 at 15:53, Joao Boto wrote:
>
> > Hi all,
> >
> > Thank you to everyone for the feedback on FLIP-287[1]. Based on the
> > discussion thread [2], we have come to a consensus on the design and are
> > ready to take a vote to contribute this to Flink.
> >
> > I'd like to start a vote for it. The vote will be open for at least 72
> > hours (excluding weekends) unless there is an objection or an insufficient
> > number of votes.
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240880853
> > [2]https://lists.apache.org/thread/wb3myhqsdz81h08ygwx057mkn1hc3s8f
> >
> >
> > Best,
> > Joao Boto
> >
>


Re: [DISCUSS] FLIP-321: Introduce an API deprecation process

2023-06-17 Thread Jing Ge
ule to decide when to remove deprecated APIs without
caring of anything else like functionality compatibility. If the migration
period described in this FLIP is only the minimum time, I think we still
have the house cleaning issue unsolved. Minimum means deprecated APIs can
not be removed before the migration period expires. The issue I was aware
of is when/how to remove APIs after the migration period expired, e.g.
PublicEvolving APIs that have been marked as deprecated years ago.


Best regards,
Jing

[1] https://lists.apache.org/thread/m3o48c2d8j9g5t9s89hqs6qvr924s71o

On Wed, Jun 14, 2023 at 9:56 AM Becket Qin  wrote:

> Hi Jing,
>
> Thanks for the feedback. Please see the answers to your questions below:
>
> *"Always add a "Since X.X.X" comment to indicate when was a class /
> > interface / method marked as deprecated."*
> >  Could you describe it with a code example? Do you mean Java comments?
>
> It is just a comment such as "Since 1.18. Use X
> <
> https://kafka.apache.org/31/javadoc/org/apache/kafka/clients/admin/Admin.html#incrementalAlterConfigs(java.util.Map)
> >XX
> instead.". And we can then look it up in the deprecated list[1] in each
> release and see which method should / can be deprecated.
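A minimal sketch of what such a deprecation looks like in code — the method names here are invented for illustration; the "Since ..." Javadoc note is what feeds the per-release deprecated list:

```java
public class DeprecationExample {
    /**
     * @deprecated Since 1.18. Use {@link #newApi()} instead.
     */
    @Deprecated
    public static int oldApi() {
        // Delegate so behavior stays identical during the migration period.
        return newApi();
    }

    public static int newApi() {
        return 42;
    }

    public static void main(String[] args) throws Exception {
        // The @Deprecated annotation (plus the "Since ..." note) is what tools
        // can scan to build each release's deprecated list.
        boolean deprecated = DeprecationExample.class
                .getMethod("oldApi").isAnnotationPresent(Deprecated.class);
        System.out.println(deprecated); // prints "true"
    }
}
```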
>
> *"At least 1 patch release for the affected minor release for
> > Experimental APIs"*
> > The rule is absolutely right. However, afaiac, deprecation is different
> as
> > modification. As a user/dev, I would appreciate, if I do not need to do
> any
> > migration work for any deprecated API between patch releases upgrade.
> BTW,
> > if experimental APIs are allowed to change between patches, could we just
> > change them instead of marking them as deprecated and create new ones to
> > replace them?
>
> Deprecating an API is just a more elegant way of replacing an API with a
> new one. The only difference between the two is whether the old API is kept
> and coexists with the new API for some releases or not. For end users,
> deprecation-then-remove is much more friendly than direct replacement.
>
> 1. How to make sure the new APIs cover all functionality, i.e. backward
> > compatible, before removing the deprecated APIs? Since the
> > functionalities could only be built with the new APIs iteratively, there
> > will be a while (might be longer than the migration period) that the new
> > APIs are not backward compatible with the deprecated ones.
>
>

> This is orthogonal to the deprecation process, and may not even be required
> in some cases if the function changes by itself. But in general, this
> relies on the developer to decide. A simple test on readiness is to see if
> all the UT / IT cases written with the old API can be migrated to the new
> one and still work.  If the new API is not ready, we probably should not
> deprecate the old one to begin with.
>


>
> 2. Is it allowed to remove the deprecated APIs after the defined migration
> > period expires while the new APis are still not backward compatible?
>
>

> By "backwards compatible", do you mean functionally equivalent? If the new
> APIs are designed to be not backwards compatible, then removing the
> deprecated source code is definitely allowed. If we don't think the new API
> is ready to take over the place for the old one, then we should wait. The
> migration period is the minimum time we have to wait before removing the
> source code. A longer migration period is OK.
>
>

> 3. For the case of core API upgrade with downstream implementations, e.g.
> > connectors, What is the feasible deprecation strategy? Option1 bottom-up:
> > make sure the downstream implementation is backward compatible before
> > removing the deprecated core APIs. Option2 top-down: once the downstream
> > implementation of new APIs works fine, we can remove the deprecated core
> > APIs after the migration period expires. The implementation of the
> > deprecated APIs will not get any further update in upcoming releases(it
> has
> > been removed). There might be some missing features in the downstream
> > implementation of new APIs compared to the old implementation. Both
> options
> > have their own pros and cons.
>
> The downstream projects such as connectors in Flink should also follow the
> migration path we tell our users. i.e. If there is a cascading backwards
> incompatible change in the connectors due to a backwards incompatible
> change in the core, and as a result a longer migration period is required,
> then I think we should postpone the removal of source code. But in general,
> we should be able to provide the same migration period in the connectors as
> the flink core, if the connectors are upg

Re: [DISCUSS] FLIP-321: Introduce an API deprecation process

2023-06-17 Thread Jing Ge
Hi All,

The @Public -> @PublicEvolving proposed by Xintong is a great idea.
Especially after he suggested @PublicRetired, i.e. @PublicEvolving --(2
minor releases)--> @Public --> @deprecated --(1 major
release)--> @PublicRetired. It will provide a lot of flexibility without
breaking any rules we had. @Public APIs are allowed to change between major
releases. Changing annotations is acceptable and provides additional
tolerance, i.e. user-friendliness, since the APIs themselves are not changed.

I had similar thoughts when I was facing those issues. I want to move one
step further and suggest introducing one more annotation @Retired.

Unlike @PublicRetired, which is a compromise of downgrading @Public to
@PublicEvolving. As I mentioned earlier in my reply, Java standard
@deprecated should be used in the early stage of the deprecation process
and doesn't really meet our requirement. Since Java does not allow us to
extend annotation, I think it would be feasible to have the new @Retired to
help us monitor and manage the deprecation process, house cleaning, etc.

Some ideas could be(open for discussion):

@Retired:

1. There must be a replacement with functionality compatibility before APIs
can be marked as @Retired, i.e. DISCUSS and VOTE processes on the ML are
mandatory (a FLIP is recommended).
2. APIs marked as @Retired will be removed after exactly 1 minor release
(using ArchUnit to force it, needs further check whether it is possible).
Devs who marked them as @Retired are responsible to remove them.
3. Both @Public -> @Retired and @PublicEvolving -> @Retired are
recommended. @Experimental -> @Retired and @Internal -> @Retired could also
be used if it can increase user-friendliness or dev-friendliness, but not
mandatory.
4. Some variables will be defined in @Retired to support the deprecation
process management. Further extension is possible, since the annotation is
built by us.
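A rough sketch of what such an annotation could look like in plain Java. Everything here — the annotation name, its fields, the `OldSink` class — is a hypothetical illustration of the proposal above, not existing Flink code:

```java
import java.lang.annotation.*;

public class RetiredSketch {
    // Hypothetical @Retired annotation from the proposal; not part of Flink.
    // The fields support tracking of the deprecation process (point 4).
    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD})
    @interface Retired {
        String since();        // release in which the API was marked retired
        String removeIn();     // release in which it must be deleted (point 2)
        String replacement();  // the functionally compatible successor (point 1)
    }

    @Retired(since = "1.18", removeIn = "1.19", replacement = "NewSink")
    static class OldSink {}

    public static void main(String[] args) {
        Retired r = OldSink.class.getAnnotation(Retired.class);
        // A build-time check (e.g. via ArchUnit, as suggested above) could
        // fail the build once the current version reaches removeIn.
        System.out.println(r.removeIn()); // prints "1.19"
    }
}
```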


Best regards,
Jing

On Fri, Jun 16, 2023 at 10:31 AM Becket Qin  wrote:

> Hi Xintong,
>
> Thanks for the explanation. Please see the replies inline below.
>
> I agree. And from my understanding, demoting a Public API is also a kind of
> > such change, just like removing one, which can only happen with major
> > version bumps. I'm not proposing to allow demoting Public APIs anytime,
> but
> > only in the case major version bumps happen before reaching the
> > 2-minor-release migration period. Actually, demoting would be a weaker
> > change compared to removing the API immediately upon major version bumps,
> > in order to keep the commitment about the 2-minor-release migration
> period.
> > If the concern is that `@Public` -> `@PublicEvolving` sounds against
> > conventions, we may introduce a new annotation if necessary, e.g.,
> > `@PublicRetiring`, to avoid confusions.
>
> As an end user who only uses Public APIs, if I don't change my code at all,
> my expectation is the following:
> 1. Upgrading from 1.x to 2.x may have issues.
> 2. If I can upgrade from 1.x to 2.x without an issue, I am fine with all
> the 2.x versions.
> Actually I think there are some dependency version resolution policies out
> there which picks the highest minor version when the dependencies pull in
> multiple minor versions of the same jar, which may be broken if we remove
> the API in minor releases.
>
> I'm not sure about this. Yes, it's completely "legal" that we bump up the
> > major version whenever a breaking change is needed. However, this also
> > weakens the value of the commitment that public APIs will stay stable
> > within the major release series, as the series can end anytime. IMHO,
> short
> > major release series are not something "make the end users happy", but
> > backdoors that allow us as the developers to make frequent breaking
> > changes. On the contrary, with the demoting approach, we can still have
> > longer major release series, while only allowing Public APIs deprecated
> at
> > the end of the previous major version to be removed in the next major
> > version.
>
> I totally agree that frequent major version bumps are not ideal, but here
> we are comparing it with a minor version bump which removes a Public API.
> So the context is that we have already decided to remove this Public API
> while keeping everything else backwards compatible. I think a major version
> bump is a commonly understood signal here, compared with a minor version
> change. From end users' perspective, for those who are not impacted, in
> this case upgrading a major version is not necessarily more involved than
> upgrading a minor version - both should be as smooth as a dependency
> version change. For those who are impacted, they will lose the Public API
> anyways and a major version bump ensures there is no surprise.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Fri, Jun 16, 2023 at 10:13 AM Xintong Song 
> wrote:
>
> > Public API is a well defined common concept, and one of its
> >> convention is that it only changes with a major version change.
> >>
> >

Re: [NOTICE] Experimental Java 17 support now available on master

2023-06-18 Thread Jing Ge
Hi Kurt,

Thanks for your contribution. I am a little bit confused about the email
title, since your PR[1] is not merged into the master yet. I guess, with
"Experimental Java 17 support", you meant it is available on your branch
which is based on the master.

If I am not mistaken, there is no vote thread of FLIP 317 on ML. Would you
like to follow the standard process[2] defined by the Flink community?
Thanks!


Best regards,
Jing

[1] https://github.com/apache/flink/pull/22660
[2]
https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals

On Sun, Jun 18, 2023 at 1:18 AM Kurt Ostfeld 
wrote:

> I built the Flink master branch and tried running this simple Flink app
> that uses a Java record:
>
>
> https://github.com/kurtostfeld/flink-kryo-upgrade-demo/blob/main/flink-record-demo/src/main/java/demo/app/Main.java
>
> It fails with the normal exception that Kryo 2.x throws when you try to
> serialize a Java record. The full stack trace is here:
> https://pastebin.com/HGhGKUWt
>
> I tried removing this line:
>
>
> https://github.com/kurtostfeld/flink-kryo-upgrade-demo/blob/main/flink-record-demo/src/main/java/demo/app/Main.java#L36
>
> and that had no impact, I got the same error.
>
> In the other thread, you said that the plan was to use PojoSerializer to
> serialize records rather than Kryo. Currently, the Flink code base uses
> Kryo 2.x by default for generic user data types, and that will fail when
> the data type is a record or contains records. Ultimately, if Flink wants
> to fully support Java records, it seems that it has to move off of Kryo
> 2.x. PojoSerializer is part of what is basically a custom serialization
> library internal to Flink that is an alternative to Kryo. That's one
> option: move off of Kryo to a Flink-internal serialization library. The
> other two options are upgrade to the new Kryo or use a different
> serialization library.
>
> The Kryo 5.5.0 upgrade PR I submitted (
> https://github.com/apache/flink/pull/22660) with FLIP 317 (
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-317%3A+Upgrade+Kryo+from+2.24.0+to+5.5.0)
> works with records. The Flink app linked above that uses records works with
> the PR and that's what I posted to this mailing list a few weeks ago. I
> rebased the pull request on to the latest master branch and it's passing
> all tests. From my testing, it supports stateful upgrades, including
> checkpoints. If you can demonstrate a scenario where stateful upgrades
> error I can try to resolve that.
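For background on why records need special handling: a serializer cannot instantiate a record and then poke its fields the way field-based serializers do for POJOs, because record fields are final; it must rebuild instances through the canonical constructor, which reflection exposes. A minimal sketch (plain Java 16+, no Flink or Kryo involved; the `Event` record is invented for illustration):

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.RecordComponent;

public class RecordReflectionDemo {
    record Event(String id, long timestamp) {}

    public static void main(String[] args) throws Exception {
        // Records expose their components and canonical constructor via
        // reflection; a record-aware serializer reconstructs instances
        // through that constructor instead of setting fields one by one.
        RecordComponent[] comps = Event.class.getRecordComponents();
        Class<?>[] types = new Class<?>[comps.length];
        for (int i = 0; i < comps.length; i++) {
            types[i] = comps[i].getType();
        }
        Object[] values = {"e-1", 123L}; // e.g. values read back from a byte stream
        Constructor<Event> ctor = Event.class.getDeclaredConstructor(types);
        Event rebuilt = ctor.newInstance(values);
        System.out.println(rebuilt); // prints "Event[id=e-1, timestamp=123]"
    }
}
```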


Re: [VOTE] FLIP-308: Support Time Travel

2023-06-19 Thread Jing Ge
+1(binding)

On Mon, Jun 19, 2023 at 1:57 PM Benchao Li  wrote:

> +1 (binding)
>
> On Mon, Jun 19, 2023 at 19:40, Lincoln Lee wrote:
>
> > +1 (binding)
> >
> > Best,
> > Lincoln Lee
> >
> >
> > On Mon, Jun 19, 2023 at 19:30, yuxia wrote:
> >
> > > +1 (binding)
> > > Thanks Feng driving it.
> > >
> > > Best regards,
> > > Yuxia
> > >
> > > - Original Message -
> > > From: "Feng Jin" 
> > > To: "dev" 
> > > Sent: Monday, June 19, 2023, 7:22:06 PM
> > > Subject: [VOTE] FLIP-308: Support Time Travel
> > >
> > > Hi everyone
> > >
> > > Thanks for all the feedback about the FLIP-308: Support Time Travel[1].
> > > [2] is the discussion thread.
> > >
> > >
> > > I'd like to start a vote for it. The vote will be open for at least 72
> > > hours (excluding weekends, until June 22, 12:00AM GMT) unless there is an
> > > objection or an insufficient number of votes.
> > >
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-308%3A+Support+Time+Travel
> > > [2]https://lists.apache.org/thread/rpozdlf7469jmc7q7vc0s08pjnmscz00
> > >
> > >
> > > Best,
> > > Feng
> > >
> >
>
>
> --
>
> Best,
> Benchao Li
>


Re: [DISCUSS] FLIP-311: Support Call Stored Procedure

2023-06-19 Thread Jing Ge
Thanks for your reply. Nice feature!

Best regards,
Jing

On Wed, Jun 14, 2023 at 3:11 AM yuxia  wrote:

> Yes, you're right.
>
> Best regards,
> Yuxia
>
> ----- Original Message -
> From: "Jing Ge" 
> To: "dev" 
> Sent: Wednesday, June 14, 2023, 4:46:58 AM
> Subject: Re: [DISCUSS] FLIP-311: Support Call Stored Procedure
>
> Hi yuxia,
>
> Thanks for your proposal and sorry for the late reply. The FLIP is in good
> shape. If I am not mistaken, Everything, that a stored procedure could do,
> could also be done by a Flink job. The current stored procedure design is
> to empower Catalogs to provide users some commonly used logics/functions
> centrally and out-of-the-box, i.e. DRY. Is that correct?
>
> Best regards,
> Jing
>
> On Thu, Jun 8, 2023 at 10:32 AM Jark Wu  wrote:
>
> > Thank you for the proposal, yuxia! The FLIP looks good to me.
> >
> > Best,
> > Jark
> >
> > > On Jun 8, 2023, at 11:39, yuxia wrote:
> > >
> > > Hi, all.
> > > Thanks everyone for the valuable input. If there are no further
> > concerns about this FLIP[1], I would like to start voting next monday
> > (6/12).
> > >
> > > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-311%3A+Support+Call+Stored+Procedure
> > >
> > >
> > > Best regards,
> > > Yuxia
> > >
> > > - Original Message -
> > > From: "Martijn Visser" 
> > > To: "dev" 
> > > Sent: Tuesday, June 6, 2023, 3:57:56 PM
> > > Subject: Re: [DISCUSS] FLIP-311: Support Call Stored Procedure
> > >
> > > Hi Yuxia,
> > >
> > > Thanks for the clarification. I would be +0 overall, because I think
> > > without actually allowing creation/customization of stored procedures,
> > the
> > > value for the majority of Flink users will be minimal.
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > On Tue, Jun 6, 2023 at 3:52 AM yuxia 
> > wrote:
> > >
> > >> Hi, Martijn.
> > >> Thanks for your feedback.
> > >> 1: In this FLIP we don't intend to allow users to customize their own
> > >> stored procedure for we don't want to expose too much to users too
> > early as
> > >> the FLIP said.
> > >> The procedures are supposed to be provided only by Catalog. Catalog
> devs
> > >> can write their built-in procedures, and return the procedure in
> method
> > >> Catalog.getProcedure(ObjectPath procedurePath);
> > >> So, there won't be SQL syntax to create/save a stored procedure in
> this
> > >> FLIP. If we find we do need it, we can propose the SQL syntax to
> create a
> > >> stored procedure in another dedicated FLIP.
> > >>
> > >> 2: The syntax `Call procedure_name(xx)` proposed in this FLIP is the
> > >> default syntax in Calcite for calling stored procedures. Actually, we
> don't
> > >> need to do any modification in the flink-sql-parser module for syntax of
> > calling
> > >> a procedure. MySQL[1], Postgres[2], Oracle[3] also use the syntax to
> > call a
> > >> stored procedure.
> > >>
> > >>
> > >> [1] https://dev.mysql.com/doc/refman/8.0/en/call.html
> > >> [2] https://www.postgresql.org/docs/15/sql-call.html
> > >> [3]
> https://docs.oracle.com/javadb/10.8.3.0/ref/rrefcallprocedure.html
> > >>
> > >> Best regards,
> > >> Yuxia
> > >>
> > >> - Original Message -
> > >> From: "Martijn Visser" 
> > >> To: "dev" 
> > >> Sent: Monday, June 5, 2023, 8:35:44 PM
> > >> Subject: Re: [DISCUSS] FLIP-311: Support Call Stored Procedure
> > >>
> > >> Hi Yuxia,
> > >>
> > >> Thanks for the FLIP. I have a couple of questions:
> > >>
> > >> 1. The syntax talks about how to CALL or SHOW the available stored
> > >> procedures, but not on how to create one. Will there not be a SQL
> > syntax to
> > >> create/save a stored procedure?
> > >> 2. Is there a default syntax in Calcite for stored procedures? What do
> > >> other databases do, do they use CALL/SHOW or something like EXEC, USE?
> > >>
> > >> Best regards,
> > >>
> > >> Martijn
> > >>
> > >> On Mon, Jun 5, 2023 at 3:23 AM yuxia 
> > wrote:
> > >>
> > >>> Hi, Jane.
> > >>> Thanks for yo

Re: [DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-19 Thread Jing Ge
Hi Lijie,

Thanks for your proposal. It is a really nice feature. I'd like to ask a
few questions to understand your thoughts.

Afaiu, the runtime filter will only be injected when the gap between the
build data size and probe data size is big enough. Let's take an extreme
example. If the small table (build side) has one row and the large
table (probe side) contains tens of billions of rows. This will be the ideal
use case for the runtime filter and the improvement will be significant. Is
this correct?

Speaking of the "Conditions of injecting Runtime Filter" in the FLIP, will
the value of max-build-data-size and min-prob-data-size depend on the
parallelism config? I.e. with the same data-size setting, is it possible to
inject or don't inject runtime filters by adjusting the parallelism?

In the FLIP, there are default values for the new configuration parameters
that will be used to check the injection condition. If ndv cannot be
estimated, row count will be used. Given the max-build-data-size is 10MB
and the min-prob-data-size is 10GB, in the worst case, the min-filter-ratio
will be 0.999, i.e. the probeNdv is 1000 times buildNdv . If we consider
the duplication and the fact that the large table might have more columns
than the small table, the probeNdv should still be 100 or 10 times
buildNdv, which ends up with a min-filter-ratio equals to 0.99 or 0.9. Both
are bigger than the default value 0.5 in the FLIP. If I am not mistaken,
commonly, a min-filter-ratio less than 0.99 will always allow injecting the
runtime filter. Does it make sense to reconsider the formula of ratio
calculation to help users easily control the filter injection?
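For what it's worth, the numbers above are consistent with reading the estimated filter ratio as 1 - buildNdv / probeNdv. This formula is inferred from the examples in this message, not quoted from the FLIP, whose definition is authoritative:

```java
public class FilterRatioSketch {
    // Estimated fraction of probe-side rows a perfect filter would drop,
    // assuming ratio = 1 - buildNdv / probeNdv (an assumption reconstructed
    // from the worked numbers in the surrounding discussion).
    static double filterRatio(double buildNdv, double probeNdv) {
        return 1.0 - buildNdv / probeNdv;
    }

    public static void main(String[] args) {
        // probeNdv 1000x / 100x / 10x buildNdv, as in the examples above:
        System.out.println(filterRatio(1, 1000)); // ~0.999
        System.out.println(filterRatio(1, 100));  // ~0.99
        System.out.println(filterRatio(1, 10));   // ~0.9
        // All three comfortably exceed the FLIP's default threshold of 0.5,
        // which is the point being made about the injection condition.
    }
}
```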

Best regards,
Jing

On Mon, Jun 19, 2023 at 4:42 PM Lijie Wang  wrote:

> Hi Stefan,
>
> >> bypassing the dataflow
> I believe it's a possible solution, but it may require more coordination
> and extra conditions (such as DFS), I do think it should be excluded from
> the first version. I'll put it in Future+Improvements as a potential
> improvement.
>
> Thanks again for your quick reply :)
>
> Best,
> Lijie
>
> On Mon, Jun 19, 2023 at 20:51, Stefan Richter wrote:
>
> >
> > Hi Lijie,
> >
> > I think you understood me correctly. But I would not consider this a true
> > cyclic dependency in the dataflow because I would not suggest to send the
> > filter through an edge in the job graph from join to scan. I’d rather
> > bypass the stream graph when bringing the filter to the scan.
> For
> > example, the join could report the filter after the build phase, e.g. to
> > the JM or a predefined DFS folder. And when the probe scan is scheduled,
> > the JM provides the filter information to the scan when it gets scheduled
> > for execution or the scan looks in DFS if it can find any filter that it
> > can use as part of initialization. I’m not suggesting to do it exactly in
> > those ways, but just to show what I mean by "bypassing the dataflow".
> >
> > Anyways, I’m fine with excluding this optimization from the current FLIP
> > if you believe it would be hard to implement in Flink.
> >
> > Best,
> > Stefan
> >
> >
> > > On 19. Jun 2023, at 14:07, Lijie Wang 
> wrote:
> > >
> > > Hi Stefan,
> > >
> > > If I understand correctly(I hope so), the hash join operator needs to
> > send
> > > the bloom filter to probe scan, and probe scan also needs to send the
> > > filtered data to the hash join operator. This means there will be a
> cycle
> > > in the data flow, it will be hard for current Flink to schedule this
> kind
> > > of graph. I admit we can find a way to do this, but that's probably a
> > > bit outside the scope of this FLIP.  So let's do these complex
> > > optimizations later, WDYT?
> > >
> > > Best,
> > > Lijie
> > >
> > > On Mon, Jun 19, 2023 at 18:15, Stefan Richter wrote:
> > >> Hi Lijie,
> > >>
> > >> Exactly, my proposal was to build the bloom filter in the hash
> > operator. I
> > >> don’t know about all the details about the implementation of Flink’s
> > join
> > >> operator, but I’d assume that even if the join is a two input operator
> > it
> > >> gets scheduled for 2 different pipelines. First the build phase with
> the
> > >> scan from the dimension table and after that’s completed the probe
> phase
> > >> with the scan of the fact table. I’m not proposing to use the bloom
> > filter
> > >> only in the join operator, but rather send the bloom filter to the
> probe
> > >> scan before starting the probe. I assume this would require some form
> of
> > >> side channel to transport the filter and coordination to tell the
> > sources
> > >> that such a filter is available. I cannot answer how hard those would
> > be to
> > >> implement, but the idea doesn’t seem impossible to me.
> > >>
> > >> Best,
> > >> Stefan
> > >>
> > >>
> > >>> On 19. Jun 2023, at 11:56, Lijie Wang 
> > wrote:
> > >>>
> > >>> Hi Stefan,
> > >>>
> > >>> Now I know what you mean about point 1. But currently it is
> unfeasible
> > >> for
> > >>> Flink, because the building of the 

[DISCUSS] Graduate the FileSink to @PublicEvolving

2023-06-20 Thread Jing Ge
Hi all,

The FileSink has been marked as @Experimental[1] since Oct. 2020.
According to FLIP-197[2], I would like to propose to graduate it
to @PublicEvolving in the upcoming 1.18 release.

On the other hand, as a related topic, FileSource was marked
as @PublicEvolving[3] 3 years ago. It deserves a graduation discussion too.
To keep this discussion lean and efficient, let's focus on FileSink in this
thread. There will be another discussion thread for the FileSource.

I was wondering if anyone might have any concerns. Looking forward to
hearing from you.


Best regards,
Jing






[1]
https://github.com/apache/flink/blob/4006de973525c5284e9bc8fa6196ab7624189261/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/sink/FileSink.java#L129
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-197%3A+API+stability+graduation+process
[3]
https://github.com/apache/flink/blob/4006de973525c5284e9bc8fa6196ab7624189261/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/FileSource.java#L95


Re: [DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-20 Thread Jing Ge
Hi Ron,

Thanks for the clarification. That answered my questions.

Regarding the ratio: my gut feeling is that any value less than 0.8
or 0.9 won't help much (I might be wrong). I was thinking of adapting
the formula to somehow map the current 0.9-1 range to 0-1, i.e. if a user configures
0.5, it will be mapped to e.g. 0.95 (or e.g. 0.85; the real number
needs more calculation) for the current formula described in the FLIP. But
I am not sure it is a feasible solution. It deserves more discussion. Maybe
some real performance tests could give us some hints.
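
The remapping idea can be made concrete with a tiny sketch; the 0.9 floor below is only an assumed example value, not something taken from the FLIP:

```python
def effective_ratio(user_ratio, floor=0.9):
    """Map a user-facing ratio in [0, 1] onto [floor, 1].

    Illustrates the remapping idea only; the floor (0.9 here) is an
    assumption that would need tuning via real performance tests.
    """
    if not 0.0 <= user_ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    return floor + (1.0 - floor) * user_ratio

# A user-configured 0.5 lands at 0.95 under a 0.9 floor.
print(effective_ratio(0.5))  # 0.95
```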

Best regards,
Jing

On Tue, Jun 20, 2023 at 5:19 AM liu ron  wrote:

> Hi, Jing
>
> Thanks for your feedback.
>
> > Afaiu, the runtime Filter will only be Injected when the gap between the
> build data size and prob data size is big enough. Let's make an extreme
> example. If the small table(build side) has one row and the large
> table(probe side) contains tens of billions of rows. This will be the ideal
> use case for the runtime filter and the improvement will be significant. Is
> this correct?
>
> Yes, you are right.
>
> > Speaking of the "Conditions of injecting Runtime Filter" in the FLIP,
> will
> the value of max-build-data-size and min-prob-data-size depend on the
> parallelism config? I.e. with the same data-size setting, is it possible to
> inject or don't inject runtime filters by adjusting the parallelism?
>
> First, let me clarify two points. The first is that RuntimeFilter decides
> whether to inject or not in the optimization phase, but we do not consider
> operator parallelism in the SQL optimization phase currently, which is set
> at the ExecNode level. The second is that in batch mode, the default
> AdaptiveBatchScheduler[1] is now used, which will derive the parallelism of
> the downstream operator based on the amount of data produced by the
> upstream operator, that is, the parallelism is determined by runtime
> adaptation. In the above case, we cannot decide whether to inject
> BloomFilter in the optimization stage based on parallelism.
> A more important point is that the purpose of Runtime Filter is to reduce
> the amount of data for shuffle, and thus the amount of data processed by
> the downstream join operator. Therefore, I understand that regardless of
> the parallelism of the probe, the amount of data in the shuffle must be
> reduced after inserting the Runtime Filter, which is beneficial to the join
> operator, so whether to insert the RuntimeFilter or not is not dependent on
> the parallelism.
>
> > Does it make sense to reconsider the formula of ratio
> calculation to help users easily control the filter injection?
>
> Only when ndv does not exist will the row count be considered. When size uses
> the default value and ndv cannot be obtained, it is true that this condition
> may always hold, but this does not seem to affect anything, and the user is
> also likely to change the value of the size. One question: how do you think
> we should make it easier for users to control the filter injection?
>
>
> [1]:
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/elastic_scaling/#adaptive-batch-scheduler
>
> Best,
> Ron
>
> Jing Ge  wrote on Tue, Jun 20, 2023 at 07:11:
>
> > Hi Lijie,
> >
> > Thanks for your proposal. It is a really nice feature. I'd like to ask a
> > few questions to understand your thoughts.
> >
> > Afaiu, the runtime Filter will only be Injected when the gap between the
> > build data size and prob data size is big enough. Let's make an extreme
> > example. If the small table(build side) has one row and the large
> > table(probe side) contains tens of billions of rows. This will be the
> ideal
> > use case for the runtime filter and the improvement will be significant.
> Is
> > this correct?
> >
> > Speaking of the "Conditions of injecting Runtime Filter" in the FLIP,
> will
> > the value of max-build-data-size and min-prob-data-size depend on the
> > parallelism config? I.e. with the same data-size setting, is it possible
> to
> > inject or don't inject runtime filters by adjusting the parallelism?
> >
> > In the FLIP, there are default values for the new configuration
> parameters
> > that will be used to check the injection condition. If ndv cannot be
> > estimated, row count will be used. Given the max-build-data-size is 10MB
> > and the min-prob-data-size is 10GB, in the worst case, the
> min-filter-ratio
> > will be 0.999, i.e. the probeNdv is 1000 times buildNdv. If we consider
> > the duplication and the fact that the large table might have more columns
> > than the small table, the probeNdv should still be 100 or 10 
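
The worst-case numbers above can be checked with quick arithmetic, assuming the filter ratio is computed as 1 - buildNdv / probeNdv (a formula inferred from this discussion, not quoted from the FLIP):

```python
# Stand-in NDV values: the probe side's ndv is 1000x the build side's,
# matching the 10MB-vs-10GB worst case described above.
build_ndv = 10_000
probe_ndv = 1000 * build_ndv

# Assumed formula: ratio = 1 - buildNdv / probeNdv.
ratio = 1 - build_ndv / probe_ndv
print(ratio)  # 0.999
```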

Re: [DISCUSS] FLIP-321: Introduce an API deprecation process

2023-06-20 Thread Jing Ge
> >>>>>> There're various ways to do that, e.g., release notes, warnings in
> >>>> logs,
> >>>>>> etc.
> >>>>>>
> >>>>>> Another possible alternative: whenever there's a deprecated Public
> >> API
> >>>>> that
> >>>>>> reaches a major version bump before the migration period, and we
> >> also
> >>>>> don't
> >>>>>> want to carry it for all the next major release series, we may
> >> consider
> >>>>>> releasing more minor releases for the previous major version after
> >> the
> >>>>>> bump. E.g., a Public API is deprecated in 1.19, and then we bump
> >> to
> >>>> 2.0,
> >>>>>> we can release one more 1.20 after 2.0. That should provide users
> >>>> another
> >>>>>> choice rather than upgrading to 2.0, while satisfying the
> >>>> 2-minor-release
> >>>>>> migration period.
> >>>>>>
> >>>>>> I think my major point is, we should not carry APIs deprecated in a
> >>>>>> previous major version along all the next major version series. I'd
> >>>> like
> >>>>> to
> >>>>>> try giving users more commitments, i.e. the migration period, as
> >> long
> >>>> as
> >>>>> it
> >>>>>> does not prevent us from making breaking changes. If it doesn't
> >> work,
> >>>> I'd
> >>>>>> be in favor of not providing the migration period, but falling back to
> >> only
> >>>>>> guaranteeing compatibility within the major version.
> >>>>>>
> >>>>>> Best,
> >>>>>>
> >>>>>> Xintong
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Mon, Jun 19, 2023 at 10:48 AM John Roesler  <mailto:vvcep...@apache.org>
> >>>
> >>>>> wrote:
> >>>>>>
> >>>>>>> Hi Becket,
> >>>>>>>
> >>>>>>> Thanks for the reply! I’d like to continue the conversation about
> >>>>>>> compatibility outside of this FLIP thread, but for now, I can
> >> accept
> >>>>> your
> >>>>>>> decision. It’s certainly an improvement.
> >>>>>>>
> >>>>>>> Thanks again,
> >>>>>>> John
> >>>>>>>
> >>>>>>> On Sun, Jun 18, 2023, at 21:42, Becket Qin wrote:
> >>>>>>>> Hi John,
> >>>>>>>>
> >>>>>>>> Completely agree with all you said.
> >>>>>>>>
> >>>>>>>> Can we consider only dropping deprecated APIs in major releases
> >>>>> across
> >>>>>>> the
> >>>>>>>>> board? I understand that Experimental and PublicEvolving APIs
> >> are
> >>>> by
> >>>>>>>>> definition less stable, but it seems like this should be
> >> reflected
> >>>>> in
> >>>>>>> the
> >>>>>>>>> required deprecation period alone. I.e. that we must keep them
> >>>>> around
> >>>>>>> for
> >>>>>>>>> at least zero or one minor release, not that we can drop them
> >> in a
> >>>>>>> minor or
> >>>>>>>>> patch release.
> >>>>>>>>
> >>>>>>>> Personally speaking, I would love to do this, for exactly the
> >>>> reason
> >>>>>> you
> >>>>>>>> mentioned. However, I did not propose this due to the following
> >>>>>> reasons:
> >>>>>>>>
> >>>>>>>> 1. I am hesitating a little bit about changing the accepted
> >> FLIPs
> >>>> too
> >>>>>>> soon.
> >>>>>>>> 2. More importantly, to avoid slowing down our development. At
> >> this
> >>>>>>> point,
> >>>>>>>> Flink still lacks some design / routines to support good API
> >>>>>>> evolvability /

Re: [DISCUSS] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-21 Thread Jing Ge
Hi Ron,

Thanks for sharing your thoughts! It makes sense. It would be helpful if
these references of Hive, Polardb, etc. could be added into the FLIP.

Best regards,
Jing

On Tue, Jun 20, 2023 at 5:41 PM liu ron  wrote:

> Hi, Jing
>
> The default value for this ratio follows other systems, such as
> Hive. As long as the Runtime Filter can filter out more than half of the data,
> we can benefit from it. Of course, normally, as long as we can get the
> statistics, ndv is present and rowCount is rarely used, so I think
> the formula is valid in most cases. This formula is also borrowed from
> some systems, such as PolarDB from AliCloud. Your concern is valuable for
> this FLIP, but currently we do not know how to adjust it reasonably; something
> too complex may be hard for users to understand, so I think we should
> follow the simple way first and optimize gradually afterwards. As a first
> step, we can verify the reasonableness of the current formula with the
> TPC-DS cases.
>
> Best,
> Ron
>
> Jing Ge  wrote on Tue, Jun 20, 2023 at 19:46:
>
> > Hi Ron,
> >
> > Thanks for the clarification. That answered my questions.
> >
> > Regarding the ratio, since my gut feeling is that any value less than 0.8
> > or 0.9 won't help too much (I might be wrong). I was thinking of adapting
> > the formula to somehow map the current 0.9-1 to 0-1, i.e. if user config
> > 0.5, it will be mapped to e.g. 0.95 (or e.g. 0.85, the real number
> > needs more calculation) for the current formula described in the FLIP.
> But
> > I am not sure it is a feasible solution. It deserves more discussion.
> Maybe
> > some real performance tests could give us some hints.
> >
> > Best regards,
> > Jing
> >
> > On Tue, Jun 20, 2023 at 5:19 AM liu ron  wrote:
> >
> > > Hi, Jing
> > >
> > > Thanks for your feedback.
> > >
> > > > Afaiu, the runtime Filter will only be Injected when the gap between
> > the
> > > build data size and prob data size is big enough. Let's make an extreme
> > > example. If the small table(build side) has one row and the large
> > > table(probe side) contains tens of billions of rows. This will be the
> > ideal
> > > use case for the runtime filter and the improvement will be
> significant.
> > Is
> > > this correct?
> > >
> > > Yes, you are right.
> > >
> > > > Speaking of the "Conditions of injecting Runtime Filter" in the FLIP,
> > > will
> > > the value of max-build-data-size and min-prob-data-size depend on the
> > > parallelism config? I.e. with the same data-size setting, is it
> possible
> > to
> > > inject or don't inject runtime filters by adjusting the parallelism?
> > >
> > > First, let me clarify two points. The first is that RuntimeFilter
> decides
> > > whether to inject or not in the optimization phase, but we do not
> > consider
> > > operator parallelism in the SQL optimization phase currently, which is
> > set
> > > at the ExecNode level. The second is that in batch mode, the default
> > > AdaptiveBatchScheduler[1] is now used, which will derive the
> parallelism
> > of
> > > the downstream operator based on the amount of data produced by the
> > > upstream operator, that is, the parallelism is determined by runtime
> > > adaptation. In the above case, we cannot decide whether to inject
> > > BloomFilter in the optimization stage based on parallelism.
> > > A more important point is that the purpose of Runtime Filter is to
> reduce
> > > the amount of data for shuffle, and thus the amount of data processed
> by
> > > the downstream join operator. Therefore, I understand that regardless
> of
> > > the parallelism of the probe, the amount of data in the shuffle must be
> > > reduced after inserting the Runtime Filter, which is beneficial to the
> > join
> > > operator, so whether to insert the RuntimeFilter or not is not
> dependent
> > on
> > > the parallelism.
> > >
> > > > Does it make sense to reconsider the formula of ratio
> > > calculation to help users easily control the filter injection?
> > >
> > > Only when ndv does not exist will row count be considered. when size
> uses
> > > the default value and ndv cannot be taken, it is true that this
> condition
> > > may always hold, but this does not seem to affect anything, and the
> user
> > is
> > > also likely to change the value of the size. One question: how do you think we should make it easier for users to control the filter injection?

Re: [DISCUSS] Graduate the FileSink to @PublicEvolving

2023-06-22 Thread Jing Ge
Hi Galen,

Thanks for the hint, which helps us get a clear big picture.
Afaiac, this will not be a blocking issue for the graduation. There will
always be some (potential) bugs in the implementation. The API has been very
stable since 2020. The timing is good to graduate. WDYT?
Furthermore, I'd like to have more opinions. All opinions together will
help the community build a mature API graduation process.

Best regards,
Jing

On Tue, Jun 20, 2023 at 12:48 PM Galen Warren
 wrote:

> Is this issue still unresolved?
>
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/FLINK-30238
>
> Based on prior discussion, I believe this could lead to data loss with
> FileSink.
>
>
>
> On Tue, Jun 20, 2023, 5:41 AM Jing Ge  wrote:
>
> > Hi all,
> >
> > The FileSink has been marked as @Experimental[1] since Oct. 2020.
> > According to FLIP-197[2], I would like to propose to graduate it
> > to @PublicEvolving in the upcoming 1.18 release.
> >
> > On the other hand, as a related topic, FileSource was marked
> > as @PublicEvolving[3] 3 years ago. It deserves a graduation discussion
> too.
> > To keep this discussion lean and efficient, let's focus on FileSink in
> this
> > thread. There will be another discussion thread for the FileSource.
> >
> > I was wondering if anyone might have any concerns. Looking forward to
> > hearing from you.
> >
> >
> > Best regards,
> > Jing
> >
> >
> >
> >
> >
> >
> > [1]
> >
> >
> https://github.com/apache/flink/blob/4006de973525c5284e9bc8fa6196ab7624189261/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/sink/FileSink.java#L129
> > [2]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-197%3A+API+stability+graduation+process
> > [3]
> >
> >
> https://github.com/apache/flink/blob/4006de973525c5284e9bc8fa6196ab7624189261/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/FileSource.java#L95
> >
>


Re: [VOTE] FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-06-23 Thread Jing Ge
+1(binding)

Best Regards,
Jing

On Fri, Jun 23, 2023 at 5:50 PM Lijie Wang  wrote:

> Hi all,
>
> Thanks for all the feedback about the FLIP-324: Introduce Runtime Filter
> for Flink Batch Jobs[1]. This FLIP was discussed in [2].
>
> I'd like to start a vote for it. The vote will be open for at least 72
> hours (until June 29th 12:00 GMT) unless there is an objection or
> insufficient votes.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-324%3A+Introduce+Runtime+Filter+for+Flink+Batch+Jobs
> [2] https://lists.apache.org/thread/mm0o8fv7x7k13z11htt88zhy7lo8npmg
>
> Best,
> Lijie
>


Re: [DISCUSS] Graduate the FileSink to @PublicEvolving

2023-06-26 Thread Jing Ge
Hi,

@Galen @Yuxia

Your points are valid. Speaking of removing deprecated APIs, I have the same
concern. As a matter of fact, I have been raising it in the discussion
thread of the API deprecation process[1]. This is another example showing that we
should consider more factors than the migration period alone; thanks for
the hint! I will add one more update to that thread with a reference to
this thread.

In a nutshell, this thread is focusing on the graduation process. Your
valid concerns should be taken care of by the deprecation process.
Please don't hesitate to share your thoughts in that thread.


Best regards,
Jing

[1] https://lists.apache.org/thread/vmhzv8fcw2b33pqxp43486owrxbkd5x9


On Sun, Jun 25, 2023 at 3:48 AM yuxia  wrote:

> Thanks Jing for bringing this up for discussion.
> I agree it's not a blocker for graduating the FileSink to @PublicEvolving,
> since the Sink interface, which is the root cause, is marked as @PublicEvolving.
> But I do share the same concern as Galen. At least it should be a
> blocker for removing StreamingFileSink.
> Btw, it seems migrating to Sink is really a big headache; we may need
> to pay more attention to this ticket and try to fix it.
>
> Best regards,
> Yuxia
>
> - Original Message -
> From: "Galen Warren" 
> To: "dev" 
> Sent: Friday, June 23, 2023, 7:47:24 PM
> Subject: Re: [DISCUSS] Graduate the FileSink to @PublicEvolving
>
> Thanks Jing. I can only offer my perspective on this, others may view it
> differently.
>
> If FileSink is subject to data loss in the "stop-on-savepoint then restart"
> scenario, that makes it unusable for me, and presumably for anyone who uses
> it in a long-running streaming application and who cannot tolerate data
> loss. I still use the (deprecated!) StreamingFileSink for this reason.
>
> The bigger picture here is that StreamingFileSink is deprecated and will
> presumably ultimately be removed, to be replaced with FileSink. Graduating
> the status of FileSink seems to be a step along that path; I'm concerned
> about continuing down that path with such a critical issue present.
> Ultimately, my concern is that FileSink will graduate fully and that
> StreamingFileSink will be removed and that there will be no remaining
> option to reliably stop/start streaming jobs that write to files without
> incurring the risk of data loss.
>
> I'm sure I'd feel better about things if there were an ongoing effort to
> address this FileSink issue and/or a commitment that StreamingFileSink
> would not be removed until this issue is addressed.
>
> My two cents -- thanks.
>
>
> On Fri, Jun 23, 2023 at 1:47 AM Jing Ge 
> wrote:
>
> > Hi Galen,
> >
> > Thanks for the hint which is helpful for us to have a clear big picture.
> > Afaiac, this will not be a blocking issue for the graduation. There will
> > always be some (potential) bugs in the implementation. The API is very
> > stable from 2020. The timing is good to graduate. WDYT?
> > Furthermore, I'd like to have more opinions. All opinions together will
> > help the community build a mature API graduation process.
> >
> > Best regards,
> > Jing
> >
> > On Tue, Jun 20, 2023 at 12:48 PM Galen Warren
> >  wrote:
> >
> > > Is this issue still unresolved?
> > >
> > >
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/FLINK-30238
> > >
> > > Based on prior discussion, I believe this could lead to data loss with
> > > FileSink.
> > >
> > >
> > >
> > > On Tue, Jun 20, 2023, 5:41 AM Jing Ge 
> > wrote:
> > >
> > > > Hi all,
> > > >
> > > > The FileSink has been marked as @Experimental[1] since Oct. 2020.
> > > > According to FLIP-197[2], I would like to propose to graduate it
> > > > to @PublicEvolving in the upcoming 1.18 release.
> > > >
> > > > On the other hand, as a related topic, FileSource was marked
> > > > as @PublicEvolving[3] 3 years ago. It deserves a graduation
> discussion
> > > too.
> > > > To keep this discussion lean and efficient, let's focus on FileSink
> in
> > > this
> > > > thread. There will be another discussion thread for the FileSource.
> > > >
> > > > I was wondering if anyone might have any concerns. Looking forward to
> > > > hearing from you.
> > > >
> > > >
> > > > Best regards,
> > > > Jing
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://github.com/apache/flink/blob/4006de973525c5284e9bc8fa6196ab7624189261/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/sink/FileSink.java#L129
> > > > [2]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-197%3A+API+stability+graduation+process
> > > > [3]
> > > >
> > > >
> > >
> >
> https://github.com/apache/flink/blob/4006de973525c5284e9bc8fa6196ab7624189261/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/FileSource.java#L95
> > > >
> > >
> >
>


Re: [DISCUSS] FLIP-321: Introduce an API deprecation process

2023-06-26 Thread Jing Ge
> > > is reached.
> > >
> > > Sorry, I didn't read the previous detailed discussion because the
> > > discussion list was so long.
> > >
> > > I don't really like either of these options.
> > >
> > > Considering that DataStream is such an important API, can we offer a
> > third
> > > option:
> > >
> > > 3. Maintain the DataStream API throughout 2.X and don't remove it until 3.x.
> > But
> > > there's no need to assume that 2.X is a short-lived version; it's still a
> > normal
> > > major version.
> > >
> > > Best,
> > > Jingsong
> > >
> > > > Becket Qin wrote on Thu, Jun 22, 2023 at 16:02:
> > >
> > > > Thanks much for the input, John, Stefan and Jing.
> > > >
> > > > I think Xingtong has well summarized the pros and cons of the two
> > > options.
> > > > Let's collect a few more opinions here and we can move forward with
> the
> > > one
> > > > more people prefer.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Wed, Jun 21, 2023 at 3:20 AM Jing Ge 
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Thanks Xingtong for the summary. If I could only choose one of the
> > > given
> > > > > two options, I would go with option 1. I understood that option 2
> > > worked
> > > > > great with Kafka. But the bridge release will still confuse users
> and
> > > my
> > > > > gut feeling is that many users will skip 2.0 and wait for 3.0
> > or
> > > > even
> > > > > 3.x. And since fewer users will use Flink 2.x, the development
> focus
> > > will
> > > > > be on Flink 3.0, even though the current Flink release is
> 1.17
> > > and
> > > > we
> > > > > are preparing the 2.0 release. That feels weird to me.
> > > > >
> > > > > TBH, I would not call the change from @Public to @Retired a
> > > demotion.
> > > > > The purpose of @Retired is to extend the API lifecycle with one more
> > > > stage,
> > > > > like in the real world: people are born, study, graduate, work,
> and
> > > > > retire. Afaiu from the previous discussion, there are two rules
> we'd
> > > > like
> > > > > to follow simultaneously:
> > > > >
> > > > > 1. Public APIs can only be changed between major releases.
> > > > > 2. A smooth migration phase should be offered to users, i.e. at
> > least 2
> > > > > minor releases after APIs are marked as @deprecated. There should
> be
> > > new
> > > > > APIs as the replacement.
> > > > >
> > > > > Agree, those rules are good to improve the user friendliness.
> Issues
> > we
> > > > > discussed arise because we want to fulfill both of them. If we
> > > take
> > > > > care of deprecation very seriously, APIs can be marked as
> > @Deprecated,
> > > > only
> > > > > when the new APIs as the replacement provide all functionalities
> the
> > > > > deprecated APIs have. In an ideal case without critical bugs that
> > might
> > > > > stop users adopting the new APIs. Otherwise the expected
> > "replacement"
> > > > will
> > > > > not happen. Users will still stick to the deprecated APIs, because
> > the
> > > > new
> > > > > APIs can not be used. For big features, it will need at least 4
> minor
> > > > > releases(ideal case), i.e. 2+ years to remove deprecated APIs:
> > > > >
> > > > > - 1st minor release to build the new APIs as the replacement and
> > > waiting
> > > > > for feedback. It might be difficult to mark the old API as
> deprecated
> > > in
> > > > > this release, because we are not sure if the new APIs could cover
> > 100%
> > > > > functionalities.
> > > > > -  In the lucky case,  mark all old APIs as deprecated in the 2nd
> > minor
> > > > > release. (I would even suggest having the new APIs released at
> least
> > > for
> > > > > two minor releases before marking it as deprecated to make sure
> they
> > > can
> > > > > really repla

Re: [DISCUSS] FLIP-316: Introduce SQL Driver

2023-06-26 Thread Jing Ge
Hi Paul,

Thanks for driving it and thank you all for the informative discussion! The
FLIP is in good shape now. As described in the FLIP, SQL Driver will be
mainly used to run Flink SQLs in two scenarios: 1. SQL client/gateway in
application mode and 2. external system integration. Would you like to add
one section to describe (better with a script/code example) how to use it in
these two scenarios from users' perspective?

NIT: the pictures have a transparent background when readers click on them. It
would be great if you could replace them with pictures with a white background.

Best regards,
Jing

On Mon, Jun 26, 2023 at 1:31 PM Paul Lam  wrote:

> Hi Shengkai,
>
> > * How can we ship the json plan to the JobManager?
>
> The Flink K8s module should be responsible for file distribution. We could
> introduce
> an option like `kubernetes.storage.dir`. For each flink cluster, there
> would be a
> dedicated subdirectory, with the pattern like
> `${kubernetes.storage.dir}/${cluster-id}`.
>
> All resource-related options (e.g. pipeline jars, JSON plans) that are
> configured with
> scheme `file://` would be uploaded to the resource directory
> and downloaded to the
> jobmanager, before SQL Driver accesses the files with the original
> filenames.
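
The upload-path convention described above (a dedicated `${kubernetes.storage.dir}/${cluster-id}` subdirectory, with only `file://` resources uploaded) could be sketched like this; the function and its names are hypothetical illustrations, not actual Flink API:

```python
from urllib.parse import urlparse

def remote_path(storage_dir, cluster_id, resource_uri):
    """Resolve where a configured resource would live after upload.

    Sketch only: assumes the ${kubernetes.storage.dir}/${cluster-id}
    pattern from the message above; non-file schemes are left untouched.
    """
    parsed = urlparse(resource_uri)
    if parsed.scheme != "file":
        return resource_uri  # only local files need uploading
    filename = parsed.path.rsplit("/", 1)[-1]
    return f"{storage_dir}/{cluster_id}/{filename}"

# A local JSON plan lands under the cluster's dedicated subdirectory.
print(remote_path("s3://bucket/flink", "my-cluster", "file:///tmp/plan.json"))
# s3://bucket/flink/my-cluster/plan.json
```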
>
>
> > * Classloading strategy
>
>
> We could directly specify the SQL Gateway jar as the jar file in
> PackagedProgram.
> It would be treated like a normal user jar, and the SQL Driver would be loaded
> into the user
> classloader. WDYT?
>
> > * Option `$internal.sql-gateway.driver.sql-config` is string type
> > I think it's better to use Map type here
>
> By Map type configuration, do you mean a nested map that contains all
> configurations?
>
> I hope I've explained myself well, it’s a file that contains the extra SQL
> configurations, which would be shipped to the jobmanager.
>
> > * PoC branch
>
> Sure. I’ll let you know once I get the job done.
>
> Best,
> Paul Lam
>
> > On Jun 26, 2023, at 14:27, Shengkai Fang  wrote:
> >
> > Hi, Paul.
> >
> > Thanks for your update. I have a few questions about the new design:
> >
> > * How can we ship the json plan to the JobManager?
> >
> > The current design only exposes an option about the URL of the json
> plan. It seems the gateway is responsible for uploading it to external storage.
> Can we reuse the PipelineOptions.JARS to ship to the remote filesystem?
> >
> > * Classloading strategy
> >
> > Currently, the Driver is in the sql-gateway package. It means the Driver
> is not in the JM's classpath directly, because the sql-gateway jar is now
> in the opt directory rather than the lib directory. It may need to add the
> external dependencies as Python does[1]. BTW, I think it's better to move
> the Driver into the flink-table-runtime package, which is much easier to
> find (sorry for the wrong opinion before).
> >
> > * Option `$internal.sql-gateway.driver.sql-config` is string type
> >
> > I think it's better to use Map type here
> >
> > * PoC branch
> >
> > Because this FLIP involves many modules, do you have a PoC branch to
> verify it does work?
> >
> > Best,
> > Shengkai
> >
> > [1]
> https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L940
> <
> https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L940
> >
> > Paul Lam mailto:paullin3...@gmail.com>> wrote on Mon, Jun 19, 2023 at 14:09:
> > Hi Shengkai,
> >
> > Sorry for my late reply. It took me some time to update the FLIP.
> >
> > In the latest FLIP design, SQL Driver is placed in flink-sql-gateway
> module. PTAL.
> >
> > The FLIP does not cover details about the K8s file distribution, but its
> general usage would
> > be very much the same as YARN setups. We could make follow-up
> discussions in the jira
> > tickets.
> >
> > Best,
> > Paul Lam
> >
> >> On Jun 12, 2023, at 15:29, Shengkai Fang  fskm...@gmail.com>> wrote:
> >>
> >>
> >> > If it’s the case, I’m good with introducing a new module and making
> SQL Driver
> >> > an internal class and accepts JSON plans only.
> >>
> >> I rethink this again and again. I think it's better to move the
> SqlDriver into the sql-gateway module because the sql client relies on the
> sql-gateway to submit the sql and the sql-gateway has the ability to
> generate the ExecNodeGraph now. +1 to support accepting JSON plans only.
> >>
> >> * Upload configuration through command line parameter
> >>
> >> ExecNodeGraph only contains the job's information but it doesn't
> contain the checkpoint dir, checkpoint interval, execution mode and so on.
> So I think we should also upload the configuration.
> >>
> >> * KubernetesClusterDescripter and
> KubernetesApplicationClusterEntrypoint are responsible for the jar
> upload/download
> >>
> >> +1 for the change.
> >>
> >> Could you update the FLIP about the current discussion?
> >>
> >> Best,
> >> Shengkai
> >>
> >>
> >>
> >>
> >>
> >>
> >> Yang Wang mailto:wangyang0...@apache.org>> wrote on Mon, Jun 12, 2023 at 11:41:
> >> Sorry for the

Re: [DISCUSS] FLIP-303: Support REPLACE TABLE AS SELECT statement

2023-06-27 Thread Jing Ge
Hi Yuxia,

Thanks for the proposal. Many engines, like Snowflake and Databricks, support
it. +1

"3:Check the atomicity is enabled, it requires both the options
table.rtas-ctas.atomicity-enabled is set to true and the corresponding
table sink implementation SupportsStaging."

Typo? "Option" instead of "options"? It sounds like there are more options
that need to be set.
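
For what it's worth, the gating condition quoted above (atomicity takes effect only when the option is true and the sink supports staging) amounts to a simple conjunction; the Python names below are illustrative stand-ins for the actual Flink option and interface:

```python
def atomicity_enabled(config, sink):
    """True only when the option is set AND the sink supports staging."""
    return bool(
        config.get("table.rtas-ctas.atomicity-enabled", False)
        and getattr(sink, "supports_staging", False)
    )

class StagingSink:  # stand-in for a sink implementing SupportsStaging
    supports_staging = True

print(atomicity_enabled({"table.rtas-ctas.atomicity-enabled": True}, StagingSink()))  # True
print(atomicity_enabled({}, StagingSink()))  # False
```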

Best regards,
jing




On Tue, Jun 27, 2023 at 8:37 AM yuxia  wrote:

> Hi, all.
> Thanks for the feedback.
>
> If there are no other questions or concerns for the FLIP[1], I'd like to
> start the vote tomorrow (6.28).
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-303%3A+Support+REPLACE+TABLE+AS+SELECT+statement
>
> Best regards,
> Yuxia
>
>
> From: "zhangmang1" 
> To: "dev" , luoyu...@alumni.sjtu.edu.cn
> Sent: Tuesday, June 27, 2023, 12:03:35 PM
> 主题: Re:Re: [DISCUSS] FLIP-303: Support REPLACE TABLE AS SELECT statement
>
> Hi yuxia,
>
> +1 for this new feature.
> In particular, the CREATE OR REPLACE TABLE syntax is more usable and
> faster for users.
>
>
>
>
>
> --
> Best regards,
> Mang Zhang
>
>
>
>
> At 2023-06-26 09:46:40, "yuxia"  wrote:
> >Hi, folks.
> >To save the time of reviewers, I would like to summary the main changes
> of this FLIP[1]. The FLIP is just to introduce REPLACE TABLE AS SELECT
> statement which is almost similar to CREATE TABLE AS SELECT statement, and
> a syntax CREATE OR REPLACE TABLE AS to wrap both. This FLIP is try to
> complete such kinds of statement.
> >
> >The changes are as follows:
> >1: Add enum REPLACE_TABLE_AS, CREATE_OR_REPLACE_TABLE_AS in
> StagingPurpose which is proposed in FLIP-305[2].
> >
> >2: Change the configuration from `table.ctas.atomicity-enabled` proposed
> >in FLIP-305[2] to `table.rtas-ctas.atomicity-enabled` to make it take
> >effect not only for CREATE TABLE AS, but also for REPLACE TABLE AS and
> >CREATE OR REPLACE TABLE AS. The main reason is that these statements are
> >almost the same and belong to the same statement family, and I would not like
> >to introduce a different configuration that actually does the same thing.
> >Also, IIRC, in the offline discussion about FLIP-218[1], we also wanted to
> >introduce `table.rtas-ctas.atomicity-enabled`, but as FLIP-218 only supports
> >CTAS, it was not suitable to introduce a configuration implying RTAS, which
> >was not supported. So, we changed the configuration to
> >`table.ctas.atomicity-enabled`. Since CTAS has been supported, I think it's
> >reasonable to revisit it and introduce `table.rtas-ctas.atomicity-enabled`
> >to unify them in this FLIP for supporting the REPLACE TABLE AS statement.
> >
> >
> >Again, look forward to your feedback.
> >
> >[1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-303%3A+Support+REPLACE+TABLE+AS+SELECT+statement
> >[2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-305%3A+Support+atomic+for+CREATE+TABLE+AS+SELECT%28CTAS%29+statement
> >[3]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199541185
> >
> >Best regards,
> >Yuxia
> >
> >- Original Message -
> >From: "yuxia" 
> >To: "dev" 
> >Sent: Thursday, June 15, 2023, 7:58:27 PM
> >Subject: [DISCUSS] FLIP-303: Support REPLACE TABLE AS SELECT statement
> >
> >Hi, devs.
> >As FLIP-218[1] & FLIP-305[2], which add support for the CREATE TABLE
> AS SELECT statement in Flink, have been accepted,
> I would like to start a discussion about FLIP-303: Support REPLACE TABLE
> AS SELECT statement[3] to complete this family of statements.
> >With the REPLACE TABLE AS SELECT statement, users won't need to first drop
> the table and then use CREATE TABLE AS SELECT. Since the statement is
> very similar to the CREATE TABLE AS statement, the design is very similar to
> FLIP-218[1] & FLIP-305[2], apart from some parts specific to the REPLACE TABLE
> AS SELECT statement.
> >Just a kind reminder: to understand this FLIP better, you may need to read
> FLIP-218[1] & FLIP-305[2] to get more context.
> >
> >Look forward to your feedback.
> >
> >[1]:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199541185
> >[2]:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-305%3A+Support+atomic+for+CREATE+TABLE+AS+SELECT%28CTAS%29+statement
> >[3]:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-303%3A+Support+REPLACE+TABLE+AS+SELECT+statement
> >
> >:) I just noticed I missed "[DISCUSS]" in the title of the previous email
> [4], so I am sending it again here with the correct email title. Please
> ignore the previous email and discuss in this thread.
> >Sorry for the noise.
> >
> >[4]: https://lists.apache.org/thread/jy39xwxn1o2035y5411xynwtbyfgg76t
> >
> >
> >Best regards,
> >Yuxia
>
>
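The syntax family discussed above can be sketched briefly (a minimal illustration with hypothetical table and column names; the option name and semantics follow the discussion in this thread, not a final implementation):

```sql
-- Enable atomic semantics for CTAS/RTAS, per the option discussed above
SET 'table.rtas-ctas.atomicity-enabled' = 'true';

-- Without RTAS: replacing a table takes two separate, non-atomic steps
DROP TABLE IF EXISTS daily_report;
CREATE TABLE daily_report AS SELECT id, amount FROM orders;

-- With FLIP-303: a single statement (expected to fail if the table is absent)
REPLACE TABLE daily_report AS SELECT id, amount FROM orders;

-- Create the table if absent, replace it otherwise
CREATE OR REPLACE TABLE daily_report AS SELECT id, amount FROM orders;
```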


Re: [VOTE] FLIP-303: Support REPLACE TABLE AS SELECT statement

2023-06-28 Thread Jing Ge
+1(binding)

On Wed, Jun 28, 2023 at 1:51 PM Mang Zhang  wrote:

> +1 (non-binding)
>
>
> --
>
> Best regards,
> Mang Zhang
>
>
>
>
>
> At 2023-06-28 17:48:15, "yuxia"  wrote:
> >Hi everyone,
> >Thanks for all the feedback about FLIP-303: Support REPLACE TABLE AS
> SELECT statement[1]. Based on the discussion [2], we have come to a
> consensus, so I would like to start a vote.
> >The vote will be open for at least 72 hours (until July 3rd, 10:00AM GMT)
> unless there is an objection or an insufficient number of votes.
> >
> >
> >[1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-303%3A+Support+REPLACE+TABLE+AS+SELECT+statement
> >[2] https://lists.apache.org/thread/39mwckdsdgck48tzsdfm66hhnxorjtz3
> >
> >
> >Best regards,
> >Yuxia
>


Re: [DISCUSS] FLIP-314: Support Customized Job Lineage Listener

2023-06-28 Thread Jing Ge
Hi Shammon,

Thanks for your proposal. After reading the FLIP, I'd like to ask
some questions to make sure we are on the same page. Thanks!

1. TableColumnLineageRelation#sinkColumn() should return
TableColumnLineageEntity instead of String, right?

2. Since LineageRelation already contains all information to build the
lineage between sources and sink, do we still need to set the LineageEntity
in the source?

3. About the "Entity" and "Relation" naming, I was confused too, like
Qingsheng mentioned. How about LineageVertex, LineageEdge, and LineageEdges,
which contains multiple LineageEdge? E.g. multiple sources joined into one
sink, or edges of columns from one or more tables, etc.

Best regards,
Jing

On Sun, Jun 25, 2023 at 2:06 PM Shammon FY  wrote:

> Hi yuxia and Yun,
>
> Thanks for your input.
>
> For yuxia:
> > 1: What kinds of JobStatus will the `JobExecutionStatusEvent` include?
>
> At present, we only need to notify the listener when a job goes to
> termination, but I think it makes sense to add generic `oldStatus` and
> `newStatus` in the listener and users can update the job state in their
> service as needed.
>
> > 2: I'm really confused about the `config()` included in `LineageEntity`;
> where does it come from and what is it for?
>
> The `config` in `LineageEntity` is used for users to get options for source
> and sink connectors. As the examples in the FLIP show, users can add
> server/group/topic information in the config for Kafka and create lineage
> entities for `DataStream` jobs, so that the listeners can get this information
> to identify the same connector in different jobs. Besides, the `config`
> in `TableLineageEntity` will be the same as `getOptions` in
> `CatalogBaseTable`.
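As an illustration of the connector options such a listener could read (a hedged sketch with hypothetical table/topic names, using standard Kafka connector DDL options):

```sql
CREATE TABLE orders (
  order_id BIGINT,
  amount DOUBLE
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',                                -- topic info for lineage
  'properties.bootstrap.servers' = 'broker-1:9092',  -- server info for lineage
  'properties.group.id' = 'orders-consumer',         -- group info for lineage
  'format' = 'json'
);
```

For a table job, these WITH options would be what `getOptions` in `CatalogBaseTable` returns, so a listener could match the same topic across different jobs.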
>
> > 3: Regardless of whether `inputChangelogMode` in `TableSinkLineageEntity` is
> needed or not, since `TableSinkLineageEntity` contains
> `inputChangelogMode`, why doesn't `TableSourceLineageEntity` contain a
> changelog mode?
>
> At present, we do not actually use the changelog mode. It can be deleted,
> and I have updated FLIP.
>
> > Btw, since there are a lot of interfaces proposed, I think it would be
> better to give an example of how to implement a listener in this FLIP to
> help us understand the interfaces better.
>
> I have added the example in the FLIP and the related interfaces and
> examples are in branch [1].
>
> For Yun:
> > I have one more question on the lookup-join dim tables: it seems this
> FLIP does not touch them; will they become part of
> List sources(), or will another interface be added?
>
> You're right, lookup-join dim tables are currently not considered in the
> 'proposed changes' section of this FLIP. But the interface for lineage is
> universal, and we can add a `TableLookupSourceLineageEntity` which implements
> `TableSourceLineageEntity` in the future without modifying the public
> interface.
>
> > By the way, if you want to focus on job lineage instead of data column
> lineage in this FLIP, why must we introduce so many column-lineage related
> interfaces here?
>
> The lineage information in SQL jobs includes table lineage and column
> lineage. Although column lineage is currently not supported for SQL jobs, we
> would like to support it in the next step. So we have comprehensively
> considered the table lineage and column lineage interfaces here, and
> defined these two sets of interfaces together clearly.
>
>
> [1]
>
> https://github.com/FangYongs/flink/commit/d4bfe57e7a5315b790e79b8acef8b11e82c9187c
>
> Best,
> Shammon FY
>
>
> On Sun, Jun 25, 2023 at 4:17 PM Yun Tang  wrote:
>
> > Hi Shammon,
> >
> > I like the idea in general, and it will help to analyze job lineage
> > for both Flink SQL and Flink jar jobs in production environments.
> >
> > For Qingsheng's concern, I prefer the name JobType over
> > RuntimeExecutionMode, as the latter is not easy for users to understand.
> >
> > I have one more question on the lookup-join dim tables: it seems this
> > FLIP does not touch them; will they become part of List
> > sources(), or will another interface be added?
> >
> > By the way, if you want to focus on job lineage instead of data column
> > lineage in this FLIP, why must we introduce so many column-lineage
> > related interfaces here?
> >
> >
> > Best
> > Yun Tang
> > 
> > From: Shammon FY 
> > Sent: Sunday, June 25, 2023 16:13
> > To: dev@flink.apache.org 
> > Subject: Re: [DISCUSS] FLIP-314: Support Customized Job Lineage Listener
> >
> > Hi Qingsheng,
> >
> > Thanks for your valuable feedback.
> >
> > > 1. Is there any specific use case to expose the batch / streaming info
> to
> > listeners or meta services?
> >
> > I agree with you that Flink is evolving towards batch-streaming
> > unification, but their lifecycles are different. If a job processes a
> > bounded dataset, it will end after completing the data processing;
> > otherwise, it will run for a long time. In our scenario, we regularly
> > schedule some Flink jobs to process boun
