Re: Re: Re: [VOTE] Accept Flink CDC into Apache Flink

2024-01-11 Thread godfrey he
+1 (binding)

Thanks,
Godfrey

On Fri, Jan 12, 2024 at 14:10, Zhu Zhu wrote:
>
> +1 (binding)
>
> Thanks,
> Zhu
>
> > On Thu, Jan 11, 2024 at 14:26, Hangxiang Yu wrote:
>
> > +1 (non-binding)
> >
> > On Thu, Jan 11, 2024 at 11:19 AM Xuannan Su  wrote:
> >
> > > +1 (non-binding)
> > >
> > > Best,
> > > Xuannan
> > >
> > > On Thu, Jan 11, 2024 at 10:28 AM Xuyang  wrote:
> > > >
> > > > +1 (non-binding)
> > > >
> > > > Best!
> > > > Xuyang
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On 2024-01-11 10:00:11, "Yang Wang" wrote:
> > > > >+1 (binding)
> > > > >
> > > > >
> > > > >Best,
> > > > >Yang
> > > > >
> > > > >On Thu, Jan 11, 2024 at 9:53 AM liu ron  wrote:
> > > > >
> > > > >> +1 non-binding
> > > > >>
> > > > >> Best
> > > > >> Ron
> > > > >>
> > > > >> On Wed, Jan 10, 2024 at 23:05, Matthias Pohl wrote:
> > > > >>
> > > > >> > +1 (binding)
> > > > >> >
> > > > >> > On Wed, Jan 10, 2024 at 3:35 PM ConradJam 
> > > wrote:
> > > > >> >
> > > > >> > > +1 non-binding
> > > > >> > >
> > > > >> > > On Wed, Jan 10, 2024 at 21:06, Dawid Wysakowicz wrote:
> > > > >> > >
> > > > >> > > > +1 (binding)
> > > > >> > > > Best,
> > > > >> > > > Dawid
> > > > >> > > >
> > > > >> > > > On Wed, 10 Jan 2024 at 11:54, Piotr Nowojski <
> > > pnowoj...@apache.org>
> > > > >> > > wrote:
> > > > >> > > >
> > > > >> > > > > +1 (binding)
> > > > >> > > > >
> > > > >> > > > > On Wed, Jan 10, 2024 at 11:25, Martijn Visser <
> > > martijnvis...@apache.org> wrote:
> > > > >> > > > >
> > > > >> > > > > > +1 (binding)
> > > > >> > > > > >
> > > > >> > > > > > On Wed, Jan 10, 2024 at 4:43 AM Xingbo Huang <
> > > hxbks...@gmail.com
> > > > >> >
> > > > >> > > > wrote:
> > > > >> > > > > > >
> > > > >> > > > > > > +1 (binding)
> > > > >> > > > > > >
> > > > >> > > > > > > Best,
> > > > >> > > > > > > Xingbo
> > > > >> > > > > > >
> > > > >> > > > > > > On Wed, Jan 10, 2024 at 11:35, Dian Fu wrote:
> > > > >> > > > > > >
> > > > >> > > > > > > > +1 (binding)
> > > > >> > > > > > > >
> > > > >> > > > > > > > Regards,
> > > > >> > > > > > > > Dian
> > > > >> > > > > > > >
> > > > >> > > > > > > > On Wed, Jan 10, 2024 at 5:09 AM Sharath <
> > > > >> dsaishar...@gmail.com
> > > > >> > >
> > > > >> > > > > wrote:
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > +1 (non-binding)
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Best,
> > > > >> > > > > > > > > Sharath
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > On Tue, Jan 9, 2024 at 1:02 PM Venkata Sanath
> > > Muppalla <
> > > > >> > > > > > > > sanath...@gmail.com>
> > > > >> > > > > > > > > wrote:
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > +1 (non-binding)
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > Thanks,
> > > > >> > > > > > > > > > Sanath
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > On Tue, Jan 9, 2024 at 11:16 AM Peter Huang <
> > > > >> > > > > > > > huangzhenqiu0...@gmail.com>
> > > > >> > > > > > > > > > wrote:
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > > +1 (non-binding)
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > Best Regards
> > > > >> > > > > > > > > > > Peter Huang
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > On Tue, Jan 9, 2024 at 5:26 AM Jane Chan <
> > > > >> > > > > qingyue@gmail.com>
> > > > >> > > > > > > > wrote:
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > +1 (non-binding)
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > Best,
> > > > >> > > > > > > > > > > > Jane
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > On Tue, Jan 9, 2024 at 8:41 PM Lijie Wang <
> > > > >> > > > > > > > wangdachui9...@gmail.com>
> > > > >> > > > > > > > > > > > wrote:
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > +1 (non-binding)
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > Best,
> > > > >> > > > > > > > > > > > > Lijie
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > On Tue, Jan 9, 2024 at 19:28, Jiabao Sun wrote:
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > +1 (non-binding)
> > > > >> > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > Best,
> > > > >> > > > > > > > > > > > > > Jiabao
> > > > >> > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > On 2024/01/09 09:58:04 xiangyu feng wrote:
> > > > >> > > > > > > > > > > > > > > +1 (non-binding)
> > > > >> > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > Regards,
> > > > >> > > > > > > > > > > > > > > Xiangyu Feng
> > > > >> > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > On Tue, Jan 9, 2024 at 17:50, Danny Cranmer wrote:
> > > > >> > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > +1 (binding)
> > > > >> > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > Thanks,
> > > > >> > > > > 

Re: [DISCUSS] FLIP-91 - Support SQL Client Gateway

2020-02-05 Thread godfrey he
h blew up in complexity and never finished). Would having a
> > dedicated gateway component mean that we can simplify the client and make
> > it a simple "shell around the table environment"? I think that would be
> > good, it would make it much easier to have new Table API features
> available
> > in the SQL client.
> >
> > (2) Have you considered making this a standalone project? This seems like
> > a unit of functionality that would be useful to have separately, and it
> would
> > have a few advantages:
> >
> >- Flink codebase is already very large and hard to maintain
> >- A separate project is simpler to develop, not limited by Flink
> > committer reviews
> >- Quicker independent releases when new features are added.
> >
> > I see other projects successfully putting ecosystem tools into separate
> > projects, like Livy for Spark.
> > Should we do the same here?
> >
> > Best,
> > Stephan
> >
> >
> > On Fri, Jan 17, 2020 at 1:48 PM godfrey he  wrote:
> >
> >> Hi devs,
> >>
> >> I've updated the FLIP-91 [0] according to feedbacks. Please take another
> >> look.
> >>
> >> Best,
> >> godfrey
> >>
> >> [0]
> >>
> >>
> https://docs.google.com/document/d/1DKpFdov1o_ObvrCmU-5xi-VrT6nR2gxq-BbswSSI9j8/
> >> <
> >>
> https://docs.google.com/document/d/1DKpFdov1o_ObvrCmU-5xi-VrT6nR2gxq-BbswSSI9j8/edit#heading=h.cje99dt78an2
> >>>
> >>
> >> On Thu, Jan 9, 2020 at 4:21 PM, Kurt Young wrote:
> >>
> >>> Hi,
> >>>
> >>> +1 to the general idea. Supporting a SQL client gateway mode will bridge
> >>> the gap between Flink SQL and production environments. The JDBC driver
> >>> is also quite a good supplement for the usability of Flink SQL; users
> >>> will have more ways to try out Flink SQL, such as Tableau.
> >>>
> >>> I went through the document and left some comments there.
> >>>
> >>> Best,
> >>> Kurt
> >>>
> >>>
> >>> On Sun, Jan 5, 2020 at 1:57 PM tison  wrote:
> >>>
> >>>> The general idea sounds great. I'm going to keep up with the progress
> >>> soon.
> >>>>
> >>>> Best,
> >>>> tison.
> >>>>
> >>>>
> >>>> On Sun, Jan 5, 2020 at 12:59 PM, Bowen Li wrote:
> >>>>
> >>>>> +1. It will improve user experience quite a bit.
> >>>>>
> >>>>>
> >>>>> On Thu, Jan 2, 2020 at 22:07 Yangze Guo  wrote:
> >>>>>
> >>>>>> Thanks for driving this, Xiaoling!
> >>>>>>
> >>>>>> +1 for supporting SQL client gateway.
> >>>>>>
> >>>>>> Best,
> >>>>>> Yangze Guo
> >>>>>>
> >>>>>>
> >>>>>> On Thu, Jan 2, 2020 at 9:58 AM 贺小令  wrote:
> >>>>>>>
> >>>>>>> Hey everyone,
> >>>>>>> FLIP-24
> >>>>>>> <
> >>>>>
> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
> >>>>
> >>>>>>> proposes the whole conception and architecture of SQL Client. The
> >>>>>> embedded
> >>>>>>> mode is already supported since release-1.5, which is helpful for
> >>>>>>> debugging/demo purposes.
> >>>>>>> Many users ask how to submit a Flink job to an online environment
> >>>>>>> without programming against the Flink API. To solve this, we created
> >>>>>>> FLIP-91 [0], which supports a SQL client gateway mode, so that users
> >>>>>>> can submit a job through the CLI client, REST API or JDBC.
> >>>>>>>
> >>>>>>> I'd be glad to get more feedback about FLIP-91.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> godfreyhe
> >>>>>>>
> >>>>>>> [0]
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>
>
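
To make the quoted proposal concrete: with a JDBC driver on top of the gateway, a Flink SQL query could be run like any other database query. Below is a rough sketch only; the driver URL scheme, host and port are assumptions for illustration, the real format would be defined by FLIP-91.

// Hypothetical JDBC usage against a SQL client gateway.
// The URL scheme "jdbc:flink://..." is illustrative, not specified here.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FlinkJdbcExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:flink://gateway-host:8083/default");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT word, cnt FROM word_counts")) {
            while (rs.next()) {
                System.out.println(rs.getString("word") + ": " + rs.getLong("cnt"));
            }
        }
    }
}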


Re: [DISCUSS] FLIP-84: Improve & Refactor execute/sqlQuery/sqlUpdate APIS of TableEnvironment

2020-02-13 Thread godfrey he
Hi Kurt, Jark, Jingsong,

Regarding to "fromQuery", I agree with kurt. In addition, I think `Table
from(String tableName)` should be renamed to `Table fromCatalog(String
tableName)`.

Regarding to the "DmlBatch", DML contains "INSERT", "UPDATE", "DELETE", and
they can be executed in a same batch in the future. So we can add
"addUpdate" method and "addDelete" method to support them.

Regarding to the "Inserts addInsert", maybe we can add a "DmlBatchBuilder".

Open to more discussion.

Best,
godfrey



On Thu, Feb 13, 2020 at 4:56 PM, Kurt Young wrote:

> Regarding to "fromQuery" is confusing users with "Table from(String
> tableName)", I have
> a just opposite opinion. I think this "fromXXX" pattern can make users
> quite clear when they
> want to get a Table from TableEnvironment. Similar interfaces will also
> include like "fromElements".
>
> Regarding to the name of DmlBatch, I think it's mainly for
> future flexibility, in case we can support
> other statement in a single batch. If that happens, the name "Inserts" will
> be weird.
>
> Best,
> Kurt
>
>
> On Thu, Feb 13, 2020 at 4:03 PM Jark Wu  wrote:
>
> > I agree with Jingsong.
> >
> > +1 to keep `sqlQuery`, it's clear from the method name and return type
> that
> > it accepts a SELECT query and returns a logic representation `Table`.
> > The `fromQuery` is a little confusing for users, given the `Table from(String
> > tableName)` method.
> >
> > Regarding to the `DmlBatch`, I agree with Jingsong, AFAIK, the purpose of
> > `DmlBatch` is used to batching insert statements.
> > Besides, DML terminology is not commonly know among users. So what about
> > `InsertsBatching startBatchingInserts()` ?
> >
> > Best,
> > Jark
> >
> > On Thu, 13 Feb 2020 at 15:50, Jingsong Li 
> wrote:
> >
> > > Hi Godfrey,
> > >
> > > Thanks for updating. +1 sketchy.
> > >
> > > I see no need to change "sqlQuery" to "fromQuery"; I think "sqlQuery" is
> > > OK, it's not that confusing given the return value.
> > >
> > > Can we change the "DmlBatch" to "Inserts"?  I don't see any other
> needs.
> > > "Dml" seems a little weird.
> > > It is better to support "Inserts addInsert" too. Users can
> > > "inserts.addInsert().addInsert()"
> > >
> > > I try to match the new interfaces with the old interfaces simply.
> > > - "startInserts -> addInsert" replace old "sqlUpdate(insert)" and
> > > "insertInto".
> > > - "executeStatement" new one, execute all kinds of sqls immediately.
> > > Including old "sqlUpdate(DDLs)".
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > > On Wed, Feb 12, 2020 at 11:10 AM godfreyhe 
> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I'd like to resume the discussion for FlIP-84 [0]. I had updated the
> > > > document, the mainly changes are:
> > > >
> > > > 1. about "`void sqlUpdate(String sql)`" section
> > > >   a) change "Optional<ResultTable> executeSql(String sql) throws
> > > Exception"
> > > > to "ResultTable executeStatement(String statement, String jobName)
> > throws
> > > > Exception". The reason is: "statement" is a more general concept than
> > > > "sql",
> > > > e.g. "show xx" is not a sql command (refer to [1]), but is a
> statement
> > > > (just
> > > > like JDBC). "insert" statement also has return value which is the
> > > affected
> > > > row count, we can unify the return type to "ResultTable" instead of
> > > > "Optional".
> > > >   b) add two sub-interfaces for "ResultTable": "RowResultTable" is
> used
> > > for
> > > > non-streaming select statement and will not contain change flag;
> > > > "RowWithChangeFlagResultTable" is used for streaming select statement
> > and
> > > > will contain change flag.
> > > >
> > > > 2) about "Support batch sql execute and explain" section
> > > > introduce "DmlBatch" to support both sql and Table API (which is
> > borrowed
> > > > from the ideas Dawid mentioned in the slack)
> > > >
> > > > interface TableEnvironment {
> > > > DmlBatch startDmlBatch();
> > > > }
> > > >
> > > > interface DmlBatch {
> > > >   /**
> > > >   * add insert statement to the batch
> > > >   */
> > > > void addInsert(String insert);
> > > >
> > > >  /**
> > > >   * add Table with given sink name to the batch
> > > >   */
> > > > void addInsert(String sinkName, Table table);
> > > >
> > > >  /**
> > > >   * execute the dml statements as a batch
> > > >   */
> > > >   ResultTable execute(String jobName) throws Exception
> > > >
> > > >   /**
> > > >  * Returns the AST and the execution plan to compute the result of
> the
> > > > batch
> > > > dml statement.
> > > >   */
> > > >   String explain(boolean extended);
> > > > }
> > > >
> > > > 3) about "Discuss a parse method for multiple statements execute in
> SQL
> > > > CLI"
> > > > section
> > > > add the pros and cons for each solution
> > > >
> > > > 4) update the "Examples" section and "Summary" section based on the
> > above
> > > > changes
> > > >
> > > > Please refer the design doc[1] for more details and welcome any
> > feedback.
> > > >
> > > > Bests,
> > > > godfreyhe
> > > >
> > > >
> > > > [0]
> 

Re: [DISCUSS] FLIP-84: Improve & Refactor execute/sqlQuery/sqlUpdate APIS of TableEnvironment

2020-02-18 Thread godfrey he
Thanks Kurt and Jark for the explanation. I now also think we should keep
the TableEnvironment interface stable and should not change the "sqlQuery"
method or the "from" method.

Hi Jingsong. Regarding "DmlBatch", I totally agree with the advantages of
the "addBatch" method. However, there are two more questions to solve: one
is how users write multi-sink programs with the Table API, and the other is
how users explain a multi-sink program in both SQL and the Table API.
Currently, the "DmlBatch" class can answer both questions (see the sketch
below); its main disadvantage is that it is inconsistent with the current
interface.
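
For example, a multi-sink program could be expressed like this (a sketch based on the DmlBatch interface proposed earlier in this thread):

// Both inserts are optimized together and submitted as one job,
// and explain() covers the whole multi-sink plan.
DmlBatch batch = tEnv.startDmlBatch();
batch.addInsert("INSERT INTO sink1 SELECT a, b FROM src WHERE a > 0");
batch.addInsert("sink2", tEnv.sqlQuery("SELECT c, d FROM src"));
System.out.println(batch.explain(false));  // AST + execution plan for both sinks
batch.execute("multi-sink-job");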

Bests,
godfrey

On Sat, Feb 15, 2020 at 9:09 PM, Jingsong Li wrote:

> Hi Kurt and Godfrey,
>
> Thank you for your explanation.
>
> Regarding to the "DmlBatch",
> I see there are some description for JDBC Statement.addBatch in the
> document.
> What do you think about introducing "addBatch" to the TableEnv instead of
> introducing a new class?
> The advantage is:
> - Consistent with JDBC statement.
> - Consistent with current interface, what we need do is just modify method
> name.
>
> Best,
> Jingsong Lee
>
>
> On Sat, Feb 15, 2020 at 4:48 PM Kurt Young  wrote:
>
> > I don't think we should change `from` to `fromCatalog`, especially `from`
> > is just
> > introduced in 1.10. I agree with Jark we should change interface only
> when
> > necessary,
> > e.g. the semantic is broken or confusing. So I'm +1 to keep `sqlQuery` as
> > it is.
> >
> > Best,
> > Kurt
> >
> >
> > On Sat, Feb 15, 2020 at 3:59 PM Jark Wu  wrote:
> >
> > > Thanks Kurt and Godfrey for the explanation,
> > >
> > > It makes sense to me that renaming `from(tableName)` to
> > > `fromCatalog(tableName)`.
> > > However, I still think `sqlQuery(query)` is clear and works well. Is it
> > > necessary to change it?
> > >
> > > We removed `sql(query)` and introduced `sqlQuery(query)`, we removed
> > > `scan(tableName)` and introduced `from(tableName)`,
> > > and now we want to remove them again. Users will feel like the
> interface
> > is
> > > very unstable, that really frustrates users.
> > > I think we should be cautious to remove interface and only when it is
> > > necessary.
> > >
> > > Best,
> > > Jark
> > >
> > >
> > >
> > > On Thu, 13 Feb 2020 at 20:58, godfrey he  wrote:
> > >
> > > > hi kurt,jark,jingsong
> > > >
> > > > Regarding to "fromQuery", I agree with kurt. In addition, I think
> > `Table
> > > > from(String tableName)` should be renamed to `Table
> fromCatalog(String
> > > > tableName)`.
> > > >
> > > > Regarding to the "DmlBatch", DML contains "INSERT", "UPDATE",
> "DELETE",
> > > and
> > > > they can be executed in a same batch in the future. So we can add
> > > > "addUpdate" method and "addDelete" method to support them.
> > > >
> > > > Regarding to the "Inserts addInsert", maybe we can add a
> > > "DmlBatchBuilder".
> > > >
> > > > open to more discussion
> > > >
> > > > Best,
> > > > godfrey
> > > >
> > > >
> > > >
> > > > On Thu, Feb 13, 2020 at 4:56 PM, Kurt Young wrote:
> > > >
> > > > > Regarding to "fromQuery" is confusing users with "Table from(String
> > > > > tableName)", I have
> > > > > a just opposite opinion. I think this "fromXXX" pattern can make
> > users
> > > > > quite clear when they
> > > > > want to get a Table from TableEnvironment. Similar interfaces will
> > also
> > > > > include like "fromElements".
> > > > >
> > > > > Regarding to the name of DmlBatch, I think it's mainly for
> > > > > future flexibility, in case we can support
> > > > > other statement in a single batch. If that happens, the name
> > "Inserts"
> > > > will
> > > > > be weird.
> > > > >
> > > > > Best,
> > > > > Kurt
> > > > >
> > > > >
> > > > > On Thu, Feb 13, 2020 at 4:03 PM Jark Wu  wrote:
> > > > >
> > > > > > I agree with Jingsong.
> > > > > >
> > > > > > +1 to keep `sqlQuery`, it's clear from th

Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer

2020-02-20 Thread godfrey he
Congrats Jingsong! Well deserved.

Best,
godfrey

On Fri, Feb 21, 2020 at 11:49 AM, Jeff Zhang wrote:

> Congratulations, Jingsong! You deserve it.
>
> On Fri, Feb 21, 2020 at 11:43 AM, wenlong.lwl wrote:
>
>> Congrats Jingsong!
>>
>> On Fri, 21 Feb 2020 at 11:41, Dian Fu  wrote:
>>
>> > Congrats Jingsong!
>> >
>> > > On Feb 21, 2020 at 11:39 AM, Jark Wu wrote:
>> > >
>> > > Congratulations Jingsong! Well deserved.
>> > >
>> > > Best,
>> > > Jark
>> > >
>> > > On Fri, 21 Feb 2020 at 11:32, zoudan  wrote:
>> > >
>> > >> Congratulations! Jingsong
>> > >>
>> > >>
>> > >> Best,
>> > >> Dan Zou
>> > >>
>> >
>> >
>>
>
>
> --
> Best Regards
>
> Jeff Zhang
>


[VOTE] FLIP-84: Improve & Refactor API of TableEnvironment

2020-02-26 Thread godfrey he
Hi everyone,

I'd like to start the vote on FLIP-84 [1], which proposes to deprecate some
old APIs and introduce some new APIs in TableEnvironment. This FLIP was
discussed and reached consensus in the discussion thread [2].

The vote will be open for at least 72 hours. Unless there is an objection,
I will try to close it by Mar 1, 2020 07:00 UTC if we have received
sufficient votes.


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-84%3A+Improve+%26+Refactor+API+of+TableEnvironment

[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Improve-amp-Refactor-API-of-Table-Module-td34537.html


Bests,
Godfrey


Re: [VOTE] FLIP-84: Improve & Refactor API of TableEnvironment

2020-02-27 Thread godfrey he
Hi Kant, yes. We hope to deprecate the methods that confuse users ASAP.

Bests,
godfrey

On Fri, Feb 28, 2020 at 11:17 AM, kant kodali wrote:

> Is this targeted towards Flink 1.11?
>
> On Thu, Feb 27, 2020 at 6:32 PM Kurt Young  wrote:
>
> >  +1 (binding)
> >
> > Best,
> > Kurt
> >
> >
> > On Fri, Feb 28, 2020 at 9:15 AM Terry Wang  wrote:
> >
> > > I looked through the whole design and it's a big improvement in the
> > > usability of TableEnvironment's API.
> > >
> > > +1 (non-binding)
> > >
> > > Best,
> > > Terry Wang
> > >
> > >
> > >
> > > > On Feb 27, 2020 at 14:59, godfrey he wrote:
> > > >
> > > > Hi everyone,
> > > >
> > > > I'd like to start the vote of FLIP-84[1], which proposes to deprecate
> > > some
> > > > old APIs and introduce some new APIs in TableEnvironment. This FLIP
> is
> > > > discussed and reached consensus in the discussion thread[2].
> > > >
> > > > The vote will be open for at least 72 hours. Unless there is an
> > > objection,
> > > > I will try to close it by Mar 1, 2020 07:00 UTC if we have received
> > > > sufficient votes.
> > > >
> > > >
> > > > [1]
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-84%3A+Improve+%26+Refactor+API+of+TableEnvironment
> > > >
> > > > [2]
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Improve-amp-Refactor-API-of-Table-Module-td34537.html
> > > >
> > > >
> > > > Bests,
> > > > Godfrey
> > >
> > >
> >
>


Re: [DISCUSS] FLIP-84: Improve & Refactor execute/sqlQuery/sqlUpdate APIS of TableEnvironment

2020-02-28 Thread godfrey he
Hi Benchao,

> I have one question about this FLIP:
> executeStatement  accepts DML, what if it's a streaming DML ?
>does it submit the job to cluster directly and blocks forever? what's
> the behavior for the next statements?
`executeStatement` is a synchronous method: it executes the statement as
soon as it is called and only returns the result once the job is finished
(see the sketch below). We will introduce an asynchronous method like
`executeStatementAsync` in the future.
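
In other words, roughly (a sketch; the `executeStatement(statement, jobName)` signature is the one proposed in the design doc):

// Blocks until the job reaches a terminal state. For an unbounded
// streaming INSERT this call would never return, which is one reason
// an asynchronous variant is planned.
ResultTable result = tEnv.executeStatement(
        "INSERT INTO sink SELECT * FROM src", "my-job");
// Reached only after the job finished (or an exception was thrown).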

> nit: there's a typo in "the table describing the result for each kind of
> statement", "*Result Scheam" -> "Result Schema"*
Thanks for the reminder, I will fix it now.

Bests,
Godfrey

On Fri, Feb 28, 2020 at 4:00 PM, Benchao Li wrote:

> Hi Terry,
>
> Thanks for the proposal, and sorry for joining the party late.
>
> I have one question about this FLIP:
> executeStatement  accepts DML, what if it's a streaming DML ?
> does it submit the job to cluster directly and blocks forever? what's
> the behavior for the next statements?
>
> nit: there's a typo in "the table describing the result for each kind of
> statement", "*Result Scheam" -> "Result Schema"*
>
>
> On Tue, Feb 18, 2020 at 4:41 PM, godfrey he wrote:
>
> > Thanks Kurt and Jark for explanation, I now also think we should make the
> > TableEnvironment interface more statable and should not change "sqlQuery"
> > method and "from" method.
> >
> > Hi Jingsong. Regarding to the "DmlBatch", I totally agree with advantages
> > of "addBatch" method. However, there are two more questions need to
> solve:
> > one is how users write multi-sink programs in a Table API ? and another
> is
> > how users explain multi-sink program in both SQL and Table API ?
> > Currently, "DmlBatch" class can solve those questions. (the main
> > disadvantages is Inconsistent with the current interface)
> >
> > Bests,
> > godfrey
> >
> > Jingsong Li  于2020年2月15日周六 下午9:09写道:
> >
> > > Hi Kurt and Godfrey,
> > >
> > > Thank you for your explanation.
> > >
> > > Regarding to the "DmlBatch",
> > > I see there are some description for JDBC Statement.addBatch in the
> > > document.
> > > What do you think about introducing "addBatch" to the TableEnv instead
> of
> > > introducing a new class?
> > > The advantage is:
> > > - Consistent with JDBC statement.
> > > - Consistent with current interface, what we need do is just modify
> > method
> > > name.
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > >
> > > On Sat, Feb 15, 2020 at 4:48 PM Kurt Young  wrote:
> > >
> > > > I don't think we should change `from` to `fromCatalog`, especially
> > `from`
> > > > is just
> > > > introduced in 1.10. I agree with Jark we should change interface only
> > > when
> > > > necessary,
> > > > e.g. the semantic is broken or confusing. So I'm +1 to keep
> `sqlQuery`
> > as
> > > > it is.
> > > >
> > > > Best,
> > > > Kurt
> > > >
> > > >
> > > > On Sat, Feb 15, 2020 at 3:59 PM Jark Wu  wrote:
> > > >
> > > > > Thanks Kurt and Godfrey for the explanation,
> > > > >
> > > > > It makes sense to me that renaming `from(tableName)` to
> > > > > `fromCatalog(tableName)`.
> > > > > However, I still think `sqlQuery(query)` is clear and works well.
> Is
> > it
> > > > > necessary to change it?
> > > > >
> > > > > We removed `sql(query)` and introduced `sqlQuery(query)`, we
> removed
> > > > > `scan(tableName)` and introduced `from(tableName)`,
> > > > > and now we want to remove them again. Users will feel like the
> > > interface
> > > > is
> > > > > very unstable, that really frustrates users.
> > > > > I think we should be cautious to remove interface and only when it
> is
> > > > > necessary.
> > > > >
> > > > > Best,
> > > > > Jark
> > > > >
> > > > >
> > > > >
> > > > > On Thu, 13 Feb 2020 at 20:58, godfrey he 
> > wrote:
> > > > >
> > > > > > hi kurt,jark,jingsong
> > > > > >
> > > > > > Regarding to "fromQuery", I agree with kurt. In addition, I think
> > > > 

Re: [DISCUSS] FLIP-84: Improve & Refactor execute/sqlQuery/sqlUpdate APIS of TableEnvironment

2020-03-01 Thread godfrey he
Hi Benchao,

I think the document already contains both parts: the behavior is explained
when introducing the `executeStatement` method, and the asynchronous
execution methods are explained in the appendix.

Bests,
Godfrey

On Fri, Feb 28, 2020 at 10:09 PM, Benchao Li wrote:

> Hi godfrey,
>
> Thanks for your explanation.
>
> Do we need to clarify this in the FLIP? Maybe this confuses other users as
> well.
>
> On Fri, Feb 28, 2020 at 4:54 PM, godfrey he wrote:
>
> > Hi Benchao,
> >
> > > I have one question about this FLIP:
> > > executeStatement  accepts DML, what if it's a streaming DML ?
> > >does it submit the job to cluster directly and blocks forever?
> what's
> > > the behavior for the next statements?
> > `executeStatement` is a synchronous method, will execute the statement
> once
> > calling this method and return the result until the job is finished.
> > We will introduce asynchronous method like `executeStatementAsync` in the
> > future.
> >
> > > nit: there's a typo in "the table describing the result for each kind
> of
> > > statement", "*Result Scheam" -> "Result Schema"*
> > Thanks for the reminding, I will fix it now.
> >
> > Bests,
> > Godfrey
> >
> > On Fri, Feb 28, 2020 at 4:00 PM, Benchao Li wrote:
> >
> > > Hi Terry,
> > >
> > > Thanks for the propose, and sorry for joining the party late.
> > >
> > > I have one question about this FLIP:
> > > executeStatement  accepts DML, what if it's a streaming DML ?
> > > does it submit the job to cluster directly and blocks forever?
> what's
> > > the behavior for the next statements?
> > >
> > > nit: there's a typo in "the table describing the result for each kind
> of
> > > statement", "*Result Scheam" -> "Result Schema"*
> > >
> > >
> > > On Tue, Feb 18, 2020 at 4:41 PM, godfrey he wrote:
> > >
> > > > Thanks Kurt and Jark for explanation, I now also think we should make
> > the
> > > > TableEnvironment interface more statable and should not change
> > "sqlQuery"
> > > > method and "from" method.
> > > >
> > > > Hi Jingsong. Regarding to the "DmlBatch", I totally agree with
> > advantages
> > > > of "addBatch" method. However, there are two more questions need to
> > > solve:
> > > > one is how users write multi-sink programs in a Table API ? and
> another
> > > is
> > > > how users explain multi-sink program in both SQL and Table API ?
> > > > Currently, "DmlBatch" class can solve those questions. (the main
> > > > disadvantages is Inconsistent with the current interface)
> > > >
> > > > Bests,
> > > > godfrey
> > > >
> > > > On Sat, Feb 15, 2020 at 9:09 PM, Jingsong Li wrote:
> > > >
> > > > > Hi Kurt and Godfrey,
> > > > >
> > > > > Thank you for your explanation.
> > > > >
> > > > > Regarding to the "DmlBatch",
> > > > > I see there are some description for JDBC Statement.addBatch in the
> > > > > document.
> > > > > What do you think about introducing "addBatch" to the TableEnv
> > instead
> > > of
> > > > > introducing a new class?
> > > > > The advantage is:
> > > > > - Consistent with JDBC statement.
> > > > > - Consistent with current interface, what we need do is just modify
> > > > method
> > > > > name.
> > > > >
> > > > > Best,
> > > > > Jingsong Lee
> > > > >
> > > > >
> > > > > On Sat, Feb 15, 2020 at 4:48 PM Kurt Young 
> wrote:
> > > > >
> > > > > > I don't think we should change `from` to `fromCatalog`,
> especially
> > > > `from`
> > > > > > is just
> > > > > > introduced in 1.10. I agree with Jark we should change interface
> > only
> > > > > when
> > > > > > necessary,
> > > > > > e.g. the semantic is broken or confusing. So I'm +1 to keep
> > > `sqlQuery`
> > > > as
> > > > > > it is.
> > > > > >
> > > > > > Best,
> > > > > > Kurt
> > > > > >
> > > > > >
> > > > > > On Sat, Feb 15, 2020 at 3:59

Re: [VOTE] FLIP-84: Improve & Refactor API of TableEnvironment

2020-03-01 Thread godfrey he
Thanks Jingsong for the reminder. I will update it now.

On Mon, Mar 2, 2020 at 2:46 PM, Jingsong Lee wrote:

> Thanks for driving.
>
> +1 from my side.
>
> > For current messy Flink table program trigger point, we propose that: for
> TableEnvironment and StreamTableEnvironment, you must use
> `TableEnvironment.execute()` to trigger table program execution.
>
> Looks like this is an incompatible change. You may need to update the
> Compatibility chapter, and it should be added to the 1.11 release notes in
> the future.
>
> Best,
> Jingsong Lee
>
> On Fri, Feb 28, 2020 at 10:10 PM Benchao Li  wrote:
>
> > +1 (non-binding)
> >
> > On Fri, Feb 28, 2020 at 5:11 PM, Jark Wu wrote:
> >
> > > +1 from my side.
> > >
> > > Best,
> > > Jark
> > >
> > > On Fri, 28 Feb 2020 at 15:07, kant kodali  wrote:
> > >
> > > > Nice!!
> > > >
> > > > Sent from my iPhone
> > > >
> > > > > On Feb 27, 2020, at 9:03 PM, godfrey he 
> wrote:
> > > > >
> > > > > Hi kant, yes. We hope to deprecate the methods which confuse users
> > > ASAP.
> > > > >
> > > > > Bests,
> > > > > godfrey
> > > > >
> > > > > On Fri, Feb 28, 2020 at 11:17 AM, kant kodali wrote:
> > > > >
> > > > >> Is this targeted towards Flink 1.11?
> > > > >>
> > > > >>> On Thu, Feb 27, 2020 at 6:32 PM Kurt Young 
> > wrote:
> > > > >>>
> > > > >>> +1 (binding)
> > > > >>>
> > > > >>> Best,
> > > > >>> Kurt
> > > > >>>
> > > > >>>
> > > > >>>> On Fri, Feb 28, 2020 at 9:15 AM Terry Wang 
> > > > wrote:
> > > > >>>
> > > > >>>> I look through the whole design and it’s a big improvement of
> > > > usability
> > > > >>> on
> > > > >>>> TableEnvironment’s api.
> > > > >>>>
> > > > >>>> +1 (non-binding)
> > > > >>>>
> > > > >>>> Best,
> > > > >>>> Terry Wang
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>> On Feb 27, 2020 at 14:59, godfrey he wrote:
> > > > >>>>>
> > > > >>>>> Hi everyone,
> > > > >>>>>
> > > > >>>>> I'd like to start the vote of FLIP-84[1], which proposes to
> > > deprecate
> > > > >>>> some
> > > > >>>>> old APIs and introduce some new APIs in TableEnvironment. This
> > FLIP
> > > > >> is
> > > > >>>>> discussed and reached consensus in the discussion thread[2].
> > > > >>>>>
> > > > >>>>> The vote will be open for at least 72 hours. Unless there is an
> > > > >>>> objection,
> > > > >>>>> I will try to close it by Mar 1, 2020 07:00 UTC if we have
> > received
> > > > >>>>> sufficient votes.
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> [1]
> > > > >>>>>
> > > > >>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-84%3A+Improve+%26+Refactor+API+of+TableEnvironment
> > > > >>>>>
> > > > >>>>> [2]
> > > > >>>>>
> > > > >>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Improve-amp-Refactor-API-of-Table-Module-td34537.html
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> Bests,
> > > > >>>>> Godfrey
> > > > >>>>
> > > > >>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> >
> > --
> >
> > Benchao Li
> > School of Electronics Engineering and Computer Science, Peking University
> > Tel:+86-15650713730
> > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> >
>
>
> --
> Best, Jingsong Lee
>


Re: [VOTE] FLIP-84: Improve & Refactor API of TableEnvironment

2020-03-01 Thread godfrey he
Thanks all for the votes.
So far, we have

   - 3 binding +1 votes (Kurt, Jark, Jingsong)
   - 2 non-binding +1 votes (Terry, Benchao)
   - No -1 votes

The voting time has passed and there are enough +1 votes to consider
FLIP-84 approved.
Thank you all.


Best,
Godfrey

On Mon, Mar 2, 2020 at 3:32 PM, godfrey he wrote:

> Thanks Jingsong for the reminding. I will update it now.
>
> On Mon, Mar 2, 2020 at 2:46 PM, Jingsong Lee wrote:
>
>> Thanks for driving.
>>
>> +1 from my side.
>>
>> > For current messy Flink table program trigger point, we propose that:
>> for
>> TableEnvironment and StreamTableEnvironment, you must use
>> `TableEnvironment.execute()` to trigger table program execution.
>>
>> Looks like this is an incompatible change. You need update Compatibility
>> chapter? And should add it to 1.11 release note in future.
>>
>> Best,
>> Jingsong Lee
>>
>> On Fri, Feb 28, 2020 at 10:10 PM Benchao Li  wrote:
>>
>> > +1 (non-binding)
>> >
>> > > On Fri, Feb 28, 2020 at 5:11 PM, Jark Wu wrote:
>> >
>> > > +1 from my side.
>> > >
>> > > Best,
>> > > Jark
>> > >
>> > > On Fri, 28 Feb 2020 at 15:07, kant kodali  wrote:
>> > >
>> > > > Nice!!
>> > > >
>> > > > Sent from my iPhone
>> > > >
>> > > > > On Feb 27, 2020, at 9:03 PM, godfrey he 
>> wrote:
>> > > > >
>> > > > > Hi kant, yes. We hope to deprecate the methods which confuse
>> users
>> > > ASAP.
>> > > > >
>> > > > > Bests,
>> > > > > godfrey
>> > > > >
>> > > > > On Fri, Feb 28, 2020 at 11:17 AM, kant kodali wrote:
>> > > > >
>> > > > >> Is this targeted towards Flink 1.11?
>> > > > >>
>> > > > >>> On Thu, Feb 27, 2020 at 6:32 PM Kurt Young 
>> > wrote:
>> > > > >>>
>> > > > >>> +1 (binding)
>> > > > >>>
>> > > > >>> Best,
>> > > > >>> Kurt
>> > > > >>>
>> > > > >>>
>> > > > >>>> On Fri, Feb 28, 2020 at 9:15 AM Terry Wang > >
>> > > > wrote:
>> > > > >>>
>> > > > >>>> I look through the whole design and it’s a big improvement of
>> > > > usability
>> > > > >>> on
>> > > > >>>> TableEnvironment’s api.
>> > > > >>>>
>> > > > >>>> +1 (non-binding)
>> > > > >>>>
>> > > > >>>> Best,
>> > > > >>>> Terry Wang
>> > > > >>>>
>> > > > >>>>
>> > > > >>>>
>> > > > >>>>> On Feb 27, 2020 at 14:59, godfrey he wrote:
>> > > > >>>>>
>> > > > >>>>> Hi everyone,
>> > > > >>>>>
>> > > > >>>>> I'd like to start the vote of FLIP-84[1], which proposes to
>> > > deprecate
>> > > > >>>> some
>> > > > >>>>> old APIs and introduce some new APIs in TableEnvironment. This
>> > FLIP
>> > > > >> is
>> > > > >>>>> discussed and reached consensus in the discussion thread[2].
>> > > > >>>>>
>> > > > >>>>> The vote will be open for at least 72 hours. Unless there is
>> an
>> > > > >>>> objection,
>> > > > >>>>> I will try to close it by Mar 1, 2020 07:00 UTC if we have
>> > received
>> > > > >>>>> sufficient votes.
>> > > > >>>>>
>> > > > >>>>>
>> > > > >>>>> [1]
>> > > > >>>>>
>> > > > >>>>
>> > > > >>>
>> > > > >>
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-84%3A+Improve+%26+Refactor+API+of+TableEnvironment
>> > > > >>>>>
>> > > > >>>>> [2]
>> > > > >>>>>
>> > > > >>>>
>> > > > >>>
>> > > > >>
>> > > >
>> > >
>> >
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Improve-amp-Refactor-API-of-Table-Module-td34537.html
>> > > > >>>>>
>> > > > >>>>>
>> > > > >>>>> Bests,
>> > > > >>>>> Godfrey
>> > > > >>>>
>> > > > >>>>
>> > > > >>>
>> > > > >>
>> > > >
>> > >
>> >
>> >
>> > --
>> >
>> > Benchao Li
>> > School of Electronics Engineering and Computer Science, Peking
>> University
>> > Tel:+86-15650713730
>> > Email: libenc...@gmail.com; libenc...@pku.edu.cn
>> >
>>
>>
>> --
>> Best, Jingsong Lee
>>
>


Re: [DISCUSS] FLIP-114: Support Python UDF in SQL Client

2020-03-09 Thread godfrey he
Hi Wei, thanks for the proposal.

I think it's better to give two more examples: one on how to use a Python
UDF in SQL, and another on how to start sql-client.sh with full Python
dependencies.
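
For the first example, something like the following would help (a sketch; the function name and Python module are made up, and the DDL syntax comes from FLIP-106):

// Register a Python UDF via DDL and use it in a query.
// "my_udfs.upper_case" is a hypothetical Python function on the Python path.
tEnv.sqlUpdate(
        "CREATE TEMPORARY SYSTEM FUNCTION py_upper " +
        "AS 'my_udfs.upper_case' LANGUAGE PYTHON");
Table result = tEnv.sqlQuery("SELECT py_upper(name) FROM users");

The second example could then show the sql-client.sh command line together with the Python dependency options this FLIP introduces.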

Best,
Godfrey

On Mon, Mar 9, 2020 at 10:09 PM, Wei Zhong wrote:

> Hi everyone,
>
> I would like to start discussion about how to support Python UDF in SQL
> Client.
>
> Flink Python UDF(FLIP-58[1]) has already been introduced in the release of
> 1.10.0 and the support for SQL DDL is introduced in FLIP-106[2].
>
> SQL Client defines UDF via the environment file and has its own CLI
> implementation to manage dependencies, but neither of which supports Python
> UDF. We want to introduce the support of Python UDF for SQL Client,
> including the registration and the dependency management of Python UDF.
>
> Here is the design doc:
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-114%3A+Support+Python+UDF+in+SQL+Client
>
> Looking forward to your feedback!
>
> Best,
> Wei
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-106%3A+Support+Python+UDF+in+SQL+Function+DDL
>
>


Re: [DISCUSS] Features of Apache Flink 1.11

2020-03-11 Thread godfrey he
Hi Zhijiang and Piotr,

I think we can remove "FLIP-91 Introduce SQL client gateway and provide
JDBC driver" from the list: as a first step, we have decided to support the
SQL gateway and JDBC driver as an ecosystem project at Ververica, so we are
not going to put more effort into it in Flink for now.

Thanks for updating the list!

Bests,
Godfrey


On Wed, Mar 11, 2020 at 4:13 PM, Timo Walther wrote:

> Hi Zhijiang and Piotr,
>
> from the SQL side we also plan to rework the source and sink interfaces
> in 1.11. The FLIP is not yet published but already reserved and
> requirement for FLIP-105:
>
> FLIP-95: New TableSource and TableSink interfaces
>
> Thanks for compiling the list!
>
> Regards,
> Timo
>
>
> On 11.03.20 09:05, Hequn Cheng wrote:
> > Thanks Zhijiang and Piotr for kicking off the discussion and providing
> the
> > detailed list.
> > This would be very helpful for tracking the features.
> >
> > BTW, as for PyFlink, it would be great if the feature list can also
> include
> > the following features:
> > - FLIP-112: Support User-Defined Metrics in Python UDF
> > - FLIP-114: Support Python UDF in SQL Client
> >
> > Looking forward to the release!
> >
> > Best,
> > Hequn
> >
> >
> >
> > On Wed, Mar 11, 2020 at 1:02 PM Yu Li  wrote:
> >
> >> Thanks for compiling the list of 1.11 efforts Zhijiang and Piotr! This
> >> helps a lot to better understand what the community is currently working
> >> on. Looking forward to another successful release.
> >>
> >> Best Regards,
> >> Yu
> >>
> >>
> >> On Wed, 11 Mar 2020 at 11:17, Zhijiang  >> .invalid>
> >> wrote:
> >>
> >>> Hi community,
> >>>
> >>>
> >>> Not more than one month ago we have released Flink 1.10. We are now
> >>> heading for the Flink 1.11 release and we, as release managers, would
> >> like
> >>> to share with you what are the features that the community is currently
> >>> working on and we are hoping that will be part of the Flink 1.11
> release.
> >>> Currently we are aiming with the feature freeze to happen in late
> April.
> >>>
> >>> As for now, some of the features are in the very early stages of the
> >>> development or even brainstorming. Because of that, some of them do not
> >>> have associated JIRA tickets or FLIP documents. For the next progress
> >>> announcement we are hoping that this will be no longer the case.
> >>>
> >>> Please also note that because we are still relatively at the beginning
> of
> >>> the release cycle, some of the FLIPs haven’t yet been voted.
> >>>
> >>> - SQL / Table
> >>> - FLIP-42: Restructure documentation [1]
> >>> - FLIP-65: New type inference for Table API UDFs [2]
> >>> - FLIP-84: Improve TableEnv’s interface [3]
> >>> - FLIP-91 Introduce SQL client gateway and provide JDBC driver [4]
> >>> - FLIP-93: Introduce JDBC catalog and Postgres catalog [5]
> >>> - FLIP-105: Support to interpret and emit changelog in Flink SQL [6]
> >>> - FLIP-107: Reading table columns from different parts of source
> records
> >>> [7]
> >>> - [FLINK-14807] Add Table#collect API for fetching data [8]
> >>> - Support query and table hints
> >>> - ML / Connectors
> >>> - FLIP-27: New source API [9]
> >>> - [FLINK-15670] Wrap a source/sink pair to persist intermediate result
> >> for
> >>> subgraph failure recovery [10]
> >>> - Pulsar source / sink / catalog
> >>> - Update ML Pipeline API interface to better support Flink ML lib
> >>> algorithms
> >>> - PyFlink
> >>> - FLIP-58: Debugging and monitoring of Python UDF [11]
> >>> - FLIP-106: Expand the usage scope of Python UDF [12]
> >>> - Integration with most popular Python libraries (Pandas)
> >>> - Performance improvements of Python UDF
> >>> - Support running python UDF in docker workers
> >>> - Add Python ML API
> >>> - Fully support all kinds of Python UDF
> >>> - Web UI
> >>> - FLIP-98: Better back pressure detection [13]
> >>> - FLIP-99: Make max exception configurable [14]
> >>> - FLIP-100: Add attempt information [15]
> >>> - FLIP-102: Add more metrics to TaskManager [16]
> >>> - FLIP-103: Better TM/JM log display [17]
> >>> - [FLINK-14816] Add thread dump feature for TaskManager [18]
> >>> - Runtime
> >>> - FLIP-56: Support for dynamic slots on the TaskExecutor [19]
> >>> - FLIP-67: Support for cluster partitions [20]
> >>> - FLIP-76: Unaligned checkpoints [21]
> >>> - FLIP-83: Flink e2e performance testing framework [22]
> >>> - FLIP-85: Support cluster deploy mode [23]
> >>> - FLIP-92: Add N-Ary input stream operator in Flink [24]
> >>> - FLIP-108: Add GPU to the resource management (specifically for UDTF &
> >>> UDF) [25]
> >>> - FLIP-111: Consolidate docker images [26]
> >>> - Unified memory configuration for JobManager
> >>> - Specify upper bound for number of allocated TaskManagers
> >>> - [FLINK-9407] ORC format for StreamingFileSink [27]
> >>> - [FLINK-10742] Let Netty use Flink's buffers on downstream side [28]
> >>> - [FLINK-10934] Support per-job mode for Kubernetes integration [29]
> >>> - [FLINK-11395] Avro writer for StreamingFileSink [30]
> >>> - [FLINK-11427] Protobuf parquet writer for Stream

[DISCUSS] FLIP-84 Feedback Summary

2020-03-25 Thread godfrey he
Hi community,
Timo, Fabian and Dawid have some feedback about FLIP-84 [1]. The feedback
is all about the newly introduced methods. We had a discussion yesterday,
and most of it has been agreed upon. Here are the conclusions:

*1. about proposed methods in `TableEnvironment`:*

the original proposed methods:

TableEnvironment.createDmlBatch(): DmlBatch
TableEnvironment.executeStatement(String statement): ResultTable

the new proposed methods:

// we should not use abbreviations in the API, and the term "Batch" is
easily confused with batch/streaming processing
TableEnvironment.createStatementSet(): StatementSet

// every method that takes SQL should have `Sql` in its name
// supports multiline statement ???
TableEnvironment.executeSql(String statement): TableResult

// new methods. supports explaining DQL and DML
TableEnvironment.explainSql(String statement, ExplainDetail... details):
String


*2. about proposed related classes:*

the original proposed classes:

interface DmlBatch {
    void addInsert(String insert);
    void addInsert(String targetPath, Table table);
    ResultTable execute() throws Exception;
    String explain(boolean extended);
}

public interface ResultTable {
    TableSchema getResultSchema();
    Iterable<Row> getResultRows();
}

the new proposed classes:

interface StatementSet {
// every method that takes SQL should have `Sql` in its name
// return StatementSet instance for fluent programming
addInsertSql(String statement): StatementSet

// return StatementSet instance for fluent programming
addInsert(String tablePath, Table table): StatementSet

// new method. support overwrite mode
addInsert(String tablePath, Table table, boolean overwrite):
StatementSet

explain(): String

// new method. supports adding more details for the result
explain(ExplainDetail... extraDetails): String

// throw exception ???
execute(): TableResult
}

interface TableResult {
    getTableSchema(): TableSchema

    // avoid custom parsing of an "OK" row in programming
    getResultKind(): ResultKind

    // instead of `get`, make it explicit that this might be triggering
    // an expensive operation
    collect(): Iterable<Row>

    // for fluent programming
    print(): Unit
}

enum ResultKind {
SUCCESS, // for DDL, DCL and statements with a simple "OK"
SUCCESS_WITH_CONTENT, // rows with important content are available
(DML, DQL)
}


*3. new proposed methods in `Table`*

`Table.insertInto()` will be deprecated, and the following methods are
introduced:

Table.executeInsert(String tablePath): TableResult
Table.executeInsert(String tablePath, boolean overwrite): TableResult
Table.explain(ExplainDetail... details): String
Table.execute(): TableResult
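
Putting the pieces together, typical usage of the new API would look roughly like this (a sketch based on the proposed signatures above):

// Single statement, executed immediately.
TableResult result = tEnv.executeSql("DROP TABLE IF EXISTS tmp");
result.print();  // ResultKind.SUCCESS, no custom "OK" parsing needed

// Several inserts optimized and submitted together.
StatementSet stmtSet = tEnv.createStatementSet();
stmtSet.addInsertSql("INSERT INTO sink1 SELECT a, b FROM src")
       .addInsert("sink2", tEnv.from("src"));
System.out.println(stmtSet.explain());
TableResult batchResult = stmtSet.execute();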

There are two issues that need further discussion: one is whether
`TableEnvironment.executeSql(String statement): TableResult` needs to
support multiline statements (or whether `TableEnvironment` needs to
support multiline statements at all), and the other is whether
`StatementSet.execute()` needs to throw an exception.

please refer to the feedback document [2] for the details.

Any suggestions are warmly welcomed!

[1]
https://wiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
[2]
https://docs.google.com/document/d/1ueLjQWRPdLTFB_TReAyhseAX-1N3j4WYWD0F02Uau0E/edit

Best,
Godfrey


Re: [DISCUSS] FLIP-84 Feedback Summary

2020-03-25 Thread godfrey he
Hi Timo,

Thanks for the update.

Regarding to "multiline statement support", I'm also fine that
`TableEnvironment.executeSql()` only supports single line statement, and we
can support multiline statement later (needs more discussion about this).

Regarding to "StatementSet.explian()", I don't have strong opinions about
that.

Regarding to "TableResult.getJobClient()", I think it's unnecessary. The
reason is: first, many statements (e.g. DDL, show xx, use xx)  will not
submit a Flink job. second, `TableEnvironment.executeSql()` and
`StatementSet.execute()` are synchronous method, `TableResult` will be
returned only after the job is finished or failed.

Regarding to "whether StatementSet.execute() needs to throw exception", I
think we should choose a unified way to tell whether the execution is
successful. If `TableResult` contains ERROR kind (non-runtime exception),
users need to not only check the result but also catch the runtime
exception in their code. or `StatementSet.execute()` does not throw any
exception (including runtime exception), all exception messages are in the
result.  I prefer "StatementSet.execute() needs to throw exception". cc @Jark
Wu 
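
The two options would differ for callers roughly as follows (a sketch; the ERROR kind in option B is hypothetical and not part of the proposed ResultKind enum):

// Option A: execute() throws, failures surface as exceptions.
try {
    TableResult result = stmtSet.execute();
} catch (Exception e) {
    // handle the failure here
}

// Option B (hypothetical): execute() never throws, failures live in the result.
TableResult result = stmtSet.execute();
if (result.getResultKind() == ResultKind.ERROR) {  // ERROR kind does not exist in the proposal
    // inspect the error message carried in the result
}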

I will update the document with the agreed parts first.

Best,
Godfrey


On Wed, Mar 25, 2020 at 6:51 PM, Timo Walther wrote:

> Hi Godfrey,
>
> thanks for starting the discussion on the mailing list. And sorry again
> for the late reply to FLIP-84. I have updated the Google doc one more
> time to incorporate the offline discussions.
>
>  From Dawid's and my view, it is fine to postpone the multiline support
> to a separate method. This can be future work even though we will need
> it rather soon.
>
> If there are no objections, I suggest to update the FLIP-84 again and
> have another voting process.
>
> Thanks,
> Timo
>
>
> On 25.03.20 11:17, godfrey he wrote:
> > Hi community,
> > Timo, Fabian and Dawid have some feedbacks about FLIP-84[1]. The
> feedbacks
> > are all about new introduced methods. We had a discussion yesterday, and
> > most of feedbacks have been agreed upon. Here is the conclusions:
> >
> > *1. about proposed methods in `TableEnvironment`:*
> >
> > the original proposed methods:
> >
> > TableEnvironment.createDmlBatch(): DmlBatch
> > TableEnvironment.executeStatement(String statement): ResultTable
> >
> > the new proposed methods:
> >
> > // we should not use abbreviations in the API, and the term "Batch" is
> > easily confused with batch/streaming processing
> > TableEnvironment.createStatementSet(): StatementSet
> >
> > // every method that takes SQL should have `Sql` in its name
> > // supports multiline statement ???
> > TableEnvironment.executeSql(String statement): TableResult
> >
> > // new methods. supports explaining DQL and DML
> > TableEnvironment.explainSql(String statement, ExplainDetail... details):
> > String
> >
> >
> > *2. about proposed related classes:*
> >
> > the original proposed classes:
> >
> > interface DmlBatch {
> >  void addInsert(String insert);
> >  void addInsert(String targetPath, Table table);
> >  ResultTable execute() throws Exception ;
> >  String explain(boolean extended);
> > }
> >
> > public interface ResultTable {
> >  TableSchema getResultSchema();
> >      Iterable<Row> getResultRows();
> > }
> >
> > the new proposed classes:
> >
> > interface StatementSet {
> >  // every method that takes SQL should have `Sql` in its name
> >  // return StatementSet instance for fluent programming
> >  addInsertSql(String statement): StatementSet
> >
> >  // return StatementSet instance for fluent programming
> >  addInsert(String tablePath, Table table): StatementSet
> >
> >  // new method. support overwrite mode
> >  addInsert(String tablePath, Table table, boolean overwrite):
> > StatementSet
> >
> >  explain(): String
> >
> >  // new method. supports adding more details for the result
> >  explain(ExplainDetail... extraDetails): String
> >
> >  // throw exception ???
> >  execute(): TableResult
> > }
> >
> > interface TableResult {
> >  getTableSchema(): TableSchema
> >
> >  // avoid custom parsing of an "OK" row in programming
> >  getResultKind(): ResultKind
> >
> >      // instead of `get`, make it explicit that this might be triggering
> >      // an expensive operation
> >      collect(): Iterable<Row>
> >
> >  // for fluent programming
>

Re: [DISCUSS] FLIP-84 Feedback Summary

2020-03-26 Thread godfrey he
Hi Timo,

I agree with you that streaming queries mostly need async execution.
In fact, our original plan was to introduce only sync methods in this FLIP,
with async methods (like "executeSqlAsync") to be introduced in the future,
as mentioned in the appendix.

Maybe the async methods also need to be considered in this FLIP.

I think sync methods is also useful for streaming which can be used to run
bounded source.
Maybe we should check whether all sources are bounded in sync execution
mode.

>Also, if we block for streaming queries, we could never support
> multiline files. Because the first INSERT INTO would block the further
> execution.
Agreed: we need an async method to submit multiline files, and such files
should be restricted so that, for streaming, the DQL and DML statements
always come at the end.

Best,
Godfrey

On Thu, Mar 26, 2020 at 4:29 PM, Timo Walther wrote:

> Hi Godfrey,
>
> having control over the job after submission is a requirement that was
> requested frequently (some examples are [1], [2]). Users would like to
> get insights about the running or completed job. Including the jobId,
> jobGraph etc., the JobClient summarizes these properties.
>
> It is good to have a discussion about synchronous/asynchronous
> submission now to have a complete execution picture.
>
> I thought we submit streaming queries mostly async and just wait for the
> successful submission. If we block for streaming queries, how can we
> collect() or print() results?
>
> Also, if we block for streaming queries, we could never support
> multiline files. Because the first INSERT INTO would block the further
> execution.
>
> If we decide to block entirely on streaming queries, we need the async
> execution methods in the design already. However, I would rather go for
> non-blocking streaming queries. Also with the `EMIT STREAM` key word in
> mind that we might add to SQL statements soon.
>
> Regards,
> Timo
>
> [1] https://issues.apache.org/jira/browse/FLINK-16761
> [2] https://issues.apache.org/jira/browse/FLINK-12214
>
> On 25.03.20 16:30, godfrey he wrote:
> > Hi Timo,
> >
> > Thanks for the updating.
> >
> > Regarding to "multiline statement support", I'm also fine that
> > `TableEnvironment.executeSql()` only supports single line statement, and
> we
> > can support multiline statement later (needs more discussion about this).
> >
> > Regarding to "StatementSet.explian()", I don't have strong opinions about
> > that.
> >
> > Regarding to "TableResult.getJobClient()", I think it's unnecessary. The
> > reason is: first, many statements (e.g. DDL, show xx, use xx)  will not
> > submit a Flink job. second, `TableEnvironment.executeSql()` and
> > `StatementSet.execute()` are synchronous method, `TableResult` will be
> > returned only after the job is finished or failed.
> >
> > Regarding to "whether StatementSet.execute() needs to throw exception", I
> > think we should choose a unified way to tell whether the execution is
> > successful. If `TableResult` contains ERROR kind (non-runtime exception),
> > users need to not only check the result but also catch the runtime
> > exception in their code. or `StatementSet.execute()` does not throw any
> > exception (including runtime exception), all exception messages are in
> the
> > result.  I prefer "StatementSet.execute() needs to throw exception". cc
> @Jark
> > Wu 
> >
> > I will update the agreed parts to the document first.
> >
> > Best,
> > Godfrey
> >
> >
> > On Wed, Mar 25, 2020 at 6:51 PM, Timo Walther wrote:
> >
> >> Hi Godfrey,
> >>
> >> thanks for starting the discussion on the mailing list. And sorry again
> >> for the late reply to FLIP-84. I have updated the Google doc one more
> >> time to incorporate the offline discussions.
> >>
> >>   From Dawid's and my view, it is fine to postpone the multiline support
> >> to a separate method. This can be future work even though we will need
> >> it rather soon.
> >>
> >> If there are no objections, I suggest to update the FLIP-84 again and
> >> have another voting process.
> >>
> >> Thanks,
> >> Timo
> >>
> >>
> >> On 25.03.20 11:17, godfrey he wrote:
> >>> Hi community,
> >>> Timo, Fabian and Dawid have some feedbacks about FLIP-84[1]. The
> >> feedbacks
> >>> are all about new introduced methods. We had a discussion yesterday,
> and
> >>> most of feedbacks have been agreed upon. Here is the conclusions:
> >>>
> >>

Re: [DISCUSS] FLIP-84 Feedback Summary

2020-03-30 Thread godfrey he
Hi Timo,

I agree with you that streaming queries are our top priority,
but I think there are too many things that need to be discussed for
multiline statements, e.g.:
1. What's the behavior of mixing DDL and DML under async execution?
create table t1 xxx;
create table t2 xxx;
insert into t2 select * from t1 where xxx;
drop table t1; // t1 may be a MySQL table, the data will also be deleted.

Here t1 may be dropped while the "insert" job is still running.

2. What's the behavior of a unified scenario under async execution (as you
mentioned)?
INSERT INTO t1 SELECT * FROM s;
INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM;

The result of the second statement is nondeterministic, because the first
statement may still be running.
I think we would need to put a lot of effort into defining the behavior of
logically related queries.

In this FLIP, I suggest we only handle single statements, and that we also
introduce async execute methods, which are more important and more often
used by users.

For the sync methods (like `TableEnvironment.executeSql` and
`StatementSet.execute`),
the result will be returned only after the job is finished. The following
methods will be introduced in this FLIP:

/**
 * Asynchronously execute the given single statement.
 */
TableEnvironment.executeSqlAsync(String statement): TableResult

/**
 * Asynchronously execute the DML statements as a batch.
 */
StatementSet.executeAsync(): TableResult

public interface TableResult {
    /**
     * Return the JobClient for DQL and DML in async mode,
     * otherwise return Optional.empty().
     */
    Optional<JobClient> getJobClient();
}
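
With these methods, an async submission can then be monitored through the returned JobClient (a sketch; JobClient.getJobStatus() is part of the client API introduced by FLIP-74):

// Submit asynchronously, then observe the running job.
TableResult result = tEnv.executeSqlAsync("INSERT INTO sink SELECT * FROM src");
result.getJobClient().ifPresent(jobClient ->
        jobClient.getJobStatus().thenAccept(status ->
                System.out.println("job status: " + status)));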

What do you think?

Best,
Godfrey

On Thu, Mar 26, 2020 at 9:15 PM, Timo Walther wrote:

> Hi Godfrey,
>
> executing streaming queries must be our top priority because this is
> what distinguishes Flink from competitors. If we change the execution
> behavior, we should think about the other cases as well to not break the
> API a third time.
>
> I fear that just having an async execute method will not be enough
> because users should be able to mix streaming and batch queries in a
> unified scenario.
>
> If I remember it correctly, we had some discussions in the past about
> what decides about the execution mode of a query. Currently, we would
> like to let the query decide, not derive it from the sources.
>
> So I could image a multiline pipeline as:
>
> USE CATALOG 'mycat';
> INSERT INTO t1 SELECT * FROM s;
> INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM;
>
> For executeMultilineSql():
>
> sync because regular SQL
> sync because regular Batch SQL
> async because Streaming SQL
>
> For executeAsyncMultilineSql():
>
> async because everything should be async
> async because everything should be async
> async because everything should be async
>
> What we should not start for executeAsyncMultilineSql():
>
> sync because DDL
> async because everything should be async
> async because everything should be async
>
> What are you thoughts here?
>
> Regards,
> Timo
>
>
> On 26.03.20 12:50, godfrey he wrote:
> > Hi Timo,
> >
> > I agree with you that streaming queries mostly need async execution.
> > In fact, our original plan is only introducing sync methods in this FLIP,
> > and async methods (like "executeSqlAsync") will be introduced in the
> future
> > which is mentioned in the appendix.
> >
> > Maybe the async methods also need to be considered in this FLIP.
> >
> > I think sync methods is also useful for streaming which can be used to
> run
> > bounded source.
> > Maybe we should check whether all sources are bounded in sync execution
> > mode.
> >
> >> Also, if we block for streaming queries, we could never support
> >> multiline files. Because the first INSERT INTO would block the further
> >> execution.
> > agree with you, we need async method to submit multiline files,
> > and files should be limited that the DQL and DML should be always in the
> > end for streaming.
> >
> > Best,
> > Godfrey
> >
> > Timo Walther  于2020年3月26日周四 下午4:29写道:
> >
> >> Hi Godfrey,
> >>
> >> having control over the job after submission is a requirement that was
> >> requested frequently (some examples are [1], [2]). Users would like to
> >> get insights about the running or completed job. Including the jobId,
> >> jobGraph etc., the JobClient summarizes these properties.
> >>
> >> It is good to have a discussion about synchronous/asynchronous
> >> submission now to have a complete execution picture.
> >>
> >> I thought we submit streaming queries mostly async and just wait for the
> >> successful submission. If we block for streaming queries, how can we
> >> collect() or pr

Re: [DISCUSS] FLIP-84 Feedback Summary

2020-03-30 Thread godfrey he
Hi, Timo & Jark

Thanks for your explanation.
Agree with you that streaming query execution should always be async,
and the sync execution scenario can be covered by async execution.
It helps provide a unified entry point for batch and streaming.
I think we can also use sync execution for some testing.
So, I agree with you that we provide the `executeSql` method as an async
method.
If we want a sync method in the future, we can add a method named
`executeSqlSync`.

I think we've reached an agreement. I will update the document and start
the voting process.

Best,
Godfrey


Jark Wu  于2020年3月31日周二 上午12:46写道:

> Hi,
>
> I didn't follow the full discussion.
> But I share the same concern with Timo that streaming queries should always
> be async.
> Otherwise, I can imagine it will cause a lot of confusion and problems if
> users don't keep the "sync" behavior firmly in mind (e.g. the client hangs).
> Besides, the streaming mode is still the majority use case of Flink and
> Flink SQL. We should put usability at a high priority.
>
> Best,
> Jark
>
>
> On Mon, 30 Mar 2020 at 23:27, Timo Walther  wrote:
>
> > Hi Godfrey,
> >
> > maybe I wasn't expressing my biggest concern enough in my last mail.
> > Even in a single-line and sync execution, I think that streaming queries
> > should not block the execution. Otherwise it is not possible to call
> > collect() or print() on them afterwards.
> >
> > "there are too many things need to discuss for multiline":
> >
> > True, I don't want to solve all of them right now. But what I know is
> > that our newly introduced methods should fit into a multiline execution.
> > There is no big difference between calling `executeSql(A); executeSql(B)` and
> > processing a multiline file `A;\nB;`.
> >
> > I think the example that you mentioned can simply be undefined for now.
> > Currently, no catalog is modifying data but just metadata. This is a
> > separate discussion.
> >
> > "result of the second statement is indeterministic":
> >
> > Sure, this is nondeterministic. But this is the implementer's fault and we
> > cannot forbid such pipelines.
> >
> > How about we always execute streaming queries async? It would unblock
> > executeSql() and multiline statements.
> >
> > Having an `executeSqlAsync()` is useful for batch. However, I don't want
> > `sync/async` to become the new batch/stream flag. The execution behavior should
> > come from the query itself.
> >
> > Regards,
> > Timo
> >
> >
> > On 30.03.20 11:12, godfrey he wrote:
> > > Hi Timo,
> > >
> > > Agree with you that streaming queries is our top priority,
> > > but I think there are too many things need to discuss for multiline
> > > statements:
> > > e.g.
> > > 1. what's the behaivor of DDL and DML mixing for async execution:
> > > create table t1 xxx;
> > > create table t2 xxx;
> > > insert into t2 select * from t1 where xxx;
> > > drop table t1; // t1 may be a MySQL table, the data will also be
> deleted.
> > >
> > > t1 is dropped when "insert" job is running.
> > >
> > > 2. what's the behaivor of unified scenario for async execution: (as you
> > > mentioned)
> > > INSERT INTO t1 SELECT * FROM s;
> > > INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM;
> > >
> > > The result of the second statement is indeterministic, because the
> first
> > > statement maybe is running.
> > > I think we need to put a lot of effort to define the behavior of
> > logically
> > > related queries.
> > >
> > > In this FLIP, I suggest we only handle single statement, and we also
> > > introduce an async execute method
> > > which is more important and more often used for users.
> > >
> > > Dor the sync methods (like `TableEnvironment.executeSql` and
> > > `StatementSet.execute`),
> > > the result will be returned until the job is finished. The following
> > > methods will be introduced in this FLIP:
> > >
> > >   /**
> > >* Asynchronously execute the given single statement
> > >*/
> > > TableEnvironment.executeSqlAsync(String statement): TableResult
> > >
> > > /**
> > >   * Asynchronously execute the dml statements as a batch
> > >   */
> > > StatementSet.executeAsync(): TableResult
> > >
> > > public interface TableResult {
> > > /**
> > >  * return JobClient for DQL and DML in async mode, else return
> 

[VOTE] FLIP-84: Improve & Refactor API of TableEnvironment & Table

2020-03-30 Thread godfrey he
Hi everyone,

I'd like to start the vote on FLIP-84[1] again, because we have received some
feedback. The feedback is all about the newly introduced methods; here is the
discussion thread [2].

The vote will be open for at least 72 hours. Unless there is an objection,
I will try to close it by Apr 3, 2020 06:30 UTC if we have received
sufficient votes.


[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878

[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html


Bests,
Godfrey


Re: [VOTE] FLIP-84: Improve & Refactor API of TableEnvironment & Table

2020-03-31 Thread godfrey he
Hi, Timo

So sorry about that; I was in a bit of a hurry. Let's wait for 24h.

Best,
Godfrey

Timo Walther  于2020年3月31日周二 下午5:26写道:

> -1
>
> The current discussion has not concluded yet. The last comments were sent
> less than 24h ago.
>
> Let's wait a bit longer to collect feedback from all stakeholders.
>
> Thanks,
> Timo
>
> On 31.03.20 08:31, godfrey he wrote:
> > Hi everyone,
> >
> > I'd like to start the vote of FLIP-84[1] again, because we have some
> > feedbacks. The feedbacks are all about new introduced methods, here is
> the
> > discussion thread [2].
> >
> > The vote will be open for at least 72 hours. Unless there is an
> objection,
> > I will try to close it by Apr 3, 2020 06:30 UTC if we have received
> > sufficient votes.
> >
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> >
> > [2]
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html
> >
> >
> > Bests,
> > Godfrey
> >
>
>


Re: [DISCUSS] FLIP-84 Feedback Summary

2020-04-01 Thread godfrey he
> > For simple `SELECT * FROM ...`, the query execution might block until
> > `collect()` is called to pull buffered rows from the job (from
> > socket/REST API what ever we will use in the future). We can say that
> > a statement finished successfully, when the `collect#Iterator#hasNext`
> > has returned false.
> >
> > I hope this summarizes our discussion @Dawid/Aljoscha/Klou?
> >
> > It would be great if we can add these findings to the FLIP before we
> > start voting.
> >
> > One minor thing: some `execute()` methods still throw a checked
> > exception; can we remove that from the FLIP? Also the above mentioned
> > `Iterator#next()` would trigger an execution without throwing a
> > checked exception.
> >
> > Thanks,
> > Timo
> >
> > [1]
> >
> https://docs.google.com/document/d/1ueLjQWRPdLTFB_TReAyhseAX-1N3j4WYWD0F02Uau0E/edit#
> >
> > On 31.03.20 06:28, godfrey he wrote:
> >> Hi, Timo & Jark
> >>
> >> Thanks for your explanation.
> >> Agree with you that async execution should always be async,
> >> and sync execution scenario can be covered  by async execution.
> >> It helps provide an unified entry point for batch and streaming.
> >> I think we can also use sync execution for some testing.
> >> So, I agree with you that we provide `executeSql` method and it's async
> >> method.
> >> If we want sync method in the future, we can add method named
> >> `executeSqlSync`.
> >>
> >> I think we've reached an agreement. I will update the document, and
> >> start
> >> voting process.
> >>
> >> Best,
> >> Godfrey
> >>
> >>
> >> Jark Wu  于2020年3月31日周二 上午12:46写道:
> >>
> >>> Hi,
> >>>
> >>> I didn't follow the full discussion.
> >>> But I share the same concern with Timo that streaming queries should
> >>> always
> >>> be async.
> >>> Otherwise, I can image it will cause a lot of confusion and problems if
> >>> users don't deeply keep the "sync" in mind (e.g. client hangs).
> >>> Besides, the streaming mode is still the majority use cases of Flink
> >>> and
> >>> Flink SQL. We should put the usability at a high priority.
> >>>
> >>> Best,
> >>> Jark
> >>>
> >>>
> >>> On Mon, 30 Mar 2020 at 23:27, Timo Walther  wrote:
> >>>
> >>>> Hi Godfrey,
> >>>>
> >>>> maybe I wasn't expressing my biggest concern enough in my last mail.
> >>>> Even in a singleline and sync execution, I think that streaming
> >>>> queries
> >>>> should not block the execution. Otherwise it is not possible to call
> >>>> collect() or print() on them afterwards.
> >>>>
> >>>> "there are too many things need to discuss for multiline":
> >>>>
> >>>> True, I don't want to solve all of them right now. But what I know is
> >>>> that our newly introduced methods should fit into a multiline
> >>>> execution.
> >>>> There is no big difference of calling `executeSql(A),
> >>>> executeSql(B)` and
> >>>> processing a multiline file `A;\nB;`.
> >>>>
> >>>> I think the example that you mentioned can simply be undefined for
> >>>> now.
> >>>> Currently, no catalog is modifying data but just metadata. This is a
> >>>> separate discussion.
> >>>>
> >>>> "result of the second statement is indeterministic":
> >>>>
> >>>> Sure this is indeterministic. But this is the implementers fault
> >>>> and we
> >>>> cannot forbid such pipelines.
> >>>>
> >>>> How about we always execute streaming queries async? It would unblock
> >>>> executeSql() and multiline statements.
> >>>>
> >>>> Having a `executeSqlAsync()` is useful for batch. However, I don't
> >>>> want
> >>>> `sync/async` be the new batch/stream flag. The execution behavior
> >>>> should
> >>>> come from the query itself.
> >>>>
> >>>> Regards,
> >>>> Timo
> >>>>
> >>>>
> >>>> On 30.03.20 11:12, godfrey he wrote:
> >>>>> Hi Timo,
> >>>>>
> >>>>> Agree w

Re: [DISCUSS] FLIP-84 Feedback Summary

2020-04-01 Thread godfrey he
Hi Timo,

Regarding "`execute` method throws checked exception":
do you mean we should convert the checked exception to an unchecked
exception,
or that we need to add an ERROR type to ResultKind?

For the second approach, I still think it's not convenient
for the user to check for exceptions when calling the `collect` and `print`
methods.
The code would look like:

// Option 1: add a `getError()` method to TableResult and store the
// exception in the TableResult independently
TableResult result = tEnv.executeSql("select xxx");
if (result.getResultKind() == ResultKind.ERROR) {
  print(result.getError());
} else {
  Iterator<Row> it = result.collect();
  it...
}

// Option 2: treat the exception as a kind of result, and get the exception
// through the `collect` method
TableResult result = tEnv.executeSql("select xxx");
if (result.getResultKind() == ResultKind.ERROR) {
  Iterator<Row> it = result.collect();
  Row row = it.next();
  print(row.getField(0));
} else {
  Iterator<Row> it = result.collect();
  it...
}

// for fluent programming
Iterator<Row> it = tEnv.executeSql("select xxx").collect();
it...
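
As an aside, a minimal sketch of the unchecked-exception alternative raised
in Timo's mail below, which keeps the fluent style without any
ResultKind.ERROR branch. The use of Flink's `TableException` here is an
assumption for illustration; the exact exception type was not decided:

import java.util.Iterator;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.TableException;
import org.apache.flink.types.Row;

public class UncheckedErrorSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
            TableEnvironment.create(EnvironmentSettings.newInstance().build());
        try {
            // Failures surface as unchecked exceptions, so the fluent style
            // needs no explicit error-kind check.
            Iterator<Row> it = tEnv.executeSql("select xxx").collect();
            while (it.hasNext()) {
                System.out.println(it.next());
            }
        } catch (TableException e) { // unchecked, so catching is optional
            System.err.println("Statement failed: " + e.getMessage());
        }
    }
}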

Best,
Godfrey

Timo Walther  于2020年4月1日周三 上午1:27写道:

> Hi Godfrey,
>
> Aljoscha, Dawid, Klou, and I had another discussion around FLIP-84. In
> particular, we discussed how the current status of the FLIP and the
> future requirements around multiline statements, async/sync, collect()
> fit together.
>
> We also updated the FLIP-84 Feedback Summary document [1] with some use
> cases.
>
> We believe that we found a good solution that also fits to what is in
> the current FLIP. So no bigger changes necessary, which is great!
>
> Our findings were:
>
> 1. Async vs sync submission of Flink jobs:
>
> Having a blocking `execute()` in DataStream API was rather a mistake.
> Instead all submissions should be async because this allows supporting
> both modes if necessary. Thus, submitting all queries async sounds good
> to us. If users want to run a job sync, they can use the JobClient and
> wait for completion (or collect() in case of batch jobs).
>
> 2. Multi-statement execution:
>
> For the multi-statement execution, we don't see a contradiction with
> the async execution behavior. We imagine a method like:
>
> TableEnvironment#executeMultilineSql(String statements):
> Iterable<TableResult>
>
> Where the `Iterator#next()` method would trigger the next statement
> submission. This allows a caller to decide synchronously when to submit
> statements async to the cluster. Thus, a service such as the SQL Client
> can handle the result of each statement individually and process
> statement by statement sequentially.
>
> 3. The role of TableResult and result retrieval in general
>
> `TableResult` is similar to `JobClient`. Instead of returning a
> `CompletableFuture` of something, it is a concrete util class where some
> methods have the behavior of completable future (e.g. collect(),
> print()) and some are already completed (getTableSchema(),
> getResultKind()).
>
> `StatementSet#execute()` returns a single `TableResult` because the
> order is undefined in a set and all statements have the same schema. Its
> `collect()` will return a row for each executed `INSERT INTO` in the
> order of statement definition.
>
> For simple `SELECT * FROM ...`, the query execution might block until
> `collect()` is called to pull buffered rows from the job (from
> socket/REST API, whatever we will use in the future). We can say that a
> statement finished successfully, when the `collect#Iterator#hasNext` has
> returned false.
>
> I hope this summarizes our discussion @Dawid/Aljoscha/Klou?
>
> It would be great if we can add these findings to the FLIP before we
> start voting.
>
> One minor thing: some `execute()` methods still throw a checked
> exception; can we remove that from the FLIP? Also the above mentioned
> `Iterator#next()` would trigger an execution without throwing a checked
> exception.
>
> Thanks,
> Timo
>
> [1]
>
> https://docs.google.com/document/d/1ueLjQWRPdLTFB_TReAyhseAX-1N3j4WYWD0F02Uau0E/edit#
>
> On 31.03.20 06:28, godfrey he wrote:
> > Hi, Timo & Jark
> >
> > Thanks for your explanation.
> > Agree with you that async execution should always be async,
> > and sync execution scenario can be covered  by async execution.
> > It helps provide an unified entry point for batch and streaming.
> > I think we can also use sync execution for some testing.
> > So, I agree with you that we provide `executeSql` method and it's async
> > method.
> > If we want sync method in the future, we can add method named
> > `executeSqlSync`.
> >
> > I think we've reached an agreement. I will update the document, and start
> > voting process.
> >
> >
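
As an editorial illustration of the `executeMultilineSql` idea above, a
sketch of how a caller such as the SQL Client might drive it. The method
name, the `Iterator` return type, and the statements are assumptions taken
from this discussion; the method is only a proposal here:

import java.util.Iterator;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.TableResult;

public class MultilineDriverSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
            TableEnvironment.create(EnvironmentSettings.newInstance().build());
        String statements =
            "CREATE TABLE t1 (f1 INT, f2 STRING) WITH ('connector' = '...');\n"
            + "INSERT INTO t1 SELECT f1, f2 FROM s;";
        // Hypothetical method from the discussion: each next() triggers the
        // next statement submission, so the caller decides synchronously
        // when to submit and can handle each result individually.
        Iterator<TableResult> results = tEnv.executeMultilineSql(statements);
        while (results.hasNext()) {
            TableResult r = results.next();
            System.out.println(r.getResultKind());
        }
    }
}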

Re: [ANNOUNCE] New Committers and PMC member

2020-04-01 Thread godfrey he
Congratulations to all of you~

Best,
Godfrey

Ismaël Mejía  于2020年4月2日周四 上午6:42写道:

> Congrats everyone!
>
> On Thu, Apr 2, 2020 at 12:16 AM Rong Rong  wrote:
> >
> > Congratulations to all!!!
> >
> > --
> > Rong
> >
> > On Wed, Apr 1, 2020 at 2:27 PM Thomas Weise  wrote:
> >
> > > Congratulations!
> > >
> > >
> > > On Wed, Apr 1, 2020 at 9:31 AM Fabian Hueske 
> wrote:
> > >
> > > > Congrats everyone!
> > > >
> > > > Cheers, Fabian
> > > >
> > > > Am Mi., 1. Apr. 2020 um 18:26 Uhr schrieb Yun Tang  >:
> > > >
> > > > > Congratulations to all of you!
> > > > >
> > > > > Best
> > > > > Yun Tang
> > > > > 
> > > > > From: Yang Wang 
> > > > > Sent: Wednesday, April 1, 2020 22:28
> > > > > To: dev 
> > > > > Subject: Re: [ANNOUNCE] New Committers and PMC member
> > > > >
> > > > > Congratulations all.
> > > > >
> > > > > Best,
> > > > > Yang
> > > > >
> > > > > Leonard Xu  于2020年4月1日周三 下午10:15写道:
> > > > >
> > > > > > Congratulations Konstantin, Dawid and Zhijiang!  Well deserved!
> > > > > >
> > > > > > Best,
> > > > > > Leonard Xu
> > > > > > > 在 2020年4月1日,21:22,Jark Wu  写道:
> > > > > > >
> > > > > > > Congratulations to you all!
> > > > > > >
> > > > > > > Best,
> > > > > > > Jark
> > > > > > >
> > > > > > > On Wed, 1 Apr 2020 at 20:33, Kurt Young 
> wrote:
> > > > > > >
> > > > > > >> Congratulations to you all!
> > > > > > >>
> > > > > > >> Best,
> > > > > > >> Kurt
> > > > > > >>
> > > > > > >>
> > > > > > >> On Wed, Apr 1, 2020 at 7:41 PM Danny Chan <
> yuzhao@gmail.com>
> > > > > wrote:
> > > > > > >>
> > > > > > >>> Congratulations!
> > > > > > >>>
> > > > > > >>> Best,
> > > > > > >>> Danny Chan
> > > > > > >>> 在 2020年4月1日 +0800 PM7:36,dev@flink.apache.org,写道:
> > > > > > 
> > > > > >  Congratulations!
> > > > > > >>>
> > > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
>


Re: [DISCUSS] FLIP-84 Feedback Summary

2020-04-01 Thread godfrey he
Hi Aljoscha, Dawid, Timo,

Thanks so much for the detailed explanation.
Agree with you that the multiline story is not complete yet, and we can
keep discussing it.
I will add current discussions and conclusions to the FLIP.

Best,
Godfrey



Timo Walther  于2020年4月1日周三 下午11:27写道:

> Hi Godfrey,
>
> first of all, I agree with Dawid. The multiline story is not completed
> by this FLIP. It just verifies the big picture.
>
> 1. "control the execution logic through the proposed method if they know
> what the statements are"
>
> This is a good point that also Fabian raised in the linked google doc. I
> could also imagine returning a more complicated POJO when calling
> `executeMultiSql()`.
>
> The POJO would include some `getSqlProperties()` such that a platform
> gets insights into the query before executing. We could also trigger the
> execution more explicitly instead of hiding it behind an iterator.
>
> 2. "there are some special commands introduced in SQL client"
>
> For platforms and SQL Client specific commands, we could offer a hook to
> the parser or a fallback parser in case the regular table environment
> parser cannot deal with the statement.
>
> However, all of that is future work and can be discussed in a separate
> FLIP.
>
> 3. +1 for the `Iterator` instead of `Iterable`.
>
> 4. "we should convert the checked exception to unchecked exception"
>
> Yes, I meant using a runtime exception instead of a checked exception.
> There was no consensus on putting the exception into the `TableResult`.
>
> Regards,
> Timo
>
> On 01.04.20 15:35, Dawid Wysakowicz wrote:
> > When considering the multi-line support I think it is helpful to start
> > with a use case in mind. In my opinion consumers of this method will be:
> >
> >  1. sql-client
> >  2. third-party SQL-based platforms
> >
> > @Godfrey As for the quit/source/... commands. I think those belong to
> > the responsibility of aforementioned. I think they should not be
> > understandable by the TableEnvironment. What would quit on a
> > TableEnvironment do? Moreover I think such commands should be prefixed
> > appropriately. I think it's a common practice to e.g. prefix those with
> > ! or : to say they are meta commands of the tool rather than a query.
> >
> > I also don't necessarily understand why platform users need to know the
> > kind of the query to use the proposed method. They should get the type
> > from the TableResult#ResultKind. If the ResultKind is SUCCESS, it was a
> > DCL/DDL. If SUCCESS_WITH_CONTENT it was a DML/DQL. If that's not enough
> > we can enrich the TableResult with more explicit kind of query, but so
> > far I don't see such a need.
> >
> > @Kurt In those cases I would assume the developers want to present
> > results of the queries anyway. Moreover I think it is safe to assume
> > they can adhere to such a contract that the results must be iterated.
> >
> > For direct users of TableEnvironment/Table API this method does not make
> > much sense anyway, in my opinion. I think we can rather safely assume in
> > this scenario they do not want to submit multiple queries at a single
> time.
> >
> > Best,
> >
> > Dawid
> >
> >
> > On 01/04/2020 15:07, Kurt Young wrote:
> >> One comment on `executeMultilineSql`: I'm afraid sometimes users might
> >> forget to
> >> iterate the returned iterators, e.g. a user submits a bunch of DDLs and
> >> expects the
> >> framework to execute them one by one. But it doesn't.
> >>
> >> Best,
> >> Kurt
> >>
> >>
> >> On Wed, Apr 1, 2020 at 5:10 PM Aljoscha Krettek
> wrote:
> >>
> >>> Agreed to what Dawid and Timo said.
> >>>
> >>> To answer your question about multi line SQL: no, we don't think we
> need
> >>> this in Flink 1.11, we only wanted to make sure that the interfaces
> that
> >>> we now put in place will potentially allow this in the future.
> >>>
> >>> Best,
> >>> Aljoscha
> >>>
> >>> On 01.04.20 09:31, godfrey he wrote:
> >>>> Hi, Timo & Dawid,
> >>>>
> >>>> Thanks so much for the effort of `multiline statements supporting`,
> >>>> I have a few questions about this method:
> >>>>
> >>>> 1. users can well control the execution logic through the proposed
> method
> >>>>if they know what the statements are (a statement is a DDL, a DML
> or
> >>>> others).

Re: [VOTE] FLIP-84: Improve & Refactor API of TableEnvironment & Table

2020-04-03 Thread godfrey he
Hi everyone,

I'd like to start the vote on FLIP-84[1] again, which has been discussed and
reached consensus in the discussion thread[2].

The vote will be open for at least 72 hours. Unless there is an objection,
I will try to close it by Apr 6, 2020 13:10 UTC if we have received
sufficient votes.


[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878

[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html


Bests,
Godfrey

godfrey he  于2020年3月31日周二 下午8:42写道:

> Hi, Timo
>
> So sorry about that, I'm in a little hurry. Let's wait for 24h.
>
> Best,
> Godfrey
>
> Timo Walther  于2020年3月31日周二 下午5:26写道:
>
>> -1
>>
>> The current discussion has not completed. The last comments were sent
>> less than 24h ago.
>>
>> Let's wait a bit longer to collect feedback from all stakeholders.
>>
>> Thanks,
>> Timo
>>
>> On 31.03.20 08:31, godfrey he wrote:
>> > Hi everyone,
>> >
>> > I'd like to start the vote of FLIP-84[1] again, because we have some
>> > feedbacks. The feedbacks are all about new introduced methods, here is
>> the
>> > discussion thread [2].
>> >
>> > The vote will be open for at least 72 hours. Unless there is an
>> objection,
>> > I will try to close it by Apr 3, 2020 06:30 UTC if we have received
>> > sufficient votes.
>> >
>> >
>> > [1]
>> >
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
>> >
>> > [2]
>> >
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html
>> >
>> >
>> > Bests,
>> > Godfrey
>> >
>>
>>


Re: [DISCUSS] FLIP-84 Feedback Summary

2020-04-06 Thread godfrey he
Hi Timo,

Sorry for the late reply, and thanks for your correction.
I missed DQL in the job submission scenario.
I'll fix the document right away.

Best,
Godfrey

Timo Walther  于2020年4月3日周五 下午9:53写道:

> Hi Godfrey,
>
> I'm sorry to jump in again but I still need to clarify some things
> around TableResult.
>
> The FLIP says:
> "For DML, this method returns TableResult until the job is submitted.
> For other statements, TableResult is returned until the execution is
> finished."
>
> I thought we agreed on making every execution async? This also means
> returning a TableResult for DQLs even though the execution is not done
> yet. People need access to the JobClient also for batch jobs in order to
> cancel long lasting queries. If people want to wait for the completion
> they can hook into JobClient or collect().
>
> Can we rephrase this part to:
>
> "For DML and DQL, this method returns TableResult once the job has been
> submitted. For DDL and DCL statements, TableResult is returned once the
> operation has finished."
>
> Regards,
> Timo
>
>
> On 02.04.20 05:27, godfrey he wrote:
> > Hi Aljoscha, Dawid, Timo,
> >
> > Thanks so much for the detailed explanation.
> > Agree with you that the multiline story is not completed now, and we can
> > keep discussion.
> > I will add current discussions and conclusions to the FLIP.
> >
> > Best,
> > Godfrey
> >
> >
> >
> > Timo Walther  于2020年4月1日周三 下午11:27写道:
> >
> >> Hi Godfrey,
> >>
> >> first of all, I agree with Dawid. The multiline story is not completed
> >> by this FLIP. It just verifies the big picture.
> >>
> >> 1. "control the execution logic through the proposed method if they know
> >> what the statements are"
> >>
> >> This is a good point that also Fabian raised in the linked google doc. I
> >> could also imagine to return a more complicated POJO when calling
> >> `executeMultiSql()`.
> >>
> >> The POJO would include some `getSqlProperties()` such that a platform
> >> gets insights into the query before executing. We could also trigger the
> >> execution more explicitly instead of hiding it behind an iterator.
> >>
> >> 2. "there are some special commands introduced in SQL client"
> >>
> >> For platforms and SQL Client specific commands, we could offer a hook to
> >> the parser or a fallback parser in case the regular table environment
> >> parser cannot deal with the statement.
> >>
> >> However, all of that is future work and can be discussed in a separate
> >> FLIP.
> >>
> >> 3. +1 for the `Iterator` instead of `Iterable`.
> >>
> >> 4. "we should convert the checked exception to unchecked exception"
> >>
> >> Yes, I meant using a runtime exception instead of a checked exception.
> >> There was no consensus on putting the exception into the `TableResult`.
> >>
> >> Regards,
> >> Timo
> >>
> >> On 01.04.20 15:35, Dawid Wysakowicz wrote:
> >>> When considering the multi-line support I think it is helpful to start
> >>> with a use case in mind. In my opinion consumers of this method will
> be:
> >>>
> >>>   1. sql-client
> >>>   2. third-part sql based platforms
> >>>
> >>> @Godfrey As for the quit/source/... commands. I think those belong to
> >>> the responsibility of aforementioned. I think they should not be
> >>> understandable by the TableEnvironment. What would quit on a
> >>> TableEnvironment do? Moreover I think such commands should be prefixed
> >>> appropriately. I think it's a common practice to e.g. prefix those with
> >>> ! or : to say they are meta commands of the tool rather than a query.
> >>>
> >>> I also don't necessarily understand why platform users need to know the
> >>> kind of the query to use the proposed method. They should get the type
> >>> from the TableResult#ResultKind. If the ResultKind is SUCCESS, it was a
> >>> DCL/DDL. If SUCCESS_WITH_CONTENT it was a DML/DQL. If that's not enough
> >>> we can enrich the TableResult with more explicit kind of query, but so
> >>> far I don't see such a need.
> >>>
> >>> @Kurt In those cases I would assume the developers want to present
> >>> results of the queries anyway. Moreover I think it is safe to assume
> >>> they can adhere to su
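
To make the JobClient point concrete, a minimal sketch of cancelling a
long-running query via the TableResult, assuming the clarified semantics
above (DQL/DML return once the job is submitted); table and field names are
placeholders:

import org.apache.flink.core.execution.JobClient;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.TableResult;

public class CancelQuerySketch {
    public static void main(String[] args) throws Exception {
        TableEnvironment tEnv =
            TableEnvironment.create(EnvironmentSettings.newInstance().build());
        // Returns once the job is submitted, even though the query itself
        // may keep running for a long time.
        TableResult result = tEnv.executeSql("SELECT f1, f2 FROM s");
        JobClient client = result.getJobClient()
            .orElseThrow(() -> new IllegalStateException("no job submitted"));
        // The JobClient allows cancelling a long-lasting query, which is
        // needed for batch jobs as well.
        client.cancel().get();
    }
}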

Re: [VOTE] FLIP-84: Improve & Refactor API of TableEnvironment & Table

2020-04-06 Thread godfrey he
Hi Timo,

Sorry for the late reply, and thanks for your correction. I have fixed the typo
and updated the document.

Best,
Godfrey

Timo Walther  于2020年4月6日周一 下午6:05写道:

> Hi Godfrey,
>
> did you see my remaining feedback in the discussion thread? We could
> finish this FLIP if this gets resolved.
>
> Thanks,
> Timo
>
> On 03.04.20 15:12, Terry Wang wrote:
> > +1 (non-binding)
> > Looks great to me, Thanks for driving on this.
> >
> > Best,
> > Terry Wang
> >
> >
> >
> >> 2020年4月3日 21:07,godfrey he  写道:
> >>
> >> Hi everyone,
> >>
> >> I'd like to start the vote of FLIP-84[1] again, which is discussed and
> >> reached consensus in the discussion thread[2].
> >>
> >> The vote will be open for at least 72 hours. Unless there is an
> objection,
> >> I will try to close it by Apr 6, 2020 13:10 UTC if we have received
> >> sufficient votes.
> >>
> >>
> >> [1]
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> >>
> >> [2]
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html
> >>
> >>
> >> Bests,
> >> Godfrey
> >>
> >> godfrey he  于2020年3月31日周二 下午8:42写道:
> >>
> >>> Hi, Timo
> >>>
> >>> So sorry about that, I'm in a little hurry. Let's wait for 24h.
> >>>
> >>> Best,
> >>> Godfrey
> >>>
> >>> Timo Walther  于2020年3月31日周二 下午5:26写道:
> >>>
> >>>> -1
> >>>>
> >>>> The current discussion has not completed. The last comments were sent
> >>>> less than 24h ago.
> >>>>
> >>>> Let's wait a bit longer to collect feedback from all stakeholders.
> >>>>
> >>>> Thanks,
> >>>> Timo
> >>>>
> >>>> On 31.03.20 08:31, godfrey he wrote:
> >>>>> Hi everyone,
> >>>>>
> >>>>> I'd like to start the vote of FLIP-84[1] again, because we have some
> >>>>> feedbacks. The feedbacks are all about new introduced methods, here
> is
> >>>> the
> >>>>> discussion thread [2].
> >>>>>
> >>>>> The vote will be open for at least 72 hours. Unless there is an
> >>>> objection,
> >>>>> I will try to close it by Apr 3, 2020 06:30 UTC if we have received
> >>>>> sufficient votes.
> >>>>>
> >>>>>
> >>>>> [1]
> >>>>>
> >>>>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> >>>>>
> >>>>> [2]
> >>>>>
> >>>>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html
> >>>>>
> >>>>>
> >>>>> Bests,
> >>>>> Godfrey
> >>>>>
> >>>>
> >>>>
>
>


Re: [VOTE] FLIP-84: Improve & Refactor API of TableEnvironment & Table

2020-04-06 Thread godfrey he
Hi, Kurt

Yes, `TableEnvironment#executeSql` can also execute a `SELECT` statement,
which is similar to `Table#execute`.
I have added this to the document.

Best,
Godfrey

Kurt Young  于2020年4月7日周二 上午11:52写道:

> +1 (binding)
>
> The latest doc looks good to me. One minor comment is with the latest
> changes, it seems also very easy
> to support running a SELECT query in the TableEnvironment#executeSql method.
> Will this also be supported?
>
> Best,
> Kurt
>
>
> On Mon, Apr 6, 2020 at 10:49 PM Timo Walther  wrote:
>
> > Thanks, for the update.
> >
> > +1 (binding) for this FLIP
> >
> > Regards,
> > Timo
> >
> >
> > On 06.04.20 16:47, godfrey he wrote:
> > > Hi Timo,
> > >
> > > Sorry for late reply, and thanks for your correction. I have fixed the
> > typo
> > > and updated the document.
> > >
> > > Best,
> > > Godfrey
> > >
> > > Timo Walther  于2020年4月6日周一 下午6:05写道:
> > >
> > >> Hi Godfrey,
> > >>
> > >> did you see my remaining feedback in the discussion thread? We could
> > >> finish this FLIP if this gets resolved.
> > >>
> > >> Thanks,
> > >> Timo
> > >>
> > >> On 03.04.20 15:12, Terry Wang wrote:
> > >>> +1 (non-binding)
> > >>> Looks great to me, Thanks for driving on this.
> > >>>
> > >>> Best,
> > >>> Terry Wang
> > >>>
> > >>>
> > >>>
> > >>>> 2020年4月3日 21:07,godfrey he  写道:
> > >>>>
> > >>>> Hi everyone,
> > >>>>
> > >>>> I'd like to start the vote of FLIP-84[1] again, which is discussed
> and
> > >>>> reached consensus in the discussion thread[2].
> > >>>>
> > >>>> The vote will be open for at least 72 hours. Unless there is an
> > >> objection,
> > >>>> I will try to close it by Apr 6, 2020 13:10 UTC if we have received
> > >>>> sufficient votes.
> > >>>>
> > >>>>
> > >>>> [1]
> > >>>>
> > >>
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> > >>>>
> > >>>> [2]
> > >>>>
> > >>
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html
> > >>>>
> > >>>>
> > >>>> Bests,
> > >>>> Godfrey
> > >>>>
> > >>>> godfrey he  于2020年3月31日周二 下午8:42写道:
> > >>>>
> > >>>>> Hi, Timo
> > >>>>>
> > >>>>> So sorry about that, I'm in a little hurry. Let's wait for 24h.
> > >>>>>
> > >>>>> Best,
> > >>>>> Godfrey
> > >>>>>
> > >>>>> Timo Walther  于2020年3月31日周二 下午5:26写道:
> > >>>>>
> > >>>>>> -1
> > >>>>>>
> > >>>>>> The current discussion has not completed. The last comments were
> > sent
> > >>>>>> less than 24h ago.
> > >>>>>>
> > >>>>>> Let's wait a bit longer to collect feedback from all stakeholders.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Timo
> > >>>>>>
> > >>>>>> On 31.03.20 08:31, godfrey he wrote:
> > >>>>>>> Hi everyone,
> > >>>>>>>
> > >>>>>>> I'd like to start the vote of FLIP-84[1] again, because we have
> > some
> > >>>>>>> feedbacks. The feedbacks are all about new introduced methods,
> here
> > >> is
> > >>>>>> the
> > >>>>>>> discussion thread [2].
> > >>>>>>>
> > >>>>>>> The vote will be open for at least 72 hours. Unless there is an
> > >>>>>> objection,
> > >>>>>>> I will try to close it by Apr 3, 2020 06:30 UTC if we have
> received
> > >>>>>>> sufficient votes.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> [1]
> > >>>>>>>
> > >>>>>>
> > >>
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> > >>>>>>>
> > >>>>>>> [2]
> > >>>>>>>
> > >>>>>>
> > >>
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Bests,
> > >>>>>>> Godfrey
> > >>>>>>>
> > >>>>>>
> > >>>>>>
> > >>
> > >>
> > >
> >
> >
>


Re: [VOTE] FLIP-84: Improve & Refactor API of TableEnvironment & Table

2020-04-07 Thread godfrey he
Hi everyone,

Thanks all for the votes.
So far, we have

   - 3 binding +1 votes (Timo, Kurt, Dawid)
   - 1 non-binding +1 votes (Terry)
   - No -1 votes

The voting time has passed and there are enough +1 votes to consider FLIP-84
approved.
Thank you all.


Best,
Godfrey

Dawid Wysakowicz  于2020年4月7日周二 下午2:29写道:

> +1
>
> Best,
>
> Dawid
>
> On 07/04/2020 07:44, godfrey he wrote:
> > Hi, Kurt
> >
> > yes. `TableEnvironement#executeSql` also could execute `SELECT`
> statement,
> > which is similar to `Table#execute`.
> > I add this to the document.
> >
> > Best,
> > Godfrey
> >
> > Kurt Young  于2020年4月7日周二 上午11:52写道:
> >
> >> +1 (binding)
> >>
> >> The latest doc looks good to me. One minor comment is with the latest
> >> changes, it seems also very easy
> >> to support running SELECT query in TableEnvironement#executeSql method.
> >> Will this also be supported?
> >>
> >> Best,
> >> Kurt
> >>
> >>
> >> On Mon, Apr 6, 2020 at 10:49 PM Timo Walther 
> wrote:
> >>
> >>> Thanks, for the update.
> >>>
> >>> +1 (binding) for this FLIP
> >>>
> >>> Regards,
> >>> Timo
> >>>
> >>>
> >>> On 06.04.20 16:47, godfrey he wrote:
> >>>> Hi Timo,
> >>>>
> >>>> Sorry for late reply, and thanks for your correction. I have fixed the
> >>> typo
> >>>> and updated the document.
> >>>>
> >>>> Best,
> >>>> Godfrey
> >>>>
> >>>> Timo Walther  于2020年4月6日周一 下午6:05写道:
> >>>>
> >>>>> Hi Godfrey,
> >>>>>
> >>>>> did you see my remaining feedback in the discussion thread? We could
> >>>>> finish this FLIP if this gets resolved.
> >>>>>
> >>>>> Thanks,
> >>>>> Timo
> >>>>>
> >>>>> On 03.04.20 15:12, Terry Wang wrote:
> >>>>>> +1 (non-binding)
> >>>>>> Looks great to me, Thanks for driving on this.
> >>>>>>
> >>>>>> Best,
> >>>>>> Terry Wang
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> 2020年4月3日 21:07,godfrey he  写道:
> >>>>>>>
> >>>>>>> Hi everyone,
> >>>>>>>
> >>>>>>> I'd like to start the vote of FLIP-84[1] again, which is discussed
> >> and
> >>>>>>> reached consensus in the discussion thread[2].
> >>>>>>>
> >>>>>>> The vote will be open for at least 72 hours. Unless there is an
> >>>>> objection,
> >>>>>>> I will try to close it by Apr 6, 2020 13:10 UTC if we have received
> >>>>>>> sufficient votes.
> >>>>>>>
> >>>>>>>
> >>>>>>> [1]
> >>>>>>>
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> >>>>>>> [2]
> >>>>>>>
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html
> >>>>>>>
> >>>>>>> Bests,
> >>>>>>> Godfrey
> >>>>>>>
> >>>>>>> godfrey he  于2020年3月31日周二 下午8:42写道:
> >>>>>>>
> >>>>>>>> Hi, Timo
> >>>>>>>>
> >>>>>>>> So sorry about that, I'm in a little hurry. Let's wait for 24h.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Godfrey
> >>>>>>>>
> >>>>>>>> Timo Walther  于2020年3月31日周二 下午5:26写道:
> >>>>>>>>
> >>>>>>>>> -1
> >>>>>>>>>
> >>>>>>>>> The current discussion has not completed. The last comments were
> >>> sent
> >>>>>>>>> less than 24h ago.
> >>>>>>>>>
> >>>>>>>>> Let's wait a bit longer to collect feedback from all
> stakeholders.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Timo
> >>>>>>>>>
> >>>>>>>>> On 31.03.20 08:31, godfrey he wrote:
> >>>>>>>>>> Hi everyone,
> >>>>>>>>>>
> >>>>>>>>>> I'd like to start the vote of FLIP-84[1] again, because we have
> >>> some
> >>>>>>>>>> feedbacks. The feedbacks are all about new introduced methods,
> >> here
> >>>>> is
> >>>>>>>>> the
> >>>>>>>>>> discussion thread [2].
> >>>>>>>>>>
> >>>>>>>>>> The vote will be open for at least 72 hours. Unless there is an
> >>>>>>>>> objection,
> >>>>>>>>>> I will try to close it by Apr 3, 2020 06:30 UTC if we have
> >> received
> >>>>>>>>>> sufficient votes.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> [1]
> >>>>>>>>>>
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> >>>>>>>>>> [2]
> >>>>>>>>>>
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html
> >>>>>>>>>>
> >>>>>>>>>> Bests,
> >>>>>>>>>> Godfrey
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>
> >>>
>
>


Re: [VOTE] FLIP-71: E2E View support in Flink SQL

2020-04-12 Thread godfrey he
+1 (non-binding)

Best,
Godfrey

Benchao Li  于2020年4月12日周日 下午12:28写道:

> +1 (non-binding)
>
> zoudan  于2020年4月12日周日 上午9:52写道:
>
> > +1 (non-binding)
> >
> > Best,
> > Dan Zou
> >
> >
> > > 在 2020年4月10日,09:30,Danny Chan  写道:
> > >
> > > +1 from my side.
> > >
> > > Best,
> > > Danny Chan
> > > 在 2020年4月9日 +0800 PM9:23,Timo Walther ,写道:
> > >> +1 (binding)
> > >>
> > >> Thanks for your efforts.
> > >>
> > >> Regards,
> > >> Timo
> > >>
> > >>
> > >> On 09.04.20 14:46, Zhenghua Gao wrote:
> > >>> Hi all,
> > >>>
> > >>> I'd like to start the vote for FLIP-71[1] which adds E2E view support
> > in
> > >>> Flink SQL.
> > >>> This FLIP is discussed in the thread[2].
> > >>>
> > >>> The vote will be open for at least 72 hours. Unless there is an
> > objection.
> > >>> I will try to
> > >>> close it by April 13, 2020 09:00 UTC if we have received sufficient
> > votes.
> > >>>
> > >>> [1]
> > >>>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-71%3A+E2E+View+support+in+FLINK+SQL
> > >>>
> > >>> [2]
> > >>>
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-71-E2E-View-support-in-Flink-SQL-td33131.html#a39787
> > >>>
> > >>> *Best Regards,*
> > >>> *Zhenghua Gao*
> > >>>
> > >>
> >
> >
>
> --
>
> Benchao Li
> School of Electronics Engineering and Computer Science, Peking University
> Tel:+86-15650713730
> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>


Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-15 Thread godfrey he
Big +1.
This will improve the user experience (especially for new Flink users).
We have answered so many questions about "class not found".

Best,
Godfrey

Dian Fu  于2020年4月15日周三 下午4:30写道:

> +1 to this proposal.
>
> Missing connector jars is also a big problem for PyFlink users. Currently,
> after a Python user has installed PyFlink using `pip`, he has to manually
> copy the connector fat jars to the PyFlink installation directory for the
> connectors to be used if he wants to run jobs locally. This process is very
> confusing for users and affects the experience a lot.
>
> Regards,
> Dian
>
> > 在 2020年4月15日,下午3:51,Jark Wu  写道:
> >
> > +1 to the proposal. I also found the "download additional jar" step is
> > really verbose when I prepare webinars.
> >
> > At least, I think the flink-csv and flink-json should in the
> distribution,
> > they are quite small and don't have other dependencies.
> >
> > Best,
> > Jark
> >
> > On Wed, 15 Apr 2020 at 15:44, Jeff Zhang  wrote:
> >
> >> Hi Aljoscha,
> >>
> >> Big +1 for the fat flink distribution, where do you plan to put these
> >> connectors ? opt or lib ?
> >>
> >> Aljoscha Krettek  于2020年4月15日周三 下午3:30写道:
> >>
> >>> Hi Everyone,
> >>>
> >>> I'd like to discuss about releasing a more full-featured Flink
> >>> distribution. The motivation is that there is friction for SQL/Table
> API
> >>> users that want to use Table connectors which are not there in the
> >>> current Flink Distribution. For these users the workflow is currently
> >>> roughly:
> >>>
> >>>  - download Flink dist
> >>>  - configure csv/Kafka/json connectors per configuration
> >>>  - run SQL client or program
> >>>  - decrypt error message and research the solution
> >>>  - download additional connector jars
> >>>  - program works correctly
> >>>
> >>> I realize that this can be made to work but if every SQL user has this
> >>> as their first experience that doesn't seem good to me.
> >>>
> >>> My proposal is to provide two versions of the Flink Distribution in the
> >>> future: "fat" and "slim" (names to be discussed):
> >>>
> >>>  - slim would be even trimmer than todays distribution
> >>>  - fat would contain a lot of convenience connectors (yet to be
> >>> determined which one)
> >>>
> >>> And yes, I realize that there are already more dimensions of Flink
> >>> releases (Scala version and Java version).
> >>>
> >>> For background, our current Flink dist has these in the opt directory:
> >>>
> >>>  - flink-azure-fs-hadoop-1.10.0.jar
> >>>  - flink-cep-scala_2.12-1.10.0.jar
> >>>  - flink-cep_2.12-1.10.0.jar
> >>>  - flink-gelly-scala_2.12-1.10.0.jar
> >>>  - flink-gelly_2.12-1.10.0.jar
> >>>  - flink-metrics-datadog-1.10.0.jar
> >>>  - flink-metrics-graphite-1.10.0.jar
> >>>  - flink-metrics-influxdb-1.10.0.jar
> >>>  - flink-metrics-prometheus-1.10.0.jar
> >>>  - flink-metrics-slf4j-1.10.0.jar
> >>>  - flink-metrics-statsd-1.10.0.jar
> >>>  - flink-oss-fs-hadoop-1.10.0.jar
> >>>  - flink-python_2.12-1.10.0.jar
> >>>  - flink-queryable-state-runtime_2.12-1.10.0.jar
> >>>  - flink-s3-fs-hadoop-1.10.0.jar
> >>>  - flink-s3-fs-presto-1.10.0.jar
> >>>  - flink-shaded-netty-tcnative-dynamic-2.0.25.Final-9.0.jar
> >>>  - flink-sql-client_2.12-1.10.0.jar
> >>>  - flink-state-processor-api_2.12-1.10.0.jar
> >>>  - flink-swift-fs-hadoop-1.10.0.jar
> >>>
> >>> Current Flink dist is 267M. If we removed everything from opt we would
> >>> go down to 126M. I would recommend this, because the large majority of
> >>> the files in opt are probably unused.
> >>>
> >>> What do you think?
> >>>
> >>> Best,
> >>> Aljoscha
> >>>
> >>>
> >>
> >> --
> >> Best Regards
> >>
> >> Jeff Zhang
> >>
>
>


Re: [DISCUSS] Releasing Flink 1.10.1

2020-04-16 Thread godfrey he
Thanks a lot for driving this, Yu!
Some users are already asking "where to download release-1.10.1?"
Looking forward to the rc.

Best,
Godfrey

Hequn Cheng  于2020年4月16日周四 下午12:45写道:

> Thanks a lot for your great work Yu! Looking forward to the RC.
>
> Best, Hequn
>
> On Thu, Apr 16, 2020 at 10:35 AM Dian Fu  wrote:
>
> > Thanks a lot for driving this, Yu! Looking forward for the first RC of
> > 1.10.1.
> >
> > > 在 2020年4月16日,上午10:24,jincheng sun  写道:
> > >
> > > Looking forward the first RC of Flink 1.10.1 .
> > > Good  job Yu!
> > >
> > > Best,
> > > Jincheng
> > >
> > >
> > >
> > > Jark Wu  于2020年4月15日周三 下午6:28写道:
> > >
> > >> +1 to have a 1.10.1 RC soon. It has been a long time since 1.10.0 is
> > >> released.
> > >>
> > >> Best,
> > >> Jark
> > >>
> > >> On Wed, 15 Apr 2020 at 16:10, Till Rohrmann 
> > wrote:
> > >>
> > >>> Great to see that we will have the first RC for Flink 1.10.1 soon.
> Thanks
> > a
> > >>> lot for driving this effort Yu!
> > >>>
> > >>> Cheers,
> > >>> Till
> > >>>
> > >>> On Sun, Apr 12, 2020 at 5:03 PM Yu Li  wrote:
> > >>>
> >  Thanks Weike and all others for the efforts!
> > 
> >  Here comes the latest status, we are in good shape and plan to
> produce
> > >>> RC1
> >  next week.
> > 
> >  * Blockers (1 left)
> >   - [Closed] FLINK-16018 Improve error reporting when submitting
> batch
> > >>> job
> >  (instead of AskTimeoutException)
> >   - [Closed] FLINK-16142 Memory Leak causes Metaspace OOM error on
> > >>> repeated
> >  job submission
> >   - [Closed] FLINK-16170 SearchTemplateRequest ClassNotFoundException
> > >>> when
> >  use flink-sql-connector-elasticsearch7
> >   - [Closed] FLINK-16262 Class loader problem with
> >  FlinkKafkaProducer.Semantic.EXACTLY_ONCE and usrlib directory
> >   - [Closed] FLINK-16406 Increase default value for JVM Metaspace to
> >  minimise its OutOfMemoryError
> >   - [Closed] FLINK-16454 Update the copyright year in NOTICE files
> >   - [Closed] FLINK-16705 LocalExecutor tears down MiniCluster before
> > >>> client
> >  can retrieve JobResult
> >   - [Closed] FLINK-16913 ReadableConfigToConfigurationAdapter#getEnum
> >  throws UnsupportedOperationException
> >   - [Closed] FLINK-16626 Exception encountered when cancelling a job
> in
> >  yarn per-job mode
> >   - [Fix for 1.10.1 is Done] FLINK-17093 Python UDF doesn't work when
> > >> the
> >  input column is of composite type
> >   - [PR reviewed] FLINK-16576 State inconsistency on restore with
> > >> memory
> >  state backends
> > 
> >  * Critical (1 left)
> >   - [Closed] FLINK-16047 Blink planner produces wrong aggregate
> results
> >  with state clean up
> >   - [Closed] FLINK-16070 Blink planner can not extract correct unique
> > >> key
> >  for UpsertStreamTableSink
> >   - [Fix for 1.10.1 is Done] FLINK-16225 Metaspace Out Of Memory
> should
> > >>> be
> >  handled as Fatal Error in TaskManager
> >   - [Closed] FLINK-14316 stuck in "Job leader ... lost leadership"
> > >> error
> >   - [May Postpone] FLINK-16408 Bind user code class loader to
> lifetime
> > >>> of a
> >  slot
> > 
> >  Please let me know if any concerns/comments. Thanks.
> > 
> >  Best Regards,
> >  Yu
> > 
> > 
> >  On Fri, 3 Apr 2020 at 21:35, DONG, Weike 
> > >>> wrote:
> > 
> > > Hi Yu,
> > >
> > > Thanks for your updates. I am still working on the fix for
> > >> FLINK-16626
> >  and
> > > it is expected to be completed by this Sunday after thorough
> testing.
> > >
> > > Sincerely,
> > > Weike
> > >
> > > On Fri, Apr 3, 2020 at 8:43 PM Yu Li  wrote:
> > >
> > >> Updates for 1.10.1 watched issues (we are in good progress and
> > >> almost
> > >> there
> > >> to produce the first RC, thanks all for the efforts):
> > >>
> > >> * Blockers (3 left)
> > >>  - [Closed] FLINK-16018 Improve error reporting when submitting
> > >> batch
> >  job
> > >> (instead of AskTimeoutException)
> > >>  - [Closed] FLINK-16142 Memory Leak causes Metaspace OOM error on
> > >> repeated
> > >> job submission
> > >>  - [Closed] FLINK-16170 SearchTemplateRequest
> > >> ClassNotFoundException
> >  when
> > >> use flink-sql-connector-elasticsearch7
> > >>  - [Closed] FLINK-16262 Class loader problem with
> > >> FlinkKafkaProducer.Semantic.EXACTLY_ONCE and usrlib directory
> > >>  - [Closed] FLINK-16406 Increase default value for JVM Metaspace
> to
> > >> minimise its OutOfMemoryError
> > >>  - [Closed] FLINK-16454 Update the copyright year in NOTICE files
> > >>  - [PR reviewed] FLINK-16576 State inconsistency on restore with
> > >>> memory
> > >> state backends
> > >>  - [Under Discussion] FLINK-16626 Exception encountered when
> >  cancelling a
> > >> job in yarn per-job mode
> > >>  - [Closed] FLINK-16705 LocalExecu

Re: [ANNOUNCE] New Apache Flink PMC Member - Hequn Chen

2020-04-17 Thread godfrey he
Congratulations, Hequn!

Best,
Godfrey

Leonard Xu  于2020年4月17日周五 下午4:30写道:

> Congratulations!
>
> Best,
> Leonard Xu
> > 在 2020年4月17日,15:46,Benchao Li  写道:
> >
> > Congratulations Hequn!
> >
> > Stephan Ewen  于2020年4月17日周五 下午3:42写道:
> >
> >> Congrats!
> >>
> >> On Fri, Apr 17, 2020 at 9:40 AM Jark Wu  wrote:
> >>
> >>> Congratulations Hequn!
> >>>
> >>> Best,
> >>> Jark
> >>>
> >>> On Fri, 17 Apr 2020 at 15:32, Yangze Guo  wrote:
> >>>
>  Congratulations!
> 
>  Best,
>  Yangze Guo
> 
>  On Fri, Apr 17, 2020 at 3:19 PM Jeff Zhang  wrote:
> >
> > Congratulations, Hequn!
> >
> > Paul Lam  于2020年4月17日周五 下午3:02写道:
> >
> >> Congrats Hequn! Thanks a lot for your contribution to the
> >> community!
> >>
> >> Best,
> >> Paul Lam
> >>
> >> Dian Fu  于2020年4月17日周五 下午2:58写道:
> >>
> >>> Congratulations, Hequn!
> >>>
>  在 2020年4月17日,下午2:36,Becket Qin  写道:
> 
>  Hi all,
> 
>  I am glad to announce that Hequn Chen has joined the Flink PMC.
> 
>  Hequn has contributed to Flink for years. He has worked on
> >>> several
>  components including Table / SQL,PyFlink and Flink ML Pipeline.
> >> Besides,
>  Hequn is also very active in the community since the beginning.
> 
>  Congratulations, Hequn! Looking forward to your future
>  contributions.
> 
>  Thanks,
> 
>  Jiangjie (Becket) Qin
>  (On behalf of the Apache Flink PMC)
> >>>
> >>>
> >>
> >
> >
> > --
> > Best Regards
> >
> > Jeff Zhang
> 
> >>>
> >>
> >
> >
> > --
> >
> > Benchao Li
> > School of Electronics Engineering and Computer Science, Peking University
> > Tel:+86-15650713730
> > Email: libenc...@gmail.com; libenc...@pku.edu.cn
>
>


Re: What is the suggested way to validate SQL?

2020-01-08 Thread godfrey he
Hi Kaibo,
As we discussed offline, I think a clean way is for flink-table to provide
an interface (or a tool) to do SQL validation for platform users.
`tEnv.sqlUpdate` or `tEnv.explain(false)` is a temporary solution which
contains too much unrelated logic (considering that the goal is just to
check whether a SQL statement is valid).

Best,
godfrey
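
For reference, a rough sketch of the workaround discussed in this thread,
using the Flink 1.10-era API: `sqlUpdate` parses and validates the buffered
statement, and `explain(false)` then runs the planner, so it also catches
statements that are valid SQL but unsupported by Flink. The table names and
SQL text are placeholders:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class SqlValidationSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
            EnvironmentSettings.newInstance()
                .useBlinkPlanner().inStreamingMode().build());
        String sql = "INSERT INTO sinkTable SELECT f1, f2 FROM sourceTable";
        try {
            tEnv.sqlUpdate(sql);                     // parse + validate (buffered)
            System.out.println(tEnv.explain(false)); // optimize: full support check
            System.out.println("SQL is valid.");
        } catch (Exception e) {
            System.err.println("SQL is invalid: " + e.getMessage());
        }
    }
}

Note that `sqlUpdate` buffers the statement in the environment, so this
approach mutates state; that is one of the drawbacks motivating a dedicated
validation interface.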



Arvid Heise  于2020年1月8日周三 下午9:40写道:

> A common approach is to add the connector jars as test dependencies and have
> a smoke test that just starts the job with a temporary external system
> spawned with Docker. I usually use Testcontainers [1]. Then you simply
> need to execute the integration tests in your IDE and usually can even
> debug non-obvious errors.
>
> [1] https://www.testcontainers.org/
>
> On Mon, Dec 30, 2019 at 1:39 PM Kaibo Zhou  wrote:
>
> > Hi, Jingsong,
> >
> > Thank you very much for your suggestion.
> >
> > I verified that using `tEnv.sqlUpdate("xxx")` and `tEnv.explain(false)`
> > to do validation works.
> > But this method needs the connector jar, which is very inconvenient to
> use.
> >
> >
> > Hi, Danny,
> >
> > Many thanks for providing very useful explanations.
> >
> > The use case is: users register some source/sink tables and UDFs to the
> > catalog service first, and then they write and modify SQL like "insert
> > into sinkTable select * from sourceTable where a>1" in a web SQL editor.
> > The platform wants to tell the user whether the SQL is valid, including
> > the detailed position if an error occurs.
> >
> > For the `insert target table`, the platform wants to validate that the
> > table exists, and to check the field names and field types.
> >
> > Best,
> > Kaibo
> >
> > Danny Chan  于2019年12月30日周一 下午5:37写道:
> >
> > > Hi, Kaibo Zhou ~
> > >
> > > There are several phases that a SQL text goes through to reach an
> > > execution graph that can be run with the Flink runtime:
> > >
> > >
> > > 1. Sql Parse: parse the sql text to AST(sql node tree)
> > > 2. Sql node(row type) validation, this includes the tables/schema
> > inference
> > > 3. Sql-to-rel conversion, convert the sql node to RelNode(relational
> > > algebra)
> > > 4. Promote the relational expression with planner(Volcano or Hep) then
> > > converts to execution convention nodes
> > > 5. Generate the code and the execution graph
> > >
> > > For the first 3 steps, Apache Flink uses Apache Calcite as the
> > > implementation; that means a SQL text passed to the table environment
> > > always goes through SQL parse/validation/sql-to-rel conversion.
> > >
> > > For example, a code snippet like tableEnv.sqlQuery("INSERT INTO
> sinkTable
> > > SELECT f1,f2 FROM sourceTable”), the query part “SELECT f1,f2 FROM
> > > sourceTable” was validated.
> > >
> > > But you are right: for Flink SQL, an insert statement's target table
> > > is not validated during the validation phase. Actually, we validate
> > > the “select” clause first, extract the target table identifier, and
> > > then validate that the schemas of the “select” clause and the target
> > > table are the same when we invoke the write to the sink (after step 4).
> > >
> > >
> > > For most of the cases this is okay. Can you share your cases? What
> kind
> > > of validation do you want for the insert target table?
> > >
> > > We are planning to include the insert target table validation in
> > step 2
> > > for 2 reasons:
> > >
> > > • The computed column validation(stored or virtual)
> > > • The insert implicit type coercion
> > >
> > > But this would come in Flink version 1.11 ~
> > >
> > >
> > > Best,
> > > Danny Chan
> > > 在 2019年12月27日 +0800 PM5:44,dev@flink.apache.org,写道:
> > > >
> > > > "INSERT INTO
> > > > sinkTable SELECT f1,f2 FROM sourceTable"
> > >
> >
>


Re: [DISCUSS] FLIP-91 - Support SQL Client Gateway

2020-01-17 Thread godfrey he
Hi devs,

I've updated FLIP-91 [0] according to the feedback. Please take another
look.

Best,
godfrey

[0]
https://docs.google.com/document/d/1DKpFdov1o_ObvrCmU-5xi-VrT6nR2gxq-BbswSSI9j8/


Kurt Young  于2020年1月9日周四 下午4:21写道:

> Hi,
>
> +1 to the general idea. Supporting SQL client gateway mode will bridge the
> gap between Flink SQL and production environments. Also, the JDBC driver
> is quite a good supplement for the usability of Flink SQL; users will have
> more choices to try out Flink SQL, such as Tableau.
>
> I went through the document and left some comments there.
>
> Best,
> Kurt
>
>
> On Sun, Jan 5, 2020 at 1:57 PM tison  wrote:
>
> > The general idea sounds great. I'm going to keep up with the progress
> soon.
> >
> > Best,
> > tison.
> >
> >
> > Bowen Li  于2020年1月5日周日 下午12:59写道:
> >
> > > +1. It will improve user experience quite a bit.
> > >
> > >
> > > On Thu, Jan 2, 2020 at 22:07 Yangze Guo  wrote:
> > >
> > > > Thanks for driving this, Xiaoling!
> > > >
> > > > +1 for supporting SQL client gateway.
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > >
> > > > On Thu, Jan 2, 2020 at 9:58 AM 贺小令  wrote:
> > > > >
> > > > > Hey everyone,
> > > > > FLIP-24
> > > > > <
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
> >
> > > > > proposes the whole conception and architecture of SQL Client. The
> > > > embedded
> > > > > mode is already supported since release-1.5, which is helpful for
> > > > > debugging/demo purposes.
> > > > > Many users ask how to submit a Flink job to an online environment
> > > > > without programming against the Flink API. To solve this, we
> > > > > created FLIP-91 [0], which supports SQL client gateway mode; then
> > > > > users can submit a job through the CLI client, REST API or JDBC.
> > > > >
> > > > > I'm glad that you can give me more feedback about FLIP-91.
> > > > >
> > > > > Best,
> > > > > godfreyhe
> > > > >
> > > > > [0]
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> > > >
> > >
> >
>


Re: [jira] [Created] (FLINK-15644) Add support for SQL query validation

2020-01-18 Thread godfrey he
Hi Flavio, `TableEnvironment.getCompletionHints` may already meet the
requirement.
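
For reference, a minimal sketch of that method, assuming the 1.10-era
signature `String[] getCompletionHints(String statement, int position)`; the
partial statement is a placeholder:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CompletionHintsSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
            TableEnvironment.create(EnvironmentSettings.newInstance().build());
        String partial = "SELECT f1 FROM sour";
        // Ask the planner for completion candidates at the cursor position.
        String[] hints = tEnv.getCompletionHints(partial, partial.length());
        for (String hint : hints) {
            System.out.println(hint); // candidate tokens, e.g. table names
        }
    }
}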

Flavio Pompermaier  于2020年1月18日周六 下午3:39写道:

> Why not also add a suggest() method (also unimplemented initially) that
> would return the list of suitable completions/tokens for the current query?
> How complex would it be to implement, in your opinion?
>
> Il Ven 17 Gen 2020, 18:32 Fabian Hueske (Jira)  ha
> scritto:
>
> > Fabian Hueske created FLINK-15644:
> > -
> >
> >  Summary: Add support for SQL query validation
> >  Key: FLINK-15644
> >  URL: https://issues.apache.org/jira/browse/FLINK-15644
> >  Project: Flink
> >   Issue Type: New Feature
> >   Components: Table SQL / API
> > Reporter: Fabian Hueske
> >
> >
> > It would be good if the {{TableEnvironment}} would offer methods to check
> > the validity of SQL queries. Such a method could be used by services (CLI
> > query shells, notebooks, SQL UIs) that are backed by Flink and execute
> > their queries on Flink.
> >
> > Validation should be available in two levels:
> >  # Validation of syntax and semantics: This includes parsing the query,
> > checking the catalog for dbs, tables, fields, type checks for expressions
> > and functions, etc. This will check if the query is a valid SQL query.
> >  # Validation that query is supported: Checks if Flink can execute the
> > given query. Some syntactically and semantically valid SQL queries are
> not
> > supported, esp. in a streaming context. This requires running the
> > optimizer. If the optimizer generates an execution plan, the query can be
> > executed. This check includes the first step and is more expensive.
> >
> > The reason for this separation is that the first check can be done much
> > faster as it does not involve calling the optimizer. Hence, it would be
> > suitable for fast checks in an interactive query editor. The second check
> > might take more time (depending on the complexity of the query) and might
> > not be suitable for rapid checks but only on explicit user request.
> >
> > Requirements:
> >  * validation does not modify the state of the {{TableEnvironment}}, i.e.
> > it does not add plan operators
> >  * validation does not require connector dependencies
> >  * validation can identify the update mode of a continuous query result
> > (append-only, upsert, retraction).
> >
> > Out of scope for this issue:
> >  * better error messages for unsupported features as suggested by
> > FLINK-7217
> >
> >
> >
> > --
> > This message was sent by Atlassian Jira
> > (v8.3.4#803005)
> >
>
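
To make the two levels concrete, a purely hypothetical API sketch; none of
these methods exist in TableEnvironment today, and the names are made up
for illustration:

    // Hypothetical sketch only; nothing below is existing Flink API.
    public interface SqlValidation {

        // Level 1: parse the statement and check catalogs, fields, types
        // and functions. Cheap enough for an interactive query editor.
        void validateSyntaxAndSemantics(String statement);

        // Level 2: additionally run the optimizer to check that Flink can
        // execute the statement; more expensive, for explicit user requests.
        UpdateMode validateForExecution(String statement);

        // Update mode of a continuous query result, as required above.
        enum UpdateMode { APPEND_ONLY, UPSERT, RETRACTION }
    }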


Re: Re: [ANNOUNCE] New Apache Flink Committer - Igal Shilman

2020-09-17 Thread godfrey he
Congratulations!

Best,
Godfrey

Igal Shilman  于2020年9月16日周三 下午4:35写道:

> Thank you all very much for your kind welcome :-)
>
> Thanks,
> Igal.
>
> On Wed, Sep 16, 2020 at 8:50 AM Kostas Kloudas  wrote:
>
> > Congratulations Igal and welcome!
> >
> > Kostas
> >
> > On Wed, Sep 16, 2020 at 6:37 AM Guowei Ma  wrote:
> > >
> > > Congratulations :)
> > > Best,
> > > Guowei
> > >
> > >
> > > On Wed, Sep 16, 2020 at 11:54 AM Zhijiang
> > >  wrote:
> > >
> > > > Congratulations and welcome, Igal!
> > > >
> > > >
> > > > --
> > > > From:Yun Gao 
> > > > Send Time:2020年9月16日(星期三) 10:59
> > > > To:Stephan Ewen ; dev 
> > > > Subject:Re: Re: [ANNOUNCE] New Apache Flink Committer - Igal Shilman
> > > >
> > > > Congratulations Igal!
> > > >
> > > > Best,
> > > >  Yun
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Sender:Stephan Ewen
> > > > Date:2020/09/15 22:48:30
> > > > Recipient:dev
> > > > Theme:Re: [ANNOUNCE] New Apache Flink Committer - Igal Shilman
> > > >
> > > > Welcome, Igal!
> > > >
> > > > On Tue, Sep 15, 2020 at 3:18 PM Seth Wiesman 
> > wrote:
> > > >
> > > > > Congrats Igal!
> > > > >
> > > > > On Tue, Sep 15, 2020 at 7:13 AM Benchao Li 
> > wrote:
> > > > >
> > > > > > Congratulations!
> > > > > >
> > > > > > Zhu Zhu  于2020年9月15日周二 下午6:51写道:
> > > > > >
> > > > > > > Congratulations, Igal!
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Zhu
> > > > > > >
> > > > > > > Rafi Aroch  于2020年9月15日周二 下午6:43写道:
> > > > > > >
> > > > > > > > Congratulations Igal! Well deserved!
> > > > > > > >
> > > > > > > > Rafi
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Sep 15, 2020 at 11:14 AM Tzu-Li (Gordon) Tai <
> > > > > > > tzuli...@apache.org>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > It's great seeing many new Flink committers recently, and
> to
> > add
> > > > to
> > > > > > > that
> > > > > > > > > I'd like to announce one more new committer: Igal Shilman!
> > > > > > > > >
> > > > > > > > > Igal has been a long time member of the community. You may
> > very
> > > > > > likely
> > > > > > > > know
> > > > > > > > > Igal from the Stateful Functions sub-project, as he was the
> > > > > original
> > > > > > > > author
> > > > > > > > > of it before it was contributed to Flink.
> > > > > > > > > Ever since StateFun was contributed to Flink, he has
> > consistently
> > > > > > > > > maintained the project and supported users in the mailing
> > lists.
> > > > > > > > > Before that, he had also helped tremendously in some work
> on
> > > > > Flink's
> > > > > > > > > serialization stack.
> > > > > > > > >
> > > > > > > > > Please join me in welcoming and congratulating Igal for
> > becoming
> > > > a
> > > > > > > Flink
> > > > > > > > > committer!
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Gordon
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Best,
> > > > > > Benchao Li
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> >
>


Re: [ANNOUNCE] New Apache Flink Committer - Niels Basjes

2020-09-17 Thread godfrey he
Congratulations!

Best,
Godfrey

Guowei Ma  于2020年9月16日周三 下午12:38写道:

> Congratulations :)
>
> Best,
> Guowei
>
>
> On Tue, Sep 15, 2020 at 6:14 PM Matthias Pohl 
> wrote:
>
> > Congrats!
> >
> > Best,
> > Matthias
> >
> > On Tue, Sep 15, 2020 at 9:26 AM Dawid Wysakowicz  >
> > wrote:
> >
> > > Welcome, Niels!
> > >
> > > Best,
> > >
> > > Dawid
> > >
> > > On 14/09/2020 11:22, Matt Wang wrote:
> > > > Congratulations, Niels!
> > > >
> > > >
> > > > --
> > > >
> > > > Best,
> > > > Matt Wang
> > > >
> > > >
> > > > On 09/14/2020 17:02,Konstantin Knauf
> wrote:
> > > > Congratulations!
> > > >
> > > > On Mon, Sep 14, 2020 at 10:51 AM tison  wrote:
> > > >
> > > > Congrats!
> > > >
> > > > Best,
> > > > tison.
> > > >
> > > >
> > > > Aljoscha Krettek  于2020年9月14日周一 下午4:38写道:
> > > >
> > > > Congratulations! 💐
> > > >
> > > > Aljoscha
> > > >
> > > > On 14.09.20 10:37, Robert Metzger wrote:
> > > > Hi all,
> > > >
> > > > On behalf of the PMC, I’m very happy to announce Niels Basjes as a
> new
> > > > Flink committer.
> > > >
> > > > Niels has been an active community member since the early days of
> > > > Flink,
> > > > with 19 commits dating back until 2015.
> > > > Besides his work on the code, he has been driving initiatives on dev@
> > > > list,
> > > > supporting users and giving talks at conferences.
> > > >
> > > > Please join me in congratulating Niels for becoming a Flink
> committer!
> > > >
> > > > Best,
> > > > Robert Metzger
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Konstantin Knauf | Head of Product
> > > >
> > > > +49 160 91394525
> > > >
> > > >
> > > > Follow us @VervericaData Ververica 
> > > >
> > > >
> > > > --
> > > >
> > > > Join Flink Forward  - The Apache Flink
> > > > Conference
> > > >
> > > > Stream Processing | Event Driven | Real Time
> > > >
> > > > --
> > > >
> > > > Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> > > >
> > > > --
> > > > Ververica GmbH
> > > > Registered at Amtsgericht Charlottenburg: HRB 158244 B
> > > > Managing Directors: Yip Park Tung Jason, Jinwei (Kevin) Zhang, Karl
> > Anton
> > > > Wehner
> > >
> >
>


Re: [ANNOUNCE] New Apache Flink Committer - Yun Tang

2020-09-17 Thread godfrey he
Congratulations!

Regards,
Godfrey

Yun Tang  于2020年9月17日周四 下午2:22写道:

>  Thanks for all your kind welcome and very glad to be one of the
> committers of Flink community.
>
> Best
> Yun Tang
>
> 
> From: Congxian Qiu 
> Sent: Wednesday, September 16, 2020 13:10
> To: dev@flink.apache.org 
> Cc: Zhijiang ; tangyun ;
> Yun Tang 
> Subject: Re: [ANNOUNCE] New Apache Flink Committer - Yun Tang
>
> Congratulations!
> Best,
> Congxian
>
>
> Guowei Ma mailto:guowei@gmail.com>>
> 于2020年9月16日周三 下午12:37写道:
> Congratulations :)
>
> Best,
> Guowei
>
>
> On Wed, Sep 16, 2020 at 11:54 AM Zhijiang
> mailto:wangzhijiang...@aliyun.com>.invalid>
> wrote:
>
> > Congratulations and welcome, Yun!
> >
> >
> > --
> > From:Jark Wu mailto:imj...@gmail.com>>
> > Send Time:2020年9月16日(星期三) 11:35
> > To:dev mailto:dev@flink.apache.org>>
> > Cc:tangyun mailto:tang...@apache.org>>; Yun Tang <
> myas...@live.com>
> > Subject:Re: [ANNOUNCE] New Apache Flink Committer - Yun Tang
> >
> > Congratulations Yun!
> >
> > On Wed, 16 Sep 2020 at 10:40, Rui Li  lirui.fu...@gmail.com>> wrote:
> >
> > > Congratulations Yun!
> > >
> > > On Wed, Sep 16, 2020 at 10:20 AM Paul Lam  > wrote:
> > >
> > > > Congrats, Yun! Well deserved!
> > > >
> > > > Best,
> > > > Paul Lam
> > > >
> > > > > 2020年9月15日 19:14,Yang Wang  danrtsey...@gmail.com>> 写道:
> > > > >
> > > > > Congratulations, Yun!
> > > > >
> > > > > Best,
> > > > > Yang
> > > > >
> > > > > Leonard Xu mailto:xbjt...@gmail.com>>
> 于2020年9月15日周二 下午7:11写道:
> > > > >
> > > > >> Congrats, Yun!
> > > > >>
> > > > >> Best,
> > > > >> Leonard
> > > > >>> 在 2020年9月15日,19:01,Yangze Guo  karma...@gmail.com>> 写道:
> > > > >>>
> > > > >>> Congrats, Yun!
> > > > >>
> > > > >>
> > > >
> > > >
> > >
> > > --
> > > Best regards!
> > > Rui Li
> > >
> >
> >
>


Re: Re: [ANNOUNCE] New Apache Flink Committer - Godfrey He

2020-09-20 Thread godfrey he
Thanks everyone for the warm reception!

Best,
Godfrey

Rui Li  于2020年9月18日周五 下午6:21写道:

> Congrats Godfrey! Well deserved!
>
> On Fri, Sep 18, 2020 at 5:12 PM Yun Gao 
> wrote:
>
>> Congratulations Godfrey!
>>
>> Best,
>> Yun
>>
>>
>>
>>  --Original Mail --
>> Sender:Dawid Wysakowicz 
>> Send Date:Thu Sep 17 14:45:55 2020
>> Recipients:Flink Dev , 贺小令 
>> Subject:Re: [ANNOUNCE] New Apache Flink Committer - Godfrey He
>> Congratulations!
>>
>> On 16/09/2020 06:19, Jark Wu wrote:
>> > Hi everyone,
>> >
>> > It's great seeing many new Flink committers recently, and on behalf of
>> the
>> > PMC,
>> > I'd like to announce one more new committer: Godfrey He.
>> >
>> > Godfrey is a very long time contributor in the Flink community since the
>> > end of 2016.
>> > He has been a very active contributor in the Flink SQL component with
>> 153
>> > PRs and more than 571,414 lines which is quite outstanding.
>> > Godfrey has paid essential effort with SQL optimization and helped a lot
>> > during the blink merging.
>> > Besides that, he is also quite active with community work especially in
>> > Chinese mailing list.
>> >
>> > Please join me in congratulating Godfrey for becoming a Flink committer!
>> >
>> > Cheers,
>> > Jark Wu
>> >
>>
>>
>
> --
> Best regards!
> Rui Li
>


[DISCUSS] Support Multiple Input for Blink Planner

2020-10-13 Thread godfrey he
Hi devs,

I would like to start a discussion about supporting multiple input for
the Blink planner.

As FLIP-92 [1] introduces the multiple-input stream operator, we can merge
operators connected by forward shuffles into a single multiple-input
operator, so that the network shuffle is replaced by local function calls,
and a significant performance improvement can be achieved when there is
plenty of data to shuffle.

We have written a design document [2] to describe the details. Please feel
free to join the discussion; any feedback is welcome!
[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-92%3A+Add+N-Ary+Stream+Operator+in+Flink
[2]
https://docs.google.com/document/d/1qKVohV12qn-bM51cBZ8Hcgp31ntwClxjoiNBUOqVHsI
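
As a small illustration of the intended effect, a sketch; the option name
below is an assumption for illustration, please refer to the design doc
for the final name:

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    TableEnvironment tEnv = TableEnvironment.create(
            EnvironmentSettings.newInstance().inBatchMode().build());
    // Assumed option name; enables merging forward-connected operators.
    tEnv.getConfig().getConfiguration()
            .setString("table.optimizer.multiple-input-enabled", "true");
    // EXPLAIN should then show one multiple-input operator where the plan
    // previously had several operators connected by forward shuffles.
    System.out.println(tEnv.explainSql(
            "SELECT * FROM a JOIN b ON a.id = b.id JOIN c ON a.id = c.id"));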


Best,
Godfrey & Caizhi


Re: [ANNOUNCE] New PMC member: Zhu Zhu

2020-10-15 Thread godfrey he
Congratulations, Zhu

Best,
Godfrey

刘 芃成  于2020年10月10日周六 下午1:03写道:

> Congratulations, Zhu!
>
> Best,
> Pengcheng
>
> 在 2020/10/10 上午11:14,“Kurt Young” 写入:
>
> Congratulations, Zhu Zhu!
>
> Best,
> Kurt
>
>
> On Sat, Oct 10, 2020 at 11:03 AM Yang Wang 
> wrote:
>
> > Congratulations! Zhu Zhu.
> >
> > Best,
> > Yang
> >
> > Xintong Song  于2020年10月9日周五 下午3:35写道:
> >
> > > Congratulations, Zhu~!
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Fri, Oct 9, 2020 at 3:17 PM Jingsong Li  >
> > wrote:
> > >
> > > > Congratulations, Zhu Zhu!
> > > >
> > > > On Fri, Oct 9, 2020 at 3:08 PM Zhijiang <
> wangzhijiang...@aliyun.com
> > > > .invalid>
> > > > wrote:
> > > >
> > > > > Congratulations and welcome, Zhu Zhu!
> > > > >
> > > > > Best,
> > > > > Zhijiang
> > > > >
> --
> > > > > From:Yun Tang 
> > > > > Send Time:2020年10月9日(星期五) 14:20
> > > > > To:dev@flink.apache.org 
> > > > > Subject:Re: [ANNOUNCE] New PMC member: Zhu Zhu
> > > > >
> > > > > Congratulations, Zhu!
> > > > >
> > > > > Best
> > > > > Yun Tang
> > > > > 
> > > > > From: Danny Chan 
> > > > > Sent: Friday, October 9, 2020 13:51
> > > > > To: dev@flink.apache.org 
> > > > > Subject: Re: [ANNOUNCE] New PMC member: Zhu Zhu
> > > > >
> > > > > Congrats, Zhu Zhu ~
> > > > >
> > > > > Best,
> > > > > Danny Chan
> > > > > 在 2020年10月9日 +0800 PM1:05,dev@flink.apache.org,写道:
> > > > > >
> > > > > > Congrats, Zhu Zhu
> > > > >
> > > > >
> > > >
> > > > --
> > > > Best, Jingsong Lee
> > > >
> > >
> >
>


Re: [VOTE] FLIP-146: Improve new TableSource and TableSink interfaces

2020-10-18 Thread godfrey he
+1

Jingsong Li  于2020年10月19日周一 上午10:54写道:

> +1
>
> On Fri, Oct 16, 2020 at 2:33 PM Leonard Xu  wrote:
>
> > +1
> >
> > Best,
> > Leonard
> >
> > > 在 2020年10月16日,11:01,Jark Wu  写道:
> > >
> > > +1
> > >
> > > On Fri, 16 Oct 2020 at 10:27, admin <17626017...@163.com> wrote:
> > >
> > >> +1
> > >>
> > >>> 2020年10月16日 上午10:05,Danny Chan  写道:
> > >>>
> > >>> +1, nice job !
> > >>>
> > >>> Best,
> > >>> Danny Chan
> > >>> 在 2020年10月15日 +0800 PM8:08,Jingsong Li ,写道:
> >  Hi all,
> > 
> >  I would like to start the vote for FLIP-146 [1], which is discussed
> > and
> >  reached consensus in the discussion thread [2]. The vote will be
> open
> > >> until
> >  20th Oct. (72h, exclude weekends), unless there is an objection or
> not
> >  enough votes.
> > 
> >  [1]
> > 
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-146%3A+Improve+new+TableSource+and+TableSink+interfaces
> > 
> >  [2]
> > 
> > >>
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-146-Improve-new-TableSource-and-TableSink-interfaces-td45161.html
> > 
> >  Best,
> >  Jingsong Lee
> > >>
> > >>
> >
> >
>
> --
> Best, Jingsong Lee
>


Re: [ANNOUNCE] New Apache Flink Committer - Congxian Qiu

2020-11-03 Thread godfrey he
Congratulations, Congxian!

Best,
Godfrey

Fabian Hueske  于2020年11月2日周一 下午7:00写道:

> Congrats Congxian!
>
> Cheers, Fabian
>
> Am Mo., 2. Nov. 2020 um 10:33 Uhr schrieb Yang Wang  >:
>
> > Congratulations Congxian!
> >
> > Best,
> > Yang
> >
> > Zhu Zhu  于2020年11月2日周一 下午5:14写道:
> >
> > > Congrats Congxian!
> > >
> > > Thanks,
> > > Zhu
> > >
> > > Pengcheng Liu  于2020年11月2日周一 下午5:01写道:
> > >
> > > > Congratulations Congxian!
> > > >
> > > > Matthias Pohl  于2020年11月2日周一 下午3:57写道:
> > > >
> > > > > Yup, congratulations Congxian!
> > > > >
> > > > > On Mon, Nov 2, 2020 at 8:46 AM Danny Chan 
> > > wrote:
> > > > >
> > > > > > Congrats, Doctor Qiu! Well deserved!
> > > > > >
> > > > > > Congxian Qiu  于2020年10月31日周六 下午9:45写道:
> > > > > >
> > > > > > > Thanks all for the support. It's a great honor for me.
> > > > > > >
> > > > > > > Best,
> > > > > > > Congxian
> > > > > > >
> > > > > > >
> > > > > > > Paul Lam  于2020年10月30日周五 下午3:18写道:
> > > > > > >
> > > > > > > > Congrats, Congxian! Well deserved!
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Paul Lam
> > > > > > > >
> > > > > > > > > 2020年10月30日 15:12,Zhijiang  > > .INVALID>
> > > > > 写道:
> > > > > > > > >
> > > > > > > > > Congrats, Congxian!
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] FLIP-275: Support Remote SQL Client Based on SQL Gateway

2022-12-05 Thread godfrey he
Hi Zelin,

Thanks for driving this discussion.

I have a few comments:

> Add RowFormat to ResultSet to indicate the format of rows.
We should not require the SqlGateway server to meet the display
requirements of a CliClient, because different CliClients may have
different display styles. The server just needs to return the data,
and the CliClient prints the result as needed. So RowFormat is not needed.

> Add ContentType to ResultSet to indicate what kind of data the result 
> contains.
At first sight, the values of ContentType overlap. For example, a SELECT
query will return QUERY_RESULT, but it also has a JOB_ID. OTHER is too
ambiguous; I don't know which kind of statement would return OTHER.
I recommend returning the concrete type for each statement, such as
"CREATE TABLE" for "create table xx (...) with ()" and
"SELECT" for "select * from xxx". The statement type can be maintained
in the `Operation`s.

>Error Handling
I think the current design of the error handling mechanism can meet the
requirements of the CliClient; we can get the root cause from
the stack (see ErrorResponseBody#errors). If it becomes a common
requirement (for many clients) in the future,
we can introduce this interface.
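
To make the point concrete, client-side unwrapping could look roughly like
this sketch; the errors field is the runtime REST API's
ErrorResponseBody#errors, a list of stack strings, so verify the exact
shape before relying on it:

    List<String> errors = errorResponseBody.errors;
    // The deepest entry is usually the root cause.
    String rootCause =
            errors.isEmpty() ? "unknown" : errors.get(errors.size() - 1);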

>Runtime REST API Modification for Local Client Migration
I think this part is over-engineered; it belongs to optimization.
The client does not require very high performance, and the current design
can already meet our needs.
If we find performance problems in the future, we can do such optimizations then.

Best,
Godfrey

yu zelin  于2022年12月5日周一 11:11写道:
>
> Hi, Shammon
>
> Thanks for your feedback. I think it’s good to support jdbc-sdk. However,
> it's not supported on the gateway side yet. In my opinion, this FLIP is more
> concerned with the SQL Client. How about putting “supporting jdbc-sdk” in
> ‘Future Work’? We can discuss how to implement it in another thread.
>
> Best,
> Yu Zelin
> > 2022年12月2日 18:12,Shammon FY  写道:
> >
> > Hi zelin
> >
> > Thanks for driving this discussion.
> >
> > I notice that the sql-client will interact with sql-gateway by `REST
> > Client` in the `Executor` in the FLIP, how about introducing jdbc-sdk for
> > sql-gateway?
> >
> > Then the sql-client can connect the gateway with jdbc-sdk, on the other
> > hand, the other applications and tools such as jmeter can use the jdbc-sdk
> > to connect sql-gateway too.
> >
> > Best,
> > Shammon
> >
> >
> > On Fri, Dec 2, 2022 at 4:10 PM yu zelin  wrote:
> >
> >> Hi Jim,
> >>
> >> Thanks for your feedback!
> >>
> >>> Should this configuration be mentioned in the FLIP?
> >>
> >> Sure.
> >>
> >>> some way for the server to be able to limit the number of requests it
> >> receives.
> >> I’m sorry that this FLIP is dedicated in implementing the Remote mode, so
> >> we
> >> didn't consider much about this. I think the option is enough currently.
> >> I will add
> >> the improvement suggestions to the ‘Future Work’.
> >>
> >>> I wonder if two other options are possible
> >>
> >> To forward the raw format to gateway and then to client is possible. The
> >> raw
> >> results from sink is in ‘CollectResultIterator#bufferedResult’. First, we
> >> can find
> >> a way to get this result without wrapping it. Second, constructing a
> >> ‘InternalTypeInfo’.
> >> We can construct it using the schema information (data’s logical type).
> >> After
> >> construction, we can get the ’TypeSerializer’ to deserialize the raw
> >> result.
> >>
> >>
> >>
> >>
> >>> 2022年12月1日 04:54,Jim Hughes  写道:
> >>>
> >>> Hi Yu,
> >>>
> >>> Thanks for moving my comments to this thread!  Also, thank you for
> >>> answering my questions; it is helping me understand the SQL Gateway
> >>> better.
> >>>
> >>> 5.
>  Our idea is to introduce a new session option (like
> >>> 'sql-client.result.fetch-interval') to control
> >>> the fetching requests sending frequency. What do you think?
> >>>
> >>> Should this configuration be mentioned in the FLIP?
> >>>
> >>> One slight concern I have with having 'sql-client.result.fetch-interval'
> >> as
> >>> a session configuration is that users could set it low and cause the
> >> client
> >>> to send a large volume of requests to the SQL gateway.
> >>>
> >>> Generally, I'd like to see some way for the server to be able to limit
> >> the
> >>> number of requests it receives.  If that really needs to be done by a
> >> proxy
> >>> in front of the SQL gateway, that is fine as well.  (To be clear, I don't
> >>> think my concern here should be blocking in any way.)
> >>>
> >>> 7.
>  What is the serialization lifecycle for results?
> >>>
> >>> I wonder if two other options are possible:
> >>> 3) Could the Gateway just forward the result byte array?  (Or does the
> >>> Gateway need to deserialize the response in order to understand it for
> >> some
> >>> reason?)
> >>> 4) Could the JobManager prepare the results in JSON?  (Or similarly could
> >>> the Client read the format which the JobManager sends?)
> >>>
> >>> Thanks again!
> >>>
> >>> Cheers,
> >>>
> >>> Jim
> >>>
> >>> On Wed, Nov 

Re: [DISCUSS] FLIP-275: Support Remote SQL Client Based on SQL Gateway

2022-12-06 Thread godfrey he
Hi, Zelin

>The CLI will use default print style for the non-query result.
Please make sure the print results of EXPLAIN/DESC/SHOW CREATE TABLE
commands are clear.
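For example, with a TableEnvironment tEnv and existing APIs, these
non-query results should render readably in the client:

    tEnv.executeSql("SHOW CREATE TABLE sinkTable").print();
    tEnv.executeSql("EXPLAIN SELECT f1, f2 FROM sourceTable").print();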

> We think it’s better to add the root cause to the ErrorResponseBody.
LGTM

Best,
Godfrey

yu zelin  于2022年12月6日周二 17:51写道:
>
> Hi, Godfrey
>
> Thanks for your feedback. Below are my thoughts about your questions.
>
> 1. About RowFormat.
> I agree with your opinion, so we decided to revert the RowFormat-related
> changes and let the client resolve the print format.
>
> 2. About ContentType
> I agree that the definition of the ContentType is not clear, but how to
> define the statement type is another big question. So, we decided to only
> tell the query result and non-query result apart. The CLI will use a
> default print style for the non-query result.
>
> 3. About ErrorHandling
> I think reusing the current ErrorResponseBody is good, but parsing the root
> cause from the exception stack strings is quite hacky. We think it’s better
> to add the root cause to the ErrorResponseBody.
>
> 4. About Runtime REST API Modifications
> I agree, too. This part is moved to the ‘Future Work’.
>
> Best,
> Yu Zelin
>
>
> > 2022年12月5日 18:33,godfrey he  写道:
> >
> > Hi Zelin,
> >
> > Thanks for driving this discussion.
> >
> > I have a few comments,
> >
> >> Add RowFormat to ResultSet to indicate the format of rows.
> > We should not require SqlGateway server to meet the display
> > requirements of a CliClient.
> > Because different CliClients may have different display style. The
> > server just need to response the data,
> > and the CliClient prints the result as needed. So RowFormat is not needed.
> >
> >> Add ContentType to ResultSet to indicate what kind of data the result 
> >> contains.
> > from my first sight, the values of ContentType are intersected, such
> > as: A select query will return QUERY_RESULT,
> > but it also has JOB_ID. OTHER is too ambiguous, I don't know which
> > kind of query will return OTHER.
> > I recommend returning the concrete type for each statement, such as
> > "CREATE TABLE" for "create table xx (...) with ()",
> > "SELECT" for "select * from xxx". The statement type can be maintained
> > in `Operation`s.
> >
> >> Error Handling
> > I think current design of error handling mechanism can meet the
> > requirement of CliClient, we can get the root cause from
> > the stack (see ErrorResponseBody#errors). If it becomes a common
> > requirement (for many clients) in the future,
> > we can introduce this interface.
> >
> >> Runtime REST API Modification for Local Client Migration
> > I think this part is over-engineered, this part belongs to optimization.
> > The client does not require very high performance, the current design
> > can already meet our needs.
> > If we find performance problems in the future, do such optimizations.
> >
> > Best,
> > Godfrey
> >
> > yu zelin  于2022年12月5日周一 11:11写道:
> >>
> >> Hi, Shammon
> >>
> >> Thanks for your feedback. I think it’s good to support jdbc-sdk. However,
> >> it's not supported in the gateway side yet. In my opinion, this FLIP is 
> >> more
> >> concerned with the SQL Client. How about put “supporting jdbc-sdk” in
> >> ‘Future Work’? We can discuss how to implement it in another thread.
> >>
> >> Best,
> >> Yu Zelin
> >>> 2022年12月2日 18:12,Shammon FY  写道:
> >>>
> >>> Hi zelin
> >>>
> >>> Thanks for driving this discussion.
> >>>
> >>> I notice that the sql-client will interact with sql-gateway by `REST
> >>> Client` in the `Executor` in the FLIP, how about introducing jdbc-sdk for
> >>> sql-gateway?
> >>>
> >>> Then the sql-client can connect the gateway with jdbc-sdk, on the other
> >>> hand, the other applications and tools such as jmeter can use the jdbc-sdk
> >>> to connect sql-gateway too.
> >>>
> >>> Best,
> >>> Shammon
> >>>
> >>>
> >>> On Fri, Dec 2, 2022 at 4:10 PM yu zelin  wrote:
> >>>
> >>>> Hi Jim,
> >>>>
> >>>> Thanks for your feedback!
> >>>>
> >>>>> Should this configuration be mentioned in the FLIP?
> >>>>
> >>>> Sure.
> >>>>
> >>>>> some way for the server to be able to

Re: [DISCUSS] Release Flink 1.16.1

2022-12-20 Thread godfrey he
Hi Martijn,

Thank you for bringing this up.

Regarding the 3 commits Lincoln mentioned, +1 to pick them into 1.16.1.
AFAIK, several users have encountered this kind of data correctness
problem so far; they are waiting for a fix release as soon as possible.

Best,
Godfrey

ConradJam  于2022年12月20日周二 15:08写道:

> Hi Martijn,
>
> After merging FLINK-30116, the Flink Web UI configuration can't be shown.
> I checked the data returned by the backend and there is no problem, but
> there is an error in the frontend, as shown in the pictures below. Can
> someone take a look before releasing 1.16.1?
>
> [image: Pasted Graphic.png]
>
> [image: Pasted Graphic 1.png]
>
> Martijn Visser  于2022年12月16日周五 02:52写道:
>
>> Hi everyone,
>>
>> I would like to open a discussion about releasing Flink 1.16.1. We've
>> released Flink 1.16 at the end of October, but we already have 58 fixes
>> listed for 1.16.1, including a blocker [1] on the environment variables
>> and
>> a number of critical issues. Some of the critical issues are related to
>> the
>> bugs on the Sink API, on PyFlink and some correctness issues.
>>
>> There are also a number of open issues with a fixVersion set to 1.16.1, so
>> it would be good to understand what the community thinks of starting a
>> release or if there are some fixes that should be included with 1.16.1.
>>
>> Best regards,
>>
>> Martijn
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-30116
>>
>


Re: [VOTE] FLIP-275: Support Remote SQL Client Based on SQL Gateway

2022-12-20 Thread godfrey he
+1 (binding)

Best,
Godfrey

Hang Ruan  于2022年12月21日周三 15:21写道:
>
> +1 (non-binding)
>
> Best,
> Hang
>
> Paul Lam  于2022年12月20日周二 17:36写道:
>
> > +1 (non-binding)
> >
> > Best,
> > Paul Lam
> >
> > > 2022年12月20日 11:35,Shengkai Fang  写道:
> > >
> > > +1(binding)
> > >
> > > Best,
> > > Shengkai
> > >
> > > yu zelin  于2022年12月14日周三 20:41写道:
> > >
> > >> Hi, all,
> > >>
> > >> Thanks for all your feedbacks so far. Through the discussion on this
> > >> thread[1], I think we have came to a consensus, so I’d like to start a
> > >> vote on FLIP-275[2].
> > >>
> > >> The vote will last for at least 72 hours (Dec 19th, 13:00 GMT, excluding
> > >> weekend days) unless there is an objection or insufficient vote.
> > >>
> > >> Best,
> > >> Yu Zelin
> > >>
> > >> [1] https://lists.apache.org/thread/zpx64l0z91b0sz0scv77h0g13ptj4xxo
> > >> [2] https://cwiki.apache.org/confluence/x/T48ODg
> >
> >


Re: [DISCUSS] FLIP-280: Introduce a new explain mode to provide SQL advice

2023-01-02 Thread godfrey he
Thanks for driving this discussion.

Do we really need to expose `PlanAnalyzerFactory` as a public interface?
I prefer to only expose ExplainDetail#ANALYZED_PHYSICAL_PLAN and the
analyzed result, which is enough for users and consistent with the
results of the `explain` method.

The plan analyzer classes are in the table planner module, which does not
contain public API (public interfaces should be defined in the
flink-table-api-java module). And PlanAnalyzer depends on RelNode, which
is an internal class of the planner and is not exposed to users.
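
If it helps to picture the user surface, a sketch assuming a
TableEnvironment tEnv; ANALYZED_PHYSICAL_PLAN is the detail name under
discussion here, not an existing constant:

    // Proposal sketch only; ANALYZED_PHYSICAL_PLAN does not exist yet.
    String advice = tEnv.explainSql(
            "INSERT INTO sink SELECT id, COUNT(*) FROM source GROUP BY id",
            ExplainDetail.ANALYZED_PHYSICAL_PLAN);
    System.out.println(advice);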

Best,
Godfrey


Shengkai Fang  于2023年1月3日周二 13:43写道:
>
> Sorry for the missing answer about the configuration of the Analyzer. Users
> may not need to configure this with SQL statements. In the SQL Gateway,
> users can configure the endpoints with the option `sql-gateway.endpoint.type`
> in the flink-conf.
>
> Best,
> Shengkai
>
> Shengkai Fang  于2023年1月3日周二 12:26写道:
>
> > Hi, Jane.
> >
> > Thanks for bringing this to the discussion. I have some questions about
> > the FLIP:
> >
> > 1. `PlanAnalyzer#analyze` uses the FlinkRelNode as the input. Could you
> > share some thoughts about the motivation? In my experience, users mainly
> > care about 2 things when they develop their job:
> >
> > a. Why their SQL can not work? For example, their streaming SQL contains
> > an OVER window but their ORDER key is not ROWTIME. In this case, we may
> > don't have a physical node or logical node because, during the
> > optimization, the planner has already thrown the exception.
> >
> > b. Many users care about whether their state is compatible after upgrading
> > their Flink version. In this case, I think the old execplan and the SQL
> > statement are the user's input.
> >
> > So, I think we should introduce methods like `PlanAnalyzer#analyze(String
> > sql)` and `PlanAnalyzer#analyze(String sql, ExecnodeGraph)` here.
> >
> > 2. I am just curious how other people add the rules to the Advisor. When
> > rules increases, all these rules should be added to the Flink codebase?
> > 3. How do users configure another advisor?
> >
> > Best,
> > Shengkai
> >
> >
> >
> > Jane Chan  于2022年12月28日周三 12:30写道:
> >
> >> Hi @yuxia, Thank you for reviewing the FLIP and raising questions.
> >>
> >> 1: Is the PlanAnalyzerFactory also expected to be implemented by users
> >> just
> >> > like DynamicTableSourceFactory or other factories? If so, I notice that
> >> in
> >> > the code of PlanAnalyzerManager#registerAnalyzers, the code is as
> >> follows:
> >> > FactoryUtil.discoverFactory(classLoader, PlanAnalyzerFactory.class,
> >> > StreamPlanAnalyzerFactory.STREAM_IDENTIFIER)); IIUC, it'll always find
> >> the
> >> > factory with the name StreamPlanAnalyzerFactory.STREAM_IDENTIFIER; Is
> >> it a
> >> > typo or by design ?
> >>
> >>
> >> This is a really good open question. For the short answer, yes, it is by
> >> design. I'll explain the consideration in more detail.
> >>
> >> The standard procedure to create a custom table source/sink is to
> >> implement
> >> the factory and the source/sink class. There is a strong 1v1 relationship
> >> between the factory and the source/sink.
> >>
> >> SQL
> >>
> >> DynamicTableSourceFactory
> >>
> >> Source
> >>
> >> create table … with (‘connector’ = ‘foo’)
> >>
> >> #factoryIdentifer.equals(“foo”)
> >>
> >> FooTableSource
> >>
> >>
> >> *Apart from that, the custom function module is another kind of
> >> implementation. The factory creates a collection of functions. This is a
> >> 1vN relationship between the factory and the functions.*
> >>
> >> SQL
> >>
> >> ModuleFactory
> >>
> >> Function
> >>
> >> load module ‘bar’
> >>
> >> #factoryIdentifier.equals(“bar”)
> >>
> >> A collection of functions
> >>
> >> Back to the plan analyzers, if we choose the first style, we also need to
> >> expose a new SQL syntax to users, like "CREATE ANALYZER foo WITH ..." to
> >> specify the factory identifier. But I think it is too heavy because an
> >> analyzer is an auxiliary tool to help users write better queries, and thus
> >> it should be exposed at the API level other than the user syntax level.
> >>
> >> As a result, I propose to follow the second style. Then we don't need to
> >> introduce new syntax to create analyzers. Let StreamPlanAnalyzerFactory be
> >> the default factory to create analyzers under the streaming mode, and the
> >> custom analyzers will register themselves in StreamPlanAnalyzerFactory.
> >>
> >> @Override
> >> public List createAnalyzers() {
> >> return Arrays.asList(
> >> FooAnalyzer.INSTANCE,
> >> BarAnalyzer.INSTANCE,
> >> ...);
> >> }
> >>
> >>
> >> 2: Is there any special reason make PlanAdvice be a final class? Would it
> >> > be better to make it an interface and we provide a default
> >> implementation?
> >> > My concern is some users may want have their own implementation for
> >> > PlanAdvice. But it may be overthinking. If you think it won't bring any
> >> > problem, I'm also fine with that.
> >>
> >>
> >> The reason 

Re: [DISCUSS] Adding a option for planner to decide which join reorder rule to choose

2023-01-08 Thread godfrey he
Hi Yunhong,

Thanks for driving this discussion!

This option looks good to me,
and I'm looking forward to contributing this rule back to Apache Calcite.
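
For reference, enabling it would look roughly like the sketch below,
assuming a TableEnvironment tEnv; the threshold option name is the one
proposed in this thread:

    // Queries joining at most 12 tables would use the bushy (DP-based)
    // reorder; larger queries fall back to the left-deep LoptOptimizeJoinRule.
    tEnv.getConfig().getConfiguration()
            .setString("table.optimizer.join-reorder-enabled", "true");
    tEnv.getConfig().getConfiguration()
            .setString("table.optimizer.bushy-join-reorder-threshold", "12");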

Best,
Godfrey



yh z  于2023年1月5日周四 15:32写道:
>
> Hi Benchao,
>
> Thanks for your reply.
>
> Since our existing test results are based on multiple performance
> optimization points on the TPC-DS benchmark[1][2], we haven't separately
> tested the performance improvement brought by new bushy join reorder
> rule. I will complete this test recently and update the results to this
> email.
>
> I am very happy to contribute to Calcite. Later, I will push the PR of the
> bushy join reorder rule to Calcite.
>
> [1] https://issues.apache.org/jira/browse/FLINK-27583
> [2] https://issues.apache.org/jira/browse/FLINK-29942
>
> Best regards,
> Yunhong Zheng
>
> Benchao Li  于2023年1月4日周三 19:03写道:
>
> > Hi Yunhong,
> >
> > Thanks for the updating. And introducing the new bushy join reorder
> > algorithm would be great. And I also agree with the newly added config
> > option "table.optimizer.bushy-join-reorder-threshold" and 12 as the default
> > value.
> >
> >
> > > As for optimization
> > > latency, this is the problem to be solved by the parameters to be
> > > introduced in this discussion. When there are many tables need to be
> > > reordered, the optimization latency will increase greatly. But when the
> > > table numbers less than the threshold, the latency is the same as the
> > > LoptOptimizeJoinRule.
> >
> >
> > This sounds great. If possible, could you share more numbers to us? E.g.,
> > what's the latency of optimization when there are 11/12 tables for both
> > approach?
> >
> >  For question #3: The implementation of Calcite MultiJoinOptimizeBushyRule
> > > is very simple, and it will not store the intermediate results at all.
> > So,
> > > the implementation of Calcite cannot get all possible join reorder
> > results
> > > and it cannot combine with the current cost model to get more reasonable
> > > join reorder results.
> >
> >
> > It's ok to do it in Flink as the first step. It would be great to also
> > contribute it to Calcite later if possible, it depends on you.
> >
> > yh z  于2023年1月3日周二 15:27写道:
> >
> > > Hi Benchao,
> > >
> > > Thanks for your reply.
> > >
> > > Actually,  I mistakenly wrote the name "bushy join reorder" to "busy join
> > > reorder". I'm sorry for the trouble bring to you. "Bushy join reorder"
> > > means we can build a bushy join tree based on cost model, but now Flink
> > can
> > > only build a left-deep tree using Calcite LoptOptimizeJoinRule. I hope my
> > > answers can help you solve the following questions:
> > >
> > > For question #1: The biggest advantage of this "bushy join reorder"
> > > strategy over the default Flink left-deep tree strategy is that it can
> > > retail all possible join reorder plans, and then select the optimal plan
> > > according to the cost model. This means that the busy join reorder
> > strategy
> > > can be better combined with the current cost model to get more reasonable
> > > join reorder results. We verified it on the TPC-DS benchmark, with the
> > > spark plan as a reference, the new busy join reorder strategy can make
> > more
> > > TPC-DS query plans be adjusted to be consistent with the Spark plan, and
> > > the execution time is signifcantly reduced.  As for optimization
> > > latency, this is the problem to be solved by the parameters to be
> > > introduced in this discussion. When there are many tables need to be
> > > reordered, the optimization latency will increase greatly. But when the
> > > table numbers less than the threshold, the latency is the same as the
> > > LoptOptimizeJoinRule.
> > >
> > > For question #2: According to my research, many compute or database
> > systems
> > > have the "bushy join reorder" strategies based on dynamic programming.
> > For
> > > example, Spark and PostgresSql use the same strategy, and the threshold
> > be
> > > set to 12. Also, some papers, like [1] and [2], have also researched this
> > > strategy, and [2] set the threshold to 14.
> > >
> > > For question #3: The implementation of Calcite MultiJoinOptimizeBushyRule
> > > is very simple, and it will not store the intermediate results at all.
> > So,
> > > the implementation of Calcite cannot get all possible join reorder
> > results
> > > and it cannot combine with the current cost model to get more reasonable
> > > join reorder results.
> > >
> > >
> > > [1]
> > >
> > >
> > https://courses.cs.duke.edu/compsci516/cps216/spring03/papers/selinger-etal-1979.pdf
> > > [2] https://db.in.tum.de/~radke/papers/hugejoins.pdf
> > >
> > >
> > >
> > > Benchao Li  于2023年1月3日周二 12:54写道:
> > >
> > > > Hi Yunhong,
> > > >
> > > > Thanks for driving this~
> > > >
> > > > I haven't gone deep into the implementation details yet. Regarding the
> > > > general description, I would ask a few questions firstly:
> > > >
> > > > #1, Is there any benchmark results about the optimization latency
> > change
> > > > compared to c

Re: [VOTE] FLIP-280: Introduce EXPLAIN PLAN_ADVICE to provide SQL advice

2023-01-09 Thread godfrey he
+1 (binding)

Best,
Godfrey

Jingsong Li  于2023年1月10日周二 09:56写道:
>
> +1 (binding)
>
> On Mon, Jan 9, 2023 at 6:19 PM Jane Chan  wrote:
> >
> > Hi all,
> >
> > Thanks for all the feedback so far.
> > Based on the discussion[1], we have come to a consensus, so I would like to
> > start a vote on FLIP-280: Introduce EXPLAIN PLAN_ADVICE to provide SQL
> > advice[2].
> >
> > The vote will last for at least 72 hours (Jan 12th at 10:00 GMT)
> > unless there is an objection or insufficient votes.
> >
> > [1] https://lists.apache.org/thread/5xywxv7g43byoh0jbx1b6qo6gx6wjkcz
> > [2]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-280%3A+Introduce+EXPLAIN+PLAN_ADVICE+to+provide+SQL+advice
> >
> > Best,
> > Jane Chan


Re: [VOTE] FLIP-279 Unified the max display column width for SqlClient and Table APi in both Streaming and Batch execMode

2023-01-16 Thread godfrey he
+1 (binding)

Best,
Godfrey

Shammon FY  于2023年1月12日周四 23:27写道:
>
> +1 (no-binding)
>
>
> Best,
> Shammon
>
> On Thu, Jan 12, 2023 at 8:11 PM Shengkai Fang  wrote:
>
> > +1(binding)
> >
> > Best,
> > Shengkai
> >
> > Jark Wu  于2023年1月12日周四 19:22写道:
> >
> > > +1 (binding)
> > > Thank you for driving this effort.
> > >
> > > Best,
> > > Jark
> > >
> > > > 2023年1月9日 15:46,Jing Ge  写道:
> > > >
> > > > Hi,
> > > >
> > > > I'd like to start a vote on FLIP-279 Unified the max display column
> > width
> > > > for SqlClient and Table APi in both Streaming and Batch execMode. The
> > > > discussion can be found at [1].
> > > >
> > > > The vote will last for at least 72 hours (Jan 12th at 9:00 GMT) unless
> > > > there is an objection or insufficient votes.
> > > >
> > > > [1] https://lists.apache.org/thread/f9p622k8cgcjl0r0b44np5wm8krhtjjz
> > > > [2]
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-279+Unified+the+max+display+column+width+for+SqlClient+and+Table+APi+in+both+Streaming+and+Batch+execMode
> > > >
> > > > Best regards,
> > > > Jing
> > >
> > >
> >


[ANNOUNCE] New Apache Flink Committer - Jing Ge

2023-02-13 Thread godfrey he
Hi everyone,

On behalf of the PMC, I'm very happy to announce Jing Ge as a new Flink
committer.

Jing has been consistently contributing to the project for over 1 year.
He authored more than 50 PRs and reviewed more than 40 PRs
with a main focus on the connector, test, and documentation modules.
He was very active on the mailing list (more than 90 threads) last year,
which included participating in many dev discussions (30+),
providing many effective suggestions for FLIPs and answering
many user questions. He was a keynote speaker at Flink Forward 2022,
helping to promote Flink, and a trainer for the Flink troubleshooting and
performance tuning course of the Flink Forward 2022 training program.

Please join me in congratulating Jing for becoming a Flink committer!

Best,
Godfrey


Re: [External] [DISCUSS] FLIP-292: Support configuring state TTL at operator level for Table API & SQL programs

2023-04-03 Thread godfrey he
Hi Jane,

Thanks for driving this FLIP.

I think the compiled plan solution and the hint solution do not
conflict; the two can exist at the same time.
The compiled plan solution can address the needs of advanced users and
platform users, where the state TTL of every stateful operator can be
defined by the user. The hint solution can address some specific simple
scenarios and is very user-friendly, convenient, and unambiguous to use.

Some stateful operators are not compiled from SQL directly, such as the
ChangelogNormalize and SinkUpsertMaterializer mentioned above. I notice
that the example given by Yisha has a hint propagation problem, which
does not conform to the current design. The rough idea of the hint
solution should be simple (only the common operators are supported)
and easy to understand (no hint propagation).

If the hint solution is supported, a compiled plan generated from a
query with state TTL hints can also be further modified for the state
TTL parts.

So, I prefer the hint solution to be discussed in a separate FLIP. I
think that FLIP may need a lot of discussion.
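
For the compiled plan solution, the workflow I mean is roughly the sketch
below, assuming a TableEnvironment tEnv and the FLIP-190 APIs (please
verify the exact signatures):

    // Compile the plan once, persist it, edit the state TTL, then execute.
    CompiledPlan plan = tEnv.compilePlanSql(
            "INSERT INTO sink SELECT id, name FROM src");
    plan.writeToFile("/tmp/job-plan.json");
    // ... hand-edit the per-operator state TTL fields in job-plan.json ...
    tEnv.loadPlan(PlanReference.fromFile("/tmp/job-plan.json")).execute();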

Best,
Godfrey

周伊莎  于2023年3月30日周四 22:04写道:
>
> Hi Jane,
>
> Thanks for your detailed response.
>
> You mentioned that there are 10k+ SQL jobs in your production
> > environment, but only ~100 jobs' migration involves plan editing. Is 10k+
> > the number of total jobs, or the number of jobs that use stateful
> > computation and need state migration?
> >
>
> 10k is the number of SQL jobs that enable periodic checkpoint. And
> surely if users change their sql which result in changes of the plan, they
> need to do state migration.
>
> - You mentioned that "A truth that can not be ignored is that users
> > usually tend to give up editing TTL(or operator ID in our case) instead of
> > migrating this configuration between their versions of one given job." So
> > what would users prefer to do if they're reluctant to edit the operator
> > ID? Would they submit the same SQL as a new job with a higher version to
> > re-accumulating the state from the earliest offset?
>
>
> You're exactly right. People will tend to re-accumulate the state from a
> given offset by changing the namespace of their checkpoint.
> Namespace is an internal concept and restarting the sql job in a new
> namespace can be simply understood as submitting a new job.
>
> Back to your suggestions, I noticed that FLIP-190 [3] proposed the
> > following syntax to perform plan migration
>
>
> The 'plan migration'  I said in my last reply may be inaccurate.  It's more
> like 'query evolution'. In other word, if a user submitted a sql job with a
> configured compiled plan, and then
> he changes the sql,  the compiled plan changes too, how to move the
> configuration in the old plan to the new plan.
> IIUC, FLIP-190 aims to solve issues in flink version upgrades and leave out
> the 'query evolution' which is a fundamental change to the query. E.g.
> adding a filter condition, a different aggregation.
> And I'm really looking forward to a solution for query evolution.
>
> And I'm also curious about how to use the hint
> > approach to cover cases like
> >
> > - configuring TTL for operators like ChangelogNormalize,
> > SinkUpsertMaterializer, etc., these operators are derived by the planner
> > implicitly
> > - cope with two/multiple input stream operator's state TTL, like join,
> > and other operations like row_number, rank, correlate, etc.
>
>
>  Actually, in our company , we make operators in the query block where the
> hint locates all affected by that hint. For example,
>
> INSERT INTO sink
> > SELECT /*+ STATE_TTL('1D') */
> >id,
> >name,
> >num
> > FROM (
> >SELECT
> >*,
> >ROW_NUMBER() OVER (PARTITION BY id ORDER BY num DESC) as row_num
> >FROM (
> >SELECT
> >*
> >FROM (
> >SELECT
> >id,
> >name,
> >max(num) as num
> >FROM source1
> >GROUP BY
> >id, name, TUMBLE(proc, INTERVAL '1' MINUTE)
> >)
> >GROUP BY
> >id, name, num
> >)
> > )
> > WHERE row_num = 1
> >
>
> In the SQL above, the state TTL of Rank and Agg will be all configured as 1
> day.  If users want to set different TTL for Rank and Agg, they can just
> make these two queries located in two different query blocks.
> It looks quite rough but straightforward enough.  For each side of join
> operator, one of my users proposed a syntax like below:
>
> > /*+ 
> > JOIN_TTL('tables'='left_talbe,right_table','left_ttl'='10','right_ttl'='1')
> >  */
> >
> > We haven't accepted this proposal now, maybe we could find some better
> design for this kind of case. Just for your information.
>
> I think if we want to utilize hints to support fine-grained configuration,
> we can open a new FLIP to discuss it.
> BTW, personally, I'm interested in how to design a graphical interface to
> help users to maintain their custom fine-grain

Re: [VOTE] FLIP-292: Enhance COMPILED PLAN to support operator-level state TTL configuration

2023-04-10 Thread godfrey he
+1 (binding)

Best,
Godfrey

Jing Ge  于2023年4月10日周一 18:42写道:
>
> +1 (binding)
>
> Best Regards,
> Jing
>
> On Mon, Apr 10, 2023 at 12:27 PM Lincoln Lee  wrote:
>
> > +1 (binding)
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Jane Chan  于2023年4月10日周一 18:06写道:
> >
> > > Hi developers,
> > >
> > > Thanks for all the feedback on FLIP-292: Enhance COMPILED PLAN to support
> > > operator-level state TTL configuration [1].
> > > Based on the discussion [2], we have come to a consensus, so I would like
> > > to start a vote.
> > >
> > > The vote will last for at least 72 hours (Apr. 13th at 10:00 A.M. GMT)
> > > unless there is an objection or insufficient votes.
> > >
> > > [1]
> > >
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240883951
> > > [2] https://lists.apache.org/thread/ffmc96gv8ofoskbxlhtm7w8oxv8nqzct
> > >
> > > Best,
> > > Jane Chan
> > >
> >


Re: [ANNOUNCE] New PMC member: Yuan Mei

2022-03-14 Thread godfrey he
Congratulations, Yuan!

Best,
Godfrey

Lijie Wang  于2022年3月15日周二 09:18写道:
>
> Congratulations, Yuan!
>
> Best,
> Lijie
>
> Benchao Li  于2022年3月15日周二 08:18写道:
>
> > Congratulations, Yuan!
> >
> > Yun Gao  于2022年3月15日周二 01:37写道:
> >
> > > Congratulations, Yuan!
> > >
> > > Best,
> > > Yun Gao
> > >
> > >
> > >
> > > --
> > > From:Francesco Guardiani 
> > > Send Time:2022 Mar. 15 (Tue.) 00:21
> > > To:dev 
> > > Subject:Re: [ANNOUNCE] New PMC member: Yuan Mei
> > >
> > > Congratulations, Yuan!
> > >
> > > On Mon, Mar 14, 2022 at 3:51 PM yanfei lei  wrote:
> > >
> > > > Congratulations, Yuan!
> > > >
> > > >
> > > >
> > > > Zhilong Hong  于2022年3月14日周一 19:31写道:
> > > >
> > > > > Congratulations, Yuan!
> > > > >
> > > > > Best,
> > > > > Zhilong
> > > > >
> > > > > On Mon, Mar 14, 2022 at 7:22 PM Konstantin Knauf 
> > > > > wrote:
> > > > >
> > > > > > Congratulations, Yuan!
> > > > > >
> > > > > > On Mon, Mar 14, 2022 at 11:29 AM Jing Zhang 
> > > > > wrote:
> > > > > >
> > > > > > > Congratulations, Yuan!
> > > > > > >
> > > > > > > Best,
> > > > > > > Jing Zhang
> > > > > > >
> > > > > > > Jing Ge  于2022年3月14日周一 18:15写道:
> > > > > > >
> > > > > > > > Congrats! Very well deserved!
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Jing
> > > > > > > >
> > > > > > > > On Mon, Mar 14, 2022 at 10:34 AM Piotr Nowojski <
> > > > > pnowoj...@apache.org>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Congratulations :)
> > > > > > > > >
> > > > > > > > > pon., 14 mar 2022 o 09:59 Yun Tang 
> > > > napisał(a):
> > > > > > > > >
> > > > > > > > > > Congratulations, Yuan!
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Yun Tang
> > > > > > > > > > 
> > > > > > > > > > From: Zakelly Lan 
> > > > > > > > > > Sent: Monday, March 14, 2022 16:55
> > > > > > > > > > To: dev@flink.apache.org 
> > > > > > > > > > Subject: Re: [ANNOUNCE] New PMC member: Yuan Mei
> > > > > > > > > >
> > > > > > > > > > Congratulations, Yuan!
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Zakelly
> > > > > > > > > >
> > > > > > > > > > On Mon, Mar 14, 2022 at 4:49 PM Johannes Moser <
> > > > > j...@ververica.com>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Congrats Yuan.
> > > > > > > > > > >
> > > > > > > > > > > > On 14.03.2022, at 09:45, Arvid Heise  > >
> > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Congratulations and well deserved!
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Mar 14, 2022 at 9:30 AM Matthias Pohl <
> > > > > > map...@apache.org
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > >> Congratulations, Yuan.
> > > > > > > > > > > >>
> > > > > > > > > > > >> On Mon, Mar 14, 2022 at 9:25 AM Shuo Cheng <
> > > > > > njucs...@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > > > >>
> > > > > > > > > > > >>> Congratulations, Yuan!
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> On Mon, Mar 14, 2022 at 4:22 PM Anton Kalashnikov <
> > > > > > > > > > kaa@yandex.com>
> > > > > > > > > > > >>> wrote:
> > > > > > > > > > > >>>
> > > > > > > > > > >  Congratulations, Yuan!
> > > > > > > > > > > 
> > > > > > > > > > >  --
> > > > > > > > > > > 
> > > > > > > > > > >  Best regards,
> > > > > > > > > > >  Anton Kalashnikov
> > > > > > > > > > > 
> > > > > > > > > > >  14.03.2022 09:13, Leonard Xu пишет:
> > > > > > > > > > > > Congratulations Yuan!
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Leonard
> > > > > > > > > > > >
> > > > > > > > > > > >> 2022年3月14日 下午4:09,Yangze Guo 
> > > 写道:
> > > > > > > > > > > >>
> > > > > > > > > > > >> Congratulations!
> > > > > > > > > > > >>
> > > > > > > > > > > >> Best,
> > > > > > > > > > > >> Yangze Guo
> > > > > > > > > > > >>
> > > > > > > > > > > >> On Mon, Mar 14, 2022 at 4:08 PM Martijn Visser <
> > > > > > > > > > >  martijnvis...@apache.org> wrote:
> > > > > > > > > > > >>> Congratulations Yuan!
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> On Mon, 14 Mar 2022 at 09:02, Yu Li <
> > > > car...@gmail.com>
> > > > > > > > wrote:
> > > > > > > > > > > >>>
> > > > > > > > > > >  Hi all!
> > > > > > > > > > > 
> > > > > > > > > > >  I'm very happy to announce that Yuan Mei has
> > > joined
> > > > > the
> > > > > > > > Flink
> > > > > > > > > > PMC!
> > > > > > > > > > > 
> > > > > > > > > > >  Yuan is helping the community a lot with
> > creating
> > > > and
> > > > > > > > > validating
> > > > > > > > > > >  releases,
> > > > > > > > > > >  contributing to FLIP discussions and good code
> > > > > > > contributions
> > > > > > > > > to
> > > > > > > > > > > >> the
> > > > > > > > > > >  state backend and related components.
> > > > > > > > > > > 
> 

Re: [VOTE] FLIP-214: Support Advanced Function DDL

2022-04-21 Thread godfrey he
Hi Ron,

I don't see any section mentioning `delete jar`; could you update it?

Best,
Godfrey

Jing Zhang  于2022年4月21日周四 17:57写道:
>
> Ron,
> +1 (binding)
>
> Thanks for driving this FLIP.
>
> Best,
> Jing Zhang
>
> Jark Wu  于2022年4月21日周四 11:31写道:
>
> > Thanks for driving this work @Ron,
> >
> > +1 (binding)
> >
> > Best,
> > Jark
> >
> > On Thu, 21 Apr 2022 at 10:42, Mang Zhang  wrote:
> >
> > > +1
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > --
> > >
> > > Best regards,
> > > Mang Zhang
> > >
> > >
> > >
> > >
> > >
> > > At 2022-04-20 18:28:28, "刘大龙"  wrote:
> > > >Hi, everyone
> > > >
> > > >
> > > >
> > > >
> > > >I'd like to start a vote on FLIP-214: Support Advanced Function DDL [1]
> > > which has been discussed in [2].
> > > >
> > > >The vote will be open for at least 72 hours unless there is an objection
> > > or not enough votes.
> > > >
> > > >
> > > >
> > > >
> > > >[1]
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL
> > > >
> > > >[2] https://lists.apache.org/thread/7m5md150qgodgz1wllp5plx15j1nowx8
> > > >
> > > >
> > > >
> > > >
> > > >Best,
> > > >
> > > >Ron
> > >
> >


Re: Re: [VOTE] FLIP-214: Support Advanced Function DDL

2022-04-21 Thread godfrey he
Hi, Ron

Thanks for the explanation,
+1 (binding) from my side
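
Concretely, the statements in question would look roughly like the sketch
below, assuming a TableEnvironment tEnv (paths are illustrative, and the
exact DELETE vs. REMOVE keyword follows whatever the FLIP finally
specifies):

    tEnv.executeSql("ADD JAR '/path/to/udf.jar'");
    tEnv.executeSql("SHOW JARS").print();
    tEnv.executeSql("REMOVE JAR '/path/to/udf.jar'");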

Best,
Godfrey

刘大龙  于2022年4月22日周五 13:45写道:
>
>
> Hi, godfrey
>
> The ADD/DELETE JAR syntax parsing is currently supported on the table
> environment side, but the execution is implemented on the SqlClient side.
> After this FLIP, we will move the execution to the table environment, so
> there is no public API change. Moreover, I have updated the description in
> the Core Code Design section.
>
> > -原始邮件-
> > 发件人: "godfrey he" 
> > 发送时间: 2022-04-22 12:26:44 (星期五)
> > 收件人: dev 
> > 抄送:
> > 主题: Re: [VOTE] FLIP-214: Support Advanced Function DDL
> >
> > hi Ron,
> >
> > I don't see any section mentioned `delete jar`, could you update it?
> >
> > Best,
> > Godfrey
> >
> > Jing Zhang  于2022年4月21日周四 17:57写道:
> > >
> > > Ron,
> > > +1 (binding)
> > >
> > > Thanks for driving this FLIP.
> > >
> > > Best,
> > > Jing Zhang
> > >
> > > Jark Wu  于2022年4月21日周四 11:31写道:
> > >
> > > > Thanks for driving this work @Ron,
> > > >
> > > > +1 (binding)
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > On Thu, 21 Apr 2022 at 10:42, Mang Zhang  wrote:
> > > >
> > > > > +1
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Best regards,
> > > > > Mang Zhang
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > At 2022-04-20 18:28:28, "刘大龙"  wrote:
> > > > > >Hi, everyone
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >I'd like to start a vote on FLIP-214: Support Advanced Function DDL 
> > > > > >[1]
> > > > > which has been discussed in [2].
> > > > > >
> > > > > >The vote will be open for at least 72 hours unless there is an 
> > > > > >objection
> > > > > or not enough votes.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >[1]
> > > > >
> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL
> > > > > >
> > > > > >[2] https://lists.apache.org/thread/7m5md150qgodgz1wllp5plx15j1nowx8
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >Best,
> > > > > >
> > > > > >Ron
> > > > >
> > > >
>
>
> --
> Best,
> Ron


[DISCUSS] Maintain a Calcite repository for Flink to accelerate the development for Flink SQL features

2022-04-22 Thread godfrey he
Dear devs,

I would like to open a discussion about the fact that currently much
Flink SQL feature development relies on Calcite releases, which
seriously blocks the release of some Flink SQL features.
Therefore, I would like to discuss whether it is possible to solve this
problem by creating Flink's own Calcite repository.

Currently, Flink depends on Calcite 1.26, FLIP-204 [1] relies on Calcite 1.30,
and we recently want to support the full join-hints functionality in Flink 1.16,
which relies on Calcite 1.31 (which will probably be released in two or three months).

In order to support some new features or fix some bugs, we need to upgrade
the Calcite version, but every time we upgrade the Calcite version
(especially across multiple versions), the process is very tough: I
remember clearly that the Calcite upgrade from 1.22 to 1.26 took two
weeks of full-time work to complete.

Currently, in order to fix some bugs without upgrading the Calcite version,
we copy the corresponding Calcite classes directly into the Flink project
and then modify them accordingly [2]. This approach is rather hacky and
makes code maintenance and upgrades hard.

So, my idea is that we could solve this problem by maintaining a Calcite
repository in the Flink community. This approach has been practiced
within my company for many years.
There are similar practices in the industry. For example, Apache Drill
also maintains a separate Calcite repository [3].

The following is a brief analysis of the approach and the pros and
cons of maintaining a separate repository.

Approach:
1. Where to put the code? https://github.com/flink-extended is a good place.
2. What extra code can be added to this repository? Only bug fixes and features
that are already merged into Calcite can be cherry-picked to this repository.
We should also try to push bug fixes to the Calcite community.
Btw, the copied Calcite classes in the Flink project can then be removed.
3. How to upgrade the Calcite version? Check out the target Calcite
release branch and rebase our bug-fix commits onto it. (As we upgrade,
we will maintain fewer and fewer old bug-fix commits.) Then, verify all
of Calcite's tests and Flink's tests in the developer's local
environment. If all tests are OK, release the Calcite branch; otherwise,
fix it in the branch and re-test. After the branch is released, the
version of Calcite in Flink can be upgraded. For example: check out a
calcite-1.26.0-flink-v1-SNAPSHOT branch from calcite-1.26.0, move all
the copied Calcite code in Flink to the branch, and pick all the
hint-related changes from Calcite 1.31 to the branch. Then we can change
the Calcite version in Flink to calcite-1.26.0-flink-v1-SNAPSHOT and
verify all tests locally. Release calcite-1.26.0-flink-v1 after all
tests are successful. At last, upgrade the Calcite version in Flink to
calcite-1.26.0-flink-v1 and open a PR.
4. Who will maintain it? The maintenance workload is minimal, but the
upgrade work is
 laborious (actually, it's similar to before). I can maintain it in
the early stage and standardise the processing.

Pros:
1. The release of Flink is decoupled from the release of Calcite,
making feature development and bug fixing quicker
2. Reduces the hassle of unnecessary Calcite upgrades
3. No hacks in Flink to maintain copied Calcite code

Cons:
1. Need to maintain an additional Calcite repository
2. Upgrades are a little more complicated than before

Any feedback is very welcome!


[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-204%3A+Introduce+Hash+Lookup+Join
[2] 
https://github.com/apache/flink/tree/master/flink-table/flink-table-planner/src/main/java/org/apache/calcite
[3] https://github.com/apache/drill/blob/master/pom.xml#L64

Best,
Godfrey


Re: [DISCUSS] Maintain a Calcite repository for Flink to accelerate the development for Flink SQL features

2022-04-22 Thread godfrey he
Thanks for the feedback, guys!

For Jingsong's feedback:
>## Do we have the plan to upgrade calcite to 1.31?

I think we will upgrade Calcite to 1.31 only when Flink depends on
some significant Calcite features, such as the new PTF syntax
(CALCITE-4865).

>## Is Cherry-pick costly?
From the experience of maintaining Calcite within our company, the cost
is small. We only cherry-pick the bug fixes and needed minor features.
For a major feature, we can choose to upgrade the version.

>## Are the calcite repository costly to maintain?
From the experience of @Danny Chan (a Calcite PMC member), publishing
is quite easy.


For Chesnay's feedback:
I also totally agree that a fork repository will increase the cost of
maintenance.

Usually, the Calcite community releases a version every three months or
more. I think it's hard to get Calcite to change its release cycle,
because Calcite supports many compute engines.


For Konstantin's feedback:
Some changes in Calcite may cause hundreds of plan changes in Flink,
e.g. CALCITE-4173. We must check whether each change is expected and
whether there is a performance regression. Some of the changes are very
subtle, especially in the CBO planner. A similar situation occurred when
upgrading from 1.1x to 1.22. If you are not familiar with the Flink
planner and Calcite, the upgrade is even more difficult.


For Xintong's feedback:
You are right, I will contact Yun for some help. Thanks for the suggestions.


For Martijn's feedback:
I'm also against cherry-picking many feature commits into the fork
repository, and I totally agree we should collaborate closely with the
Calcite community. I'm just trying to find an approach that avoids
frequent Calcite upgrades but easily supports bug fixes and minor new
feature development.

As for the CALCITE-4865 case, I think we should upgrade the Calcite
version to support PTF.

@Jing Zhang, could you share some of your experience with CALCITE-4865?

Best,
Godfrey

Martijn Visser  于2022年4月22日周五 17:31写道:
>
> Hi everyone,
>
> Overall I'm against the idea of setting up a Calcite fork for the same
> reasons that Chesnay has mentioned. We've talked extensively about doing an
> upgrade of Calcite during the Flink 1.15 release period, but there was a
> lot of pushback by the maintainers against that because of the required
> efforts. Having our own fork will mean that there will be even more effort
> required, because not only do we need to perform the upgrade on Flink's
> end, we also need to maintain this Calcite fork.
>
> I think what we should do is have a closer collaboration with the Calcite
> community and see if we can also help out with reviewing/merging PRs and
> more frequent releases. What we're seeing is that already features that are
> proposed towards Calcite because we need them for Flink, are not getting
> picked up by the Calcite community. See
> https://issues.apache.org/jira/browse/CALCITE-4865 /
> https://github.com/apache/calcite/pull/2606 which is such an example.
>
> I would rather invest more in collaborating with the Calcite community
> instead of maintaining our own fork. I believe that would help us get new
> features and bug fixes sooner.
>
> Best regards,
>
> Martijn Visser
> https://twitter.com/MartijnVisser82
> https://github.com/MartijnVisser
>
>
> On Fri, 22 Apr 2022 at 10:46, Xintong Song  wrote:
>
> > BTW, I think this proposal sounds similar to FRocksDB, the Flink's custom
> > RocksDB build. Maybe folks maintaining FRocksDB can share some experiences.
> >
> > CC @Yun Tang
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Fri, Apr 22, 2022 at 4:35 PM Xintong Song 
> > wrote:
> >
> > > Hi Godfrey,
> > >
> > >
> > >> 1. Where to put the code? https://github.com/flink-extended is a good
> > >> place.
> > >
> > >
> > > Please notice that `flink-extended` is not endorsed by the Apache Flink
> > > PMC. That means if the proposed new Calcite repository is hosted there,
> > the
> > > maintenance and release will not be guaranteed by the Apache Flink
> > project.
> > > I guess the question is do we consider another 3rd party Calcite
> > repository
> > > more reliable and convenient than the official Apache Calcite that we
> > want
> > > to depend on.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Fri, Apr 22, 2022 at 4:07 PM Chesnay Schepler 
> > > wrote:
> > >
> > >> I'm overall against the idea of creating a fork.
> > >> It implies quite some maintenance overhead, like

Re: [DISCUSS] Maintain a Calcite repository for Flink to accelerate the development for Flink SQL features

2022-04-22 Thread godfrey he
Hi Chesnay,

There have been no bugfix releases so far.
You can find the releases at https://github.com/apache/calcite/tags

Best,
Godfrey

Chesnay Schepler  于2022年4月22日周五 18:48写道:
>
> I find it a bit weird that the supposed only way to get a bug fix is to
> do a big version upgrade.
> Is Calcite not creating bugfix releases?
>
> On 22/04/2022 12:26, godfrey he wrote:
> > Thanks for the feedback, guys!
> >
> > For Jingsong's feedback:
> >> ## Do we have the plan to upgrade calcite to 1.31?
> > I think we will upgrade Calcite to 1.31 only when Flink depends on
> > some significant features of Calcite.
> >   Such as: new syntax PTF (CALCITE-4865).
> >
> >   >## Is Cherry-pick costly?
> > >From the experience of maintaining calcite with our company, the cost is 
> > >small.
> > We only cherry-pick the bug fixes and needed minor features.
> > For a major feature, we can choose to upgrade the version.
> >
> >> ## Are the calcite repository costly to maintain?
> > >From the experience of @Dann y chen (One PMC of Calcite), publishing
> > is much easier.
> >
> >
> > For Chesnay's feedback:
> > I also totally agree that a fork repository will increase the cost of
> > maintenance.
> >
> > Usually, the Calcite community releases a version three months or more.
> > I think it's hard to let Calcite change the release cycle
> > because Calcite supports many compute engines.
> >
> >
> > For Konstantin's feedback:
> > Some changes in Calcite may cause hundreds of plan changes in Flink,
> > such as: CALCITE-4173.
> > We must check whether the change is expected, whether there is
> > performance regression.
> > Some of the changes are very subtle, especially in the CBO planner.
> > This situation also occurs similarly within upgrading from 1.1x to 1.22.
> > If you are not familiar with Flink planner and Calcite, it will be
> > more difficult to upgrade.
> >
> >
> > For Xintong's feedback:
> > You are right, I will connect Yun for some help, Thanks for the suggestions.
> >
> >
> > For Martijn's feedback:
> > I'm also against cherry-pick many features code into the fock repository,
> > and I also totally agree we should collaborate closely with the
> > Calcite community.
> > I'm just trying to find an approach which can avoid frequent Calcite 
> > upgrades,
> > but easily support bug fix and minor new feature development.
> >
> > As for the CALCITE-4865 case, I think we should upgrade the Calcite
> > version to support PTF.
> >
> > @Jing zhang, can you share some 'feeling' for CALCITE-4865 ?
> >
> > Best,
> > Godfrey
> >
> > Martijn Visser  于2022年4月22日周五 17:31写道:
> >> Hi everyone,
> >>
> >> Overall I'm against the idea of setting up a Calcite fork for the same
> >> reasons that Chesnay has mentioned. We've talked extensively about doing an
> >> upgrade of Calcite during the Flink 1.15 release period, but there was a
> >> lot of pushback by the maintainers against that because of the required
> >> efforts. Having our own fork will mean that there will be even more effort
> >> required, because not only do we need to perform the upgrade on Flink's
> >> end, we also need to maintain this Calcite fork.
> >>
> >> I think what we should do is have a closer collaboration with the Calcite
> >> community and see if we can also help out with reviewing/merging PRs and
> >> more frequent releases. What we're seeing is that already features that are
> >> proposed towards Calcite because we need them for Flink, are not getting
> >> picked up by the Calcite community. See
> >> https://issues.apache.org/jira/browse/CALCITE-4865 /
> >> https://github.com/apache/calcite/pull/2606 which is such an example.
> >>
> >> I would rather invest more in collaborating with the Calcite community
> >> instead of maintaining our own fork. I believe that would help us get new
> >> features and bug fixes sooner.
> >>
> >> Best regards,
> >>
> >> Martijn Visser
> >> https://twitter.com/MartijnVisser82
> >> https://github.com/MartijnVisser
> >>
> >>
> >> On Fri, 22 Apr 2022 at 10:46, Xintong Song  wrote:
> >>
> >>> BTW, I think this proposal sounds similar to FRocksDB, the Flink's custom
> >>> RocksDB build. Maybe folks maintaining FRocksDB can share some 
> >>> experiences.
> >>

Re: [DISCUSS] Maintain a Calcite repository for Flink to accelerate the development for Flink SQL features

2022-04-24 Thread godfrey he
Hi, Jing
Thanks for sharing your Calcite experiences.
Regarding Calcite version upgrades, we should try not to use the latest
Calcite version if possible, to avoid bugs introduced by the new version.
This may be a best practice.


Hi, Yun
Thanks for the detailed explanation of your experiences with FRocksDB.
I agree with you that the situation with Calcite and RocksDB is a
little different.
The main pain point with Calcite is that we have to upgrade Calcite to
the latest version to get bug fixes and new features, but the latest
version may be unstable, which is a pain for us.
If we all agree we should maintain a forked Calcite repo,
there are many experiences we can learn from FRocksDB.

Best,
Godfrey

Yun Tang  于2022年4月24日周日 11:58写道:
>
> Hi all,
>
> I could share two cents here for how we maintain FRocksDB.
>
> First of all, we also do not prefer to maintain a customized RocksDB version 
> in Flink, which brings additional overhead for Flink community:
>
>
>   1.  RocksDB community switches to circleci for the CI tests after 
> RocksDB-6.x, which requires additional money to run all tests for reviewing 
> each PR.
>   2.  We need to compile and include all kinds of FRocksDB binaries on 
> linux32/64, windows, ppc64, ARM and Macos platforms, which is really tough 
> and boring experiences.
>
> The root reason why we have to maintain a forked RocksDB repo is that RocksDB 
> community refuses to accept a plugin-like feature based on compaction filter, 
> which is heavily dependent by Flink's state TTL feature [1]. From 
> RocksDB-7.0, the community also moves several components to the plugin repo 
> [2], although this cannot avoid us to release all kinds of binaries, it can 
> at least decrease our energy to maintain the whole tests if we follow this 
> trend.
>
> Last but not least, I don't think current discussion on Apache Calcite is in 
> the same situation as FRocksDB. Current Flink SQL guys complain that Calcite 
> is released too slowly, which blocks some feature development in Flink. 
> However, RocksDB community itself actually release new versions more 
> frequently, and we don't rely on its new version for some new features 
> currently. Moreover, we're often more careful on upgrading underlying storage 
> component as it could impact the performance and data correctness.
>
>
> [1] 
> https://github.com/ververica/frocksdb/commit/3da8249d50c8a3a6ea229f43890d37e098372786
> [2] https://github.com/facebook/rocksdb/issues/9390
>
> Best
> Yun Tang
>
> 
> From: Jing Zhang 
> Sent: Saturday, April 23, 2022 15:21
> To: dev 
> Cc: Yun Tang 
> Subject: Re: [DISCUSS] Maintain a Calcite repository for Flink to accelerate 
> the development for Flink SQL features
>
> Hi Godfrey,
> I would like to share some problems based on my past experience.
> 1.  It's not easy to push new features in the CALCITE community.
> As @Martijn referred, https://issues.apache.org/jira/browse/CALCITE-4865 /
> https://github.com/apache/calcite/pull/2606 is such an example.
> I tried out many ways, for example, sent review requests in the dev mail 
> list, left comments in JIRA and in pull requests.
> And had to give up finally. Sorry for that.
> 2. However,  some new features of calcite are radical.
> Such as https://issues.apache.org/jira/browse/CALCITE-4173, which had some 
> strong opposition in the CALCITE community,
> But it was merged finally and caused  unexpected problems, such as wrong 
> results (https://issues.apache.org/jira/browse/FLINK-24708)
> and other related bugs.
> 3. Every time we upgrade the calcite version, we will cross multiple 
> versions, resulting in a slow upgrade process and
> uncontrolled results, often causing some unexpected problems.
>
> Thank @Godfrey for driving this discussion in a big scope.
> I think it's a good chance to review these problems and find a solution.
>
> Best,
> Jing Zhang
>
> godfrey he mailto:godfre...@gmail.com>> 于2022年4月22日周五 
> 21:40写道:
> Hi Chesnay,
>
> There is no bug fix version until now.
> You can find the releases in https://github.com/apache/calcite/tags
>
> Best,
> Godfrey
>
> Chesnay Schepler mailto:ches...@apache.org>> 
> 于2022年4月22日周五 18:48写道:
> >
> > I find it a bit weird that the supposed only way to get a bug fix is to
> > do a big version upgrade.
> > Is Calcite not creating bugfix releases?
> >
> > On 22/04/2022 12:26, godfrey he wrote:
> > > Thanks for the feedback, guys!
> > >
> > > For Jingsong's feedback:
> > >> ## Do we have the plan to upgrade calcite to 1.31?
> > > I think we will upgrade Calcite to 1.31 only when Flink depends on
> > > some

Re: [DISCUSS] Maintain a Calcite repository for Flink to accelerate the development for Flink SQL features

2022-04-25 Thread godfrey he
Hi Jark,

Agree with you, thanks for the feedback.

Best,
Godfrey

Jark Wu  于2022年4月25日周一 13:02写道:
>
> Thanks, Godfrey, for starting this discussion,
>
> I understand the motivation behind it.
> No bugfix releases, slow feature reviewing, and no compatibility guaranteed
> are genuinely blocking the development of Flink SQL.
>
> I think a fork is the last choice before trying our best to cooperate with
> the Calcite community.
> But we shouldn't stop here if there is no progress. Therefore, I'm okay
> with maintaining a fork.
>
> However:
> 1) It should be a temporary solution. We should have a plan to move back to
> the latest Calcite version at some point (e.g., pushing them to resolve the
> problems mentioned above).
>
> 2) If we maintain the fork in flink-extended, we should determine a groupId
> for deploying to maven central. The community should have permission to
> deploy under the groupId.
>
> Best,
> Jark
>
>
> On Sun, 24 Apr 2022 at 16:14, godfrey he  wrote:
>
> > Hi, Jing
> > Thanks for sharing the Calcite experiences.
> > About Calcite version upgrading,  we should try not use the latest Calcite
> > version to avoid the bugs introduced by the new version if possible.
> > This may be a best practice.
> >
> >
> > Hi, Yun
> > Thanks for the detailed explanation for the experiences regarding FRocksDB.
> > I agree with you that the situation with Calcite and RocksDB is a
> > little difference.
> > The main pain point for Calcite is that we have to upgrade Calcite to
> > latest version
> > to get fix bugs and new features, but the latest version may be
> > unstable, which is a pain for us.
> > If we all agree we should maintain a forked Calcite repo,
> > there are many experiences we can learn from FRocksDB.
> >
> > Best,
> > Godfrey
> >
> > Yun Tang  于2022年4月24日周日 11:58写道:
> > >
> > > Hi all,
> > >
> > > I could share two cents here for how we maintain FRocksDB.
> > >
> > > First of all, we also do not prefer to maintain a customized RocksDB
> > version in Flink, which brings additional overhead for Flink community:
> > >
> > >
> > >   1.  RocksDB community switches to circleci for the CI tests after
> > RocksDB-6.x, which requires additional money to run all tests for reviewing
> > each PR.
> > >   2.  We need to compile and include all kinds of FRocksDB binaries on
> > linux32/64, windows, ppc64, ARM and Macos platforms, which is really tough
> > and boring experiences.
> > >
> > > The root reason why we have to maintain a forked RocksDB repo is that
> > RocksDB community refuses to accept a plugin-like feature based on
> > compaction filter, which is heavily dependent by Flink's state TTL feature
> > [1]. From RocksDB-7.0, the community also moves several components to the
> > plugin repo [2], although this cannot avoid us to release all kinds of
> > binaries, it can at least decrease our energy to maintain the whole tests
> > if we follow this trend.
> > >
> > > Last but not least, I don't think current discussion on Apache Calcite
> > is in the same situation as FRocksDB. Current Flink SQL guys complain that
> > Calcite is released too slowly, which blocks some feature development in
> > Flink. However, RocksDB community itself actually release new versions more
> > frequently, and we don't rely on its new version for some new features
> > currently. Moreover, we're often more careful on upgrading underlying
> > storage component as it could impact the performance and data correctness.
> > >
> > >
> > > [1]
> > https://github.com/ververica/frocksdb/commit/3da8249d50c8a3a6ea229f43890d37e098372786
> > > [2] https://github.com/facebook/rocksdb/issues/9390
> > >
> > > Best
> > > Yun Tang
> > >
> > > 
> > > From: Jing Zhang 
> > > Sent: Saturday, April 23, 2022 15:21
> > > To: dev 
> > > Cc: Yun Tang 
> > > Subject: Re: [DISCUSS] Maintain a Calcite repository for Flink to
> > accelerate the development for Flink SQL features
> > >
> > > Hi Godfrey,
> > > I would like to share some problems based on my past experience.
> > > 1.  It's not easy to push new features in the CALCITE community.
> > > As @Martijn referred, https://issues.apache.org/jira/browse/CALCITE-4865
> > /
> > > https://github.com/apache/calcite/pull/2606 is such an example.
> > > I tried out many ways, for example, sent review requ

Re: [DISCUSS] Planning Flink 1.16

2022-04-26 Thread godfrey he
Hi Konstantin & Chesnay,

Thanks for driving this discussion. I am willing to volunteer as a
release manager for 1.16.


Best,
Godfrey

Konstantin Knauf  于2022年4月26日周二 18:23写道:
>
> Hi everyone,
>
> With Flink 1.15 about to be released, the community has started planning &
> developing features for the next release, Flink 1.16. As such, I would like
> to start a discussion around managing this release.
>
> Specifically, Chesnay & myself would like to volunteer as release managers.
> Our focus as release managers would be
> * to propose a release timeline
> * to provide an overview of all ongoing development threads and ideally
> their current status to the community
> * to keep an eye on build stability
> * facilitate release testing
> * to do the actual release incl. communication (blog post, etc.)
>
> Is anyone else interested in acting as a release manager for Flink 1.16? If
> so, we are happy to make this a joint effort.
>
> Besides the question of who will act as a release manager, I think, we can
> already use this thread to align on a timeline. For collecting features and
> everything else, we would start a dedicated threads shortly.
>
> Given Flink 1.15 will be released in the next days, and aiming for a 4
> months release cycle including stabilization, this would mean *feature
> freeze at the end of July*. The exact date could be determined later. Any
> thoughts on the timeline.?
>
> Looking forward to your thoughts!
>
> Thanks,
>
> Chesnay & Konstantin


Re: [DISCUSS] Maintain a Calcite repository for Flink to accelerate the development for Flink SQL features

2022-05-05 Thread godfrey he
nk code base in the past and I'm
> > happy that we could prevent a fork until today. Let me elaborate a bit
> > on my strict opinion here:
> >
> > 1) Calcite does not offer bugfix releases
> >
> > In the end, also Calcite is an Apache community. I'm sure we could
> > improve our collaboration and help releasing bugfix releases. So far we
> > were mostly leveraging all the stuff that the Calcite community has
> > built. It would be good to strengthen the relation and also give
> > something back.
> >
> > So far having no bugfix releases was not really a problem for the Flink
> > community. We simply copy over files from Calcite into Flink once a bug
> > has been merged in Calcite. Maven implicitly overwrites the original
> > Calcite classes during artifact building. Most `org.apache.calcite`
> > classes in the Flink code base are fixing bugs and wait for removal
> > during the next Calcite upgrade.
> >
> > 2) Slow feature reviewing
> >
> > Slow feature reviewing has a good and a bad side. One of the reasons why
> > it is so slow is because the maintainers pay a lot of attention to
> > standard compliance, long-term code quality, and
> > cross-downstream-projects usability. All of that is the reason why the
> > Calcite code base has last multiple decades already and is useful for
> > many parties.
> >
> > Relying on Calcite has protected the Flink code base from merging
> > non-standard SQL features and extending the SQL dialect too much. The 1.
> > windows in Calcite and aux functions such as TUMBLE_START have shown
> > that only standard compliant features should be merged. Now the Flink
> > community has the problem of maintaining this custom syntax.
> >
> > 3) No compatibility guaranteed from the Calcite community
> >
> > I disagree here. Many changes are protected by keeping deprecated
> > methods/constructors/classes around for years. And many refactoring are
> > nice also for the Flink community. E.g. easier optimizer rule definition.
> >
> > IMHO the core problem is rather that we don't update Calcite frequently
> > enough. Currently, we are lagging behind quite a bit because we don't
> > pay enough resources in code maintenance but only in new feature
> > development. We should spend some time in a better balance of the two.
> >
> > Regards,
> > Timo
> >
> > Am 25.04.22 um 15:13 schrieb godfrey he:
> > > Hi Jark,
> > >
> > > Agree with you, thanks for the feedback.
> > >
> > > Best,
> > > Godfrey
> > >
> > > Jark Wu  于2022年4月25日周一 13:02写道:
> > >> Thanks, Godfrey, for starting this discussion,
> > >>
> > >> I understand the motivation behind it.
> > >> No bugfix releases, slow feature reviewing, and no compatibility
> > guaranteed
> > >> are genuinely blocking the development of Flink SQL.
> > >>
> > >> I think a fork is the last choice before trying our best to cooperate
> > with
> > >> the Calcite community.
> > >> But we shouldn't stop here if there is no progress. Therefore, I'm okay
> > >> with maintaining a fork.
> > >>
> > >> However:
> > >> 1) It should be a temporary solution. We should have a plan to move
> > back to
> > >> the latest Calcite version at some point (e.g., pushing them to resolve
> > the
> > >> problems mentioned above).
> > >>
> > >> 2) If we maintain the fork in flink-extended, we should determine a
> > groupId
> > >> for deploying to maven central. The community should have permission to
> > >> deploy under the groupId.
> > >>
> > >> Best,
> > >> Jark
> > >>
> > >>
> > >> On Sun, 24 Apr 2022 at 16:14, godfrey he  wrote:
> > >>
> > >>> Hi, Jing
> > >>> Thanks for sharing the Calcite experiences.
> > >>> About Calcite version upgrading,  we should try not use the latest
> > Calcite
> > >>> version to avoid the bugs introduced by the new version if possible.
> > >>> This may be a best practice.
> > >>>
> > >>>
> > >>> Hi, Yun
> > >>> Thanks for the detailed explanation for the experiences regarding
> > FRocksDB.
> > >>> I agree with you that the situation with Calcite and RocksDB is a
> > >>> little difference.
> > >>> The main pain point for Calcite is that we have to upg

Re: [ANNOUNCE] Apache Flink 1.15.0 released

2022-05-05 Thread godfrey he
Congratulations~

Thanks Yun, Till and Joe for driving this release
and everyone who made this release happen.

Best,
Godfrey

Becket Qin  于2022年5月5日周四 17:39写道:
>
> Hooray! Thanks Yun, Till and Joe for driving the release!
>
> Cheers,
>
> JIangjie (Becket) Qin
>
> On Thu, May 5, 2022 at 5:20 PM Timo Walther  wrote:
>
> > It took a bit longer than usual. But I'm sure the users will love this
> > release.
> >
> > Big thanks to the release managers!
> >
> > Timo
> >
> > Am 05.05.22 um 10:45 schrieb Yuan Mei:
> > > Great!
> > >
> > > Thanks, Yun Gao, Till, and Joe for driving the release, and thanks to
> > > everyone for making this release happen!
> > >
> > > Best
> > > Yuan
> > >
> > > On Thu, May 5, 2022 at 4:40 PM Leonard Xu  wrote:
> > >
> > >> Congratulations!
> > >>
> > >> Thanks Yun Gao, Till and Joe for the great work as our release manager
> > and
> > >> everyone who involved.
> > >>
> > >> Best,
> > >> Leonard
> > >>
> > >>
> > >>
> > >>> 2022年5月5日 下午4:30,Yang Wang  写道:
> > >>>
> > >>> Congratulations!
> > >>>
> > >>> Thanks Yun Gao, Till and Joe for driving this release and everyone who
> > >> made
> > >>> this release happen.
> > >>
> >
> >


Re: [DISCUSS] FLIP-91: Support SQL Client Gateway

2022-05-05 Thread godfrey he
Hi Shengkai,

Thanks for driving the proposal; it has been silent for too long.

I have a few questions:

About the architecture:
> The architecture of the Gateway is in the following graph.
Is the TableEnvironment shared across all sessions?

About the REST endpoint:
> /v1/sessions
Are both local files and remote files supported for `libs` and `jars`?
Does the SQL gateway support uploading files?

>/v1/sessions/:session_handle/configure_session
Can this API be replaced with `/v1/sessions/:session_handle/statements`?

>/v1/sessions/:session_id/operations/:operation_handle/status
`:session_id` is a typo; it should be `:session_handle`.

>/v1/sessions/:session_handle/statements
>The statement must be a single command
Does this API support `BEGIN STATEMENT SET ... END` or `STATEMENT SET
BEGIN ... END`?
Are `ADD JAR` and `REMOVE JAR` supported? If yes, how are the jars managed?

>/v1/sessions/:session_handle/operations/:operation_handle/result/:token
>"type": # string value of LogicalType
Some LogicalTypes cannot be serialized, such as CharType(0).

About options:
> endpoint.protocol
I think REST is not a kind of protocol[1] but an architectural style.
The value should be `HTTP`.

About the SQLGatewayService API:
>  Catalog API
> ...
I think we should avoid providing such an API, because whenever a
catalog API is changed or added, this class must also be changed. SQL
statements are a more general interface.

> Options
> sql-gateway.session.idle.timeout
> sql-gateway.session.check.interval
> sql-gateway.worker.keepalive.time
It's better to keep the option style consistent with Flink's; the key
levels should not be too deep:
sql-gateway.session.idle.timeout -> sql-gateway.session.idle-timeout
sql-gateway.session.check.interval -> sql-gateway.session.check-interval
sql-gateway.worker.keepalive.time -> sql-gateway.worker.keepalive-time
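
To illustrate the naming style I mean, here is a minimal sketch of how
such an option might be declared with Flink's ConfigOptions builder. The
default value and description below are only my assumptions for
illustration, not part of the FLIP:

    import org.apache.flink.configuration.ConfigOption;
    import org.apache.flink.configuration.ConfigOptions;

    import java.time.Duration;

    public class SqlGatewayOptions {

        // Hypothetical declaration following the flattened key style
        // suggested above; the default value is an assumption.
        public static final ConfigOption<Duration> SESSION_IDLE_TIMEOUT =
                ConfigOptions.key("sql-gateway.session.idle-timeout")
                        .durationType()
                        .defaultValue(Duration.ofMinutes(10))
                        .withDescription(
                                "A session is closed when it is not accessed"
                                        + " for this duration.");
    }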

[1] https://restfulapi.net/

Best,
Godfrey

Nicholas Jiang  于2022年5月5日周四 14:58写道:
>
> Hi Shengkai,
>
> I have another concern about the submission of batch job. Does the Flink SQL 
> gateway support to submit batch job? In Kyuubi, BatchProcessBuilder is used 
> to submit batch job. What about the Flink SQL gateway?
>
> Best regards,
> Nicholas Jiang
>
> On 2022/04/24 03:28:36 Shengkai Fang wrote:
> > Hi. Jiang.
> >
> > Thanks for your feedback!
> >
> > > Do the public interfaces of GatewayService refer to any service?
> >
> > We will only expose one GatewayService implementation. We will put the
> > interface into the common package and the developer who wants to implement
> > a new endpoint can just rely on the interface package rather than the
> > implementation.
> >
> > > What's the behavior of SQL Client Gateway working on Yarn or K8S? Does
> > the SQL Client Gateway support application or session mode on Yarn?
> >
> > I think we can support SQL Client Gateway to submit the jobs in
> > application/sesison mode.
> >
> > > Is there any event trigger in the operation state machine?
> >
> > Yes. I have already updated the content and add more details about the
> > state machine. During the revise, I found that I mix up the two concepts:
> > job submission and job execution. In fact, we only control the submission
> > mode at the gateway layer. Therefore, we don't need to mapping the
> > JobStatus here. If the user expects that the synchronization behavior is to
> > wait for the completion of the job execution before allowing the next
> > statement to be executed, then the Operation lifecycle should also contains
> > the job's execution, which means users should set `table.dml-sync`.
> >
> > > What's the return schema for the public interfaces of GatewayService?
> > Like getTable interface, what's the return value schema?
> >
> > The API of the GatewayService return the java objects and the endpoint can
> > organize the objects with expected schema. The return results is also list
> > the section ComponetAPI#GatewayService#API. The return type of the
> > GatewayService#getTable is `ContextResolvedTable`.
> >
> > > How does the user get the operation log?
> >
> > The OperationManager will register the LogAppender before the Operation
> > execution. The Log Appender will hijack the logger and also write the log
> > that related to the Operation to another files. When users wants to fetch
> > the Operation log, the GatewayService will read the content in the file and
> > return.
> >
> > Best,
> > Shengkai
> >
> >
> >
> >
> > Nicholas Jiang  于2022年4月22日周五 16:21写道:
> >
> > > Hi Shengkai.
> > >
> > > Thanks for driving the proposal of SQL Client Gateway. I have some
> > > knowledge of Kyuubi and have some questions about the design:
> > >
> > > 1.Do the public interfaces of GatewayService refer to any service? If
> > > referring to HiveService, does GatewayService need interfaces like
> > > getQueryId etc.
> > >
> > > 2.What's the behavior of SQL Client Gateway working on Yarn or K8S? Does
> > > the SQL Client Gateway support application or session mode on Yarn?
> > >
> > > 3.Is there any event trigger in the operation state machine?
> > >

Re: [ANNOUNCE] New Flink PMC member: Yang Wang

2022-05-05 Thread godfrey he
Congratulations, Yang!

Best,
Godfrey

Yangze Guo  于2022年5月6日周五 10:17写道:
>
> Congratulations Yang!
>
> Best,
> Yangze Guo
>
> On Fri, May 6, 2022 at 10:11 AM Forward Xu  wrote:
> >
> > Congratulations, Yang!
> >
> >
> > Best,
> >
> > Forward
> >
> > Jingsong Li  于2022年5月6日周五 10:07写道:
> >
> > > Congratulations, Yang!
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Fri, May 6, 2022 at 10:04 AM yuxia  wrote:
> > > >
> > > > Congratulations, Yang!
> > > >
> > > > Best regards,
> > > > Yuxia
> > > >
> > > > - 原始邮件 -
> > > > 发件人: "Zhu Zhu" 
> > > > 收件人: "dev" 
> > > > 抄送: "Yang Wang" 
> > > > 发送时间: 星期四, 2022年 5 月 05日 下午 10:36:19
> > > > 主题: Re: [ANNOUNCE] New Flink PMC member: Yang Wang
> > > >
> > > > Congratulations, Yang!
> > > >
> > > > Thanks,
> > > > Zhu
> > > >
> > > > Weiqing Yang  于2022年5月5日周四 22:28写道:
> > > > >
> > > > > Congratulations Yang!
> > > > >
> > > > > Best,
> > > > > Weiqing
> > > > >
> > > > > On Thu, May 5, 2022 at 4:18 AM Xintong Song 
> > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I'm very happy to announce that Yang Wang has joined the Flink PMC!
> > > > > >
> > > > > > Yang has been consistently contributing to our community, by
> > > contributing
> > > > > > codes, participating in discussions, mentoring new contributors,
> > > answering
> > > > > > questions on mailing lists, and giving talks on Flink at
> > > > > > various conferences and events. He is one of the main contributors
> > > and
> > > > > > maintainers in Flink's Native Kubernetes / Yarn integrations and the
> > > Flink
> > > > > > Kubernetes Operator.
> > > > > >
> > > > > > Congratulations and welcome, Yang!
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song (On behalf of the Apache Flink PMC)
> > > > > >
> > >


Re: [DISCUSS] FLIP-91: Support SQL Client Gateway

2022-05-06 Thread godfrey he
ink we can't. I don't understand the meaning of all the problems. We
> > can use the REST API to expose all the functionalities in the Gateway side.
> > But many users may have their tools to communicate to the Gateway, which
> > may be based on the HiveServer2 API(thrift api).
> >
> > Best,
> > Shengkai
> >
> >
> >
> >
> >
> >
> >
> > Jingsong Li  于2022年5月6日周五 09:16写道:
> >
> > > Thanks Shengkai for driving.  And all for your discussion.
> > >
> > > > The reason why we introduce the gateway with pluggable endpoints is
> > that
> > > many users has their preferences. For example, the HiveServer2 users
> > prefer
> > > to use the gateway with HiveServer2-style API, which has numerous tools.
> > > However, some filnk-native users may prefer to use the REST API.
> > Therefore,
> > > we hope to learn from the Kyuubi's design that expose multiple endpoints
> > > with different API that allow the user to use.
> > >
> > > My understanding is that we need multiple endpoints, But I don't quite
> > > understand why we need both the rest api and the SQLGatewayService
> > > API, maybe I'm missing something, what's the difference between them?
> > > Is it possible to use one set of rest api to solve all the problems?
> > >
> > > > Gateway to support multiple Flink versions
> > >
> > > I think this is a good question to consider.
> > > - First of all, I think it is absolutely impossible for gateway to
> > > support multiple versions of Flink under the current architecture,
> > > because gateway relies on Flink SQL and a lot of SQL compiled and
> > > optimized code is bound to the Flink version.
> > > - The other way is that gateway does not rely on Flink SQL, and each
> > > time a different version of Flink Jar is loaded to compile the job at
> > > once, and frankly speaking, stream jobs actually prefer this model.
> > >
> > > The benefit of gateway support for multiple versions is that it's
> > > really more user-friendly. I've seen cases where users must have
> > > multiple versions existing in a cluster, and if each version needs to
> > > run a gateway, the O&M burden will be heavy.
> > >
> > > > I don't think that the Gateway is a 'core' function of Flink which
> > > should be included with Flink.
> > >
> > > First, I think the Gateway is a 'core' function in Flink.
> > > Why?
> > > I think our architecture should be consistent, which means that Flink
> > > sql-client should use the implementation of gateway, which means that
> > > sql-client depends on gateway.
> > > And sql-client is the basic tool of flink sql, it must exist in flink
> > > repository, otherwise flink sql has no most important entrance.
> > > So, the gateway itself should be our core functionality as well.
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Thu, May 5, 2022 at 10:06 PM Jark Wu  wrote:
> > > >
> > > > Hi Martijn,
> > > >
> > > > Regarding maintaining Gateway inside or outside Flink code base,
> > > > I would like to share my thoughts:
> > > >
> > > > > I would like to understand why it's complicated to make the upgrades
> > > > problematic. Is it because of relying on internal interfaces? If so,
> > > should
> > > > we not consider making them public?
> > > >
> > > > It's not about internal interfaces. Flink itself doesn't provide
> > backward
> > > > compatibility for public APIs.
> > > >
> > > >
> > > > > a) it will not be possible to have separate releases of the Gateway,
> > > > they will be tied to individual Flink releases
> > > > I don't think it's a problem. On the contrary, maintaining a separate
> > > repo
> > > > for Gateway will take a lot of
> > > > extra community efforts, e.g., individual CICD, docs, releases.
> > > >
> > > >
> > > > > b) if you want the Gateway to support multiple Flink versions
> > > > Sorry, I don't see any users requesting this feature for such a long
> > time
> > > > for SQL Gateway.
> > > > Users can build services on Gateway to easily support multi Flink
> > > versions
> > > > (a Gateway for a Flink version).
> > > 

Re: Re: 【Could we support distribute by For FlinkSql】

2022-05-10 Thread godfrey he
Hi, lpengdream. I will drive this work.
We will support this functionality via hints,
because "DISTRIBUTE BY" is not in the SQL standard.
But it will be supported in the Hive dialect.
I will post the FLIP doc soon.
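
To give a rough idea of the direction (the hint name below is purely an
assumption until the FLIP is posted), the earlier example query might
look something like this:

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class DistributeByHintExample {
        public static void main(String[] args) {
            TableEnvironment tEnv = TableEnvironment.create(
                    EnvironmentSettings.inStreamingMode());
            // Hypothetical hint-based syntax; the actual hint name and
            // semantics will be defined in the FLIP. Assumes a table
            // named `sourceTable` has already been registered.
            tEnv.executeSql(
                    "SELECT /*+ DISTRIBUTE_BY('id') */ id, f1, f2 FROM sourceTable");
        }
    }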

Best,
Godfrey


Jark Wu  于2022年5月9日周一 16:03写道:

>
> We will start a FLIP discussion in the dev mailing list, so please watch on
> the ML.
> I also find that you opened FLINK-27541, we will also update FLINK-27541
> once we have an initial FLIP.
>
> Best,
> Jark
>
> On Mon, 9 May 2022 at 15:18, lpengdr...@163.com  wrote:
>
> > Yeah!  That's great. Thank you!   Where can i get more information about
> > that?
> >
> >
> >
> > lpengdr...@163.com
> >
> > 发件人: Jark Wu
> > 发送时间: 2022-05-09 14:12
> > 收件人: dev
> > 抄送: 贺小令
> > 主题: Re: Re: 【Could we support distribute by For FlinkSql】
> > I got what you want, maybe something like DISTRIBUTED BY in Hive SQL.
> > The community is planning to support this feature but has not started yet.
> > @Godfrey will drive this work.
> >
> > Best,
> > Jark
> >
> > On Mon, 9 May 2022 at 13:45, lpengdr...@163.com 
> > wrote:
> >
> > > Hi
> > > Thanks for your reply.
> > > The way I want is not only for hash-lookup-join,   there are manay
> > > operators  need  a hash-operation to solve the skew-problem.  Lookup-join
> > > is a special scene.
> > > So I hope there is a operator could make a shuffle. Maybe it's a way
> > > to solve the problems ?
> > >
> > >
> > >
> > https://docs.google.com/document/d/1D7AX-_wttMNY53TxLQxiDaRyDVCeEZYCE8AwYflDXZM/edit?usp=sharing
> > >
> > >
> > >
> > >
> > >
> > > lpengdr...@163.com
> > >
> > > 发件人: Jark Wu
> > > 发送时间: 2022-05-09 12:27
> > > 收件人: dev
> > > 主题: Re: 【Could we support distribute by For FlinkSql】
> > > Hi,
> > >
> > > If you are looking for the hash lookup join, there is an in-progress
> > > FLIP-204[1] working for it.
> > >
> > > Btw, I still can't see your picture. You can upload your picture to some
> > > image service and share a link here.
> > >
> > > Best,
> > > Jark
> > >
> > > [1]:
> > >
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-204%3A+Introduce+Hash+Lookup+Join
> > >
> > > On Mon, 9 May 2022 at 11:22, lpengdr...@163.com 
> > > wrote:
> > >
> > > > Sorry!
> > > > The destroied picture is the attachment ;
> > > >
> > > > --
> > > > lpengdr...@163.com
> > > >
> > > >
> > > > *发件人:* lpengdr...@163.com
> > > > *发送时间:* 2022-05-09 11:16
> > > > *收件人:* user-zh ; dev 
> > > > *主题:* 【Could we support distribute by For FlinkSql】
> > > > Hello:
> > > > Now we cann't add a shuffle-operation in a sql-job.
> > > > Sometimes , for example, I have a kafka-source(three partitions) with
> > > > parallelism three. And then I have a lookup-join function, I want
> > process
> > > > the data distribute by id so that the data can split into thre
> > > parallelism
> > > > evenly (The source maybe slant seriously).
> > > > In DataStream API i can do it with keyby(), but it's so sad that i can
> > do
> > > > nothing when i use a sql;
> > > > Maybe we can do it like 'select id, f1,f2 from sourceTable distribute
> > by
> > > > id' like we do it in SparkSql.
> > > >
> > > > Sot that we can make change on the picture  in sql-mode;
> > > >
> > > >
> > > >
> > > > --
> > > > lpengdr...@163.com
> > > >
> > > >
> > >
> >


[DISCUSS] FLIP-231: Introduce SupportStatisticReport to support reporting statistics from source connectors

2022-05-13 Thread godfrey he
Hi all,

I would like to open a discussion on FLIP-231:  Introduce SupportStatisticReport
to support reporting statistics from source connectors.

Statistics are one of the most important inputs to the optimizer.
Accurate and complete statistics allow the optimizer to be more powerful.
Currently, the statistics of Flink SQL come from the Catalog only,
while many connectors have the ability to provide statistics, e.g.
FileSystem. In production, we find that many tables in the Catalog do
not have any statistics. As a result, the optimizer can't generate
better execution plans, especially for batch jobs.

There are two approaches to enhance statistics for the planner:
one is to introduce an "ANALYZE TABLE" syntax which writes the analyzed
result to the catalog; the other is to introduce a new connector
interface which allows the connector itself to report statistics
directly to the planner. The second one is a supplement to the catalog
statistics.

Here, we will discuss the second approach. Compared to the first one,
it gets statistics in real time, with no need to run an analysis job
for each table. This could help improve the user experience.
(We will introduce the "ANALYZE TABLE" syntax in another FLIP.)
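
As a quick illustration, the new interface could look roughly like the
sketch below; the exact name and signature are defined in the FLIP
document, so please treat this as illustrative only:

    import org.apache.flink.table.plan.stats.TableStats;

    /**
     * Illustrative sketch only; see FLIP-231 for the actual definition.
     * A source implementing this interface can report estimated table
     * statistics (e.g. row count, column stats) to the planner when no
     * catalog statistics are available.
     */
    public interface SupportStatisticReport {

        /** Returns the estimated statistics of this table source. */
        TableStats reportStatistics();
    }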

You can find more details in the FLIP-231 document[1]. Looking forward to
your feedback.

[1] 
https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=211883860&draftShareId=eda17eaa-43f9-4dc1-9a7d-3a9b5a4bae00&;
[2] POC: https://github.com/godfreyhe/flink/tree/FLIP-231


Best,
Godfrey


Re: [DISCUSS] FLIP-231: Introduce SupportStatisticReport to support reporting statistics from source connectors

2022-05-16 Thread godfrey he
Hi Jingsong,

Thanks for the feedback.


>One concern I have is that we read the footer for each file, and this may
>be a bit costly in some cases. Is it possible for us to have some
> hierarchical way
Yes, if there are thousands of ORC/Parquet files, it may take a long
time. So we can introduce a config option to let the user choose the
granularity of the statistics.
But SIZE will not be introduced for now, because the planner does not
use file size statistics yet; we can add it once file size statistics
are introduced in the future.
I think we should also introduce a config option to enable/disable
SupportStatisticReport, because it can be a heavy operation for some
connectors in some cases.

> is the filter pushdown already happening at
> this time?
That's a good point. Currently, filter pushdown happens after partition
pruning, to prevent the filter pushdown rule from consuming the
partition predicates. For now, the statistics will be set to unknown
once a filter is pushed down.
To combine them all, we can create an optimization program after the
filter pushdown program to collect the statistics. This avoids
collecting statistics multiple times.


Best,
Godfrey

Jingsong Li  于2022年5月13日周五 22:44写道:
>
> Thank Godfrey for driving.
>
> Looks very good~ This will undoubtedly greatly enhance the various batch
> mode connectors.
>
> I left some comments:
>
> ## FileBasedStatisticsReportableDecodingFormat
>
> One concern I have is that we read the footer for each file, and this may
> be a bit costly in some cases. Is it possible for us to have some
> hierarchical way, e.g.
> - No statistics are collected for files by default.
> - SIZE: Generate statistics based on file Size, get the size of the file
> only with access to the master of the FileSystem.
> - DETAILED: Get the complete statistics by format, possibly by accessing
> the footer of the file.
>
> ## When use the statistics reported by connector
>
> > When partitions are pruned by PushPartitionIntoTableSourceScanRule, the
> statistics should also be updated.
>
> I understand that we definitely need to use reporter after the partition
> prune, but another question: is the filter pushdown already happening at
> this time?
> Can we make sure that in the following three cases, both the filter
> pushdown and the partition prune happen before the stats reporting.
> - only partition prune happens
> - only filter pushdown happens
> - both filter pushdown and partition prune happen
>
> Best,
> Jingsong
>
> On Fri, May 13, 2022 at 6:57 PM godfrey he  wrote:
>
> > Hi all,
> >
> > I would like to open a discussion on FLIP-231:  Introduce
> > SupportStatisticReport
> > to support reporting statistics from source connectors.
> >
> > Statistics are one of the most important inputs to the optimizer.
> > Accurate and complete statistics allows the optimizer to be more powerful.
> > Currently, the statistics of Flink SQL come from Catalog only,
> > while many Connectors have the ability to provide statistics, e.g.
> > FileSystem.
> > In production, we find many tables in Catalog do not have any statistics.
> > As a result, the optimizer can't generate better execution plans,
> > especially for Batch jobs.
> >
> > There are two approaches to enhance statistics for the planner,
> > one is to introduce the "ANALYZE TABLE" syntax which will write
> > the analyzed result to the catalog, another is to introduce a new
> > connector interface
> > which allows the connector itself to report statistics directly to the
> > planner.
> > The second one is a supplement to the catalog statistics.
> >
> > Here, we will discuss the second approach. Compared to the first one,
> > the second one is to get statistics in real time, no need to run an
> > analysis job for each table. This could help improve the user
> > experience.
> > (We will also introduce the "ANALYZE TABLE" syntax in other FLIP.)
> >
> > You can find more details in FLIP-231 document[1]. Looking forward to
> > your feedback.
> >
> > [1]
> > https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=211883860&draftShareId=eda17eaa-43f9-4dc1-9a7d-3a9b5a4bae00&;
> > [2] POC: https://github.com/godfreyhe/flink/tree/FLIP-231
> >
> >
> > Best,
> > Godfrey
> >


Re: [DISCUSS] FLIP-222: Support full query lifecycle statements in SQL client

2022-05-17 Thread godfrey he
Hi Paul,

Thanks for driving this, LGTM overall.

I have a few minor comments:

>SHOW QUERIES
I want to clarify the scope of the command: does it show the queries
submitted via the SQL Client, or all queries in the current cluster
(including those submitted via other clients)? Are historical queries
included? And what's the behavior for a per-job cluster?

The result should also contain a 'finish_time' field, which is
friendlier for batch jobs.

>DROP QUERY ''
What's the behavior for batch jobs and non-running jobs?

>SAVEPOINT ''
+1 to align with the SQL standard.
What's the behavior for batch jobs?

SHOW SAVEPOINTS is missing.

* Table API
+1 to introducing these statements in the Table API as well
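
To make the comments above concrete, here is a sketch of how the
proposed statements might be used once both the SQL syntax and the
Table API support exist. The statement forms follow the FLIP draft and
are not implemented yet; the query id is made up:

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class QueryLifecycleExample {
        public static void main(String[] args) {
            TableEnvironment tEnv = TableEnvironment.create(
                    EnvironmentSettings.inStreamingMode());

            // List queries; ideally the result also contains a
            // finish_time column, as suggested above.
            tEnv.executeSql("SHOW QUERIES").print();

            // Trigger a savepoint for a query, then stop the query
            // (the id below is made up for illustration).
            tEnv.executeSql("SAVEPOINT 'cca7bc-bb1e257f0dab'");
            tEnv.executeSql("DROP QUERY 'cca7bc-bb1e257f0dab'");
        }
    }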

Best,
Godfrey

Paul Lam  于2022年5月11日周三 19:20写道:
>
> Hi Jark,
>
> Thanks a lot for your opinions and suggestions! Please see my replies inline.
>
> > 1) the display of savepoint_path
>
>
> Agreed. Adding it to the FLIP.
>
> > 2) Please make a decision on multiple options in the FLIP.
>
> Okay. I’ll keep one and move the other to the rejected alternatives section.
>
> > 4) +1 SHOW QUERIES
> > Btw, the displayed column "address" is a little confusing to me.
> > At the first glance, I'm not sure what address it is, JM RPC address? JM 
> > REST address? Gateway address?
> > If this is a link to the job's web UI URL, how about calling it "web_url" 
> > and display in
> > "http://:" format?
> > Besides, how about displaying "startTime" or "uptime" as well?
>
> I’m good with these changes. Updating the FLIP according to your suggestions.
>
> > 5) STOP/CANCEL QUERY vs DROP QUERY
> > I'm +1 to DROP, because it's more compliant with SQL standard naming, i.e., 
> > "SHOW/CREATE/DROP".
> > Separating STOP and CANCEL confuses users a lot what are the differences 
> > between them.
> > I'm +1 to add the "PURGE" keyword to the DROP QUERY statement, which 
> > indicates to stop query without savepoint.
> > Note that, PURGE doesn't mean stop with --drain flag. The drain flag will 
> > flush all the registered timers
> > and windows which could lead to incorrect results when the job is resumed. 
> > I think the drain flag is rarely used
> > (please correct me if I'm wrong), therefore, I suggest moving this feature 
> > into future work when the needs are clear.
>
> I’m +1 to represent ungrateful cancel by PURGE. I think —drain flag is not 
> used very often as you said, and we
> could just add a table config option to enable that flag.
>
> > 7)  and  should be quoted
> > All the  and  should be string literal, otherwise 
> > it's hard to parse them.
> > For example, STOP QUERY '’.
>
> Good point! Adding it to the FLIP.
>
> > 8) Examples
> > Could you add an example that consists of all the statements to show how to 
> > manage the full lifecycle of queries?
> > Including show queries, create savepoint, remove savepoint, stop query with 
> > a savepoint, and restart query with savepoint.
>
> Agreed. Adding it to the FLIP as well.
>
> Best,
> Paul Lam
>
> > 2022年5月7日 18:22,Jark Wu  写道:
> >
> > Hi Paul,
> >
> > I think this FLIP has already in a good shape. I just left some additional 
> > thoughts:
> >
> > 1) the display of savepoint_path
> > Could the displayed savepoint_path include the scheme part?
> > E.g. `hdfs:///flink-savepoints/savepoint-cca7bc-bb1e257f0dab`
> > IIUC, the scheme part is omitted when it's a local filesystem.
> > But the behavior would be clearer if including the scheme part in the 
> > design doc.
> >
> > 2) Please make a decision on multiple options in the FLIP.
> > It might give the impression that we will support all the options.
> >
> > 3) +1 SAVEPOINT and RELEASE SAVEPOINT
> > Personally, I also prefer "SAVEPOINT " and "RELEASE SAVEPOINT 
> > "
> > to "CREATE/DROP SAVEPOINT", as they have been used in mature databases.
> >
> > 4) +1 SHOW QUERIES
> > Btw, the displayed column "address" is a little confusing to me.
> > At the first glance, I'm not sure what address it is, JM RPC address? JM 
> > REST address? Gateway address?
> > If this is a link to the job's web UI URL, how about calling it "web_url" 
> > and display in
> > "http://:" format?
> > Besides, how about displaying "startTime" or "uptime" as well?
> >
> > 5) STOP/CANCEL QUERY vs DROP QUERY
> > I'm +1 to DROP, because it's more compliant with SQL standard naming, i.e., 
> > "SHOW/CREATE/DROP".
> > Separating STOP and CANCEL confuses users a lot what are the differences 
> > between them.
> > I'm +1 to add the "PURGE" keyword to the DROP QUERY statement, which 
> > indicates to stop query without savepoint.
> > Note that, PURGE doesn't mean stop with --drain flag. The drain flag will 
> > flush all the registered timers
> > and windows which could lead to incorrect results when the job is resumed. 
> > I think the drain flag is rarely used
> > (please correct me if I'm wrong), therefore, I suggest moving this feature 
> > into future work when the needs are clear.
> >
> > 6) Table API
> > I think it makes sense to support the new statements in Table API.
> > We should try to make the Gateway and CLI

Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

2022-05-17 Thread godfrey he
Hi Mang,

Thanks for driving this FLIP.

Please follow the FLIP template[1] style: the `Syntax` part belongs in
the `Public API Changes` section, and 'Program research' and
'Implementation Plan' belong in the `Proposed Changes` section
(or move 'Program research' to an appendix).

> Providing methods that are used to execute CTAS for Table API users.
We should introduce `createTable` on `Table` instead of
`TableEnvironment`, because all table operations are defined on `Table`;
see Table#executeInsert, Table#insertInto, etc.
As for the method name, I prefer `createTableAs`, as sketched below.
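
A minimal sketch of the API shape I have in mind, following the style
of Table#executeInsert. Note that `createTableAs` does not exist yet;
the name and exact signature are open for discussion:

    // Hypothetical Table API usage; `createTableAs` is only a suggestion.
    // Assumes an existing TableEnvironment `tEnv` with a registered
    // `source_table`.
    Table query = tEnv.sqlQuery("SELECT id, name FROM source_table");
    // Would create `target_table` with the query's schema and insert the
    // query result into it, analogous to Table#executeInsert.
    TableResult result = query.createTableAs("my_catalog.my_db.target_table");
    result.await();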

> TableSink needs to provide the CleanUp API, developers implement as needed.
I think it's hard for a TableSink to implement a cleanup operation. For
a file system sink, the data can be written to a temporary directory
first, but for key/value sinks it's hard to remove the written keys
unless the sink records all keys it has written.

> Do not do drop table operations in the framework, drop table is
> implemented in TableSink according to the needs of specific TableSink
The TM process may crash at any time, in which case the drop operation
would never be executed.

How about we perform the drop-table operation and the data cleanup in
the catalog?
Where should the drop operation be executed? One approach is in the
client, the other is in the JM.
1. In the client: this requires the client to stay alive until the job
finishes or fails.
2. In the JM: this requires the JM to provide some interfaces/hooks; the
planner implements the logic and the code is executed in the JM (see the
sketch below).
I prefer the second approach, but it requires a more detailed design
together with the runtime folks @gaoyunhaii, @kevin.yingjie
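
For approach two, such a hook could look roughly like the sketch below.
This is purely hypothetical; the real interface and its method names
would have to be designed together with the runtime team:

    import org.apache.flink.api.common.JobID;

    /** Purely hypothetical sketch of a JM-side hook for approach two. */
    public interface JobStatusHook {

        /** Called when the job starts; e.g. create the target table. */
        void onCreated(JobID jobId);

        /** Called when the job finishes successfully; e.g. commit data. */
        void onFinished(JobID jobId);

        /** Called when the job fails; e.g. drop the table and clean up. */
        void onFailed(JobID jobId, Throwable cause);
    }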


[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP+Template

Best,
Godfrey


Mang Zhang  于2022年5月6日周五 11:24写道:

>
> Hi, Yuxia
> Thanks for your reply!
> About the question 1, we will not support, FLIP-218[1] is to simplify the 
> complexity of user DDL and make it easier for users to use. I have never 
> encountered this case in a big data.
> About the question 2, we will provide a public API like below public void 
> cleanUp();
>
>   Regarding the mechanism of cleanUp, people who are familiar with the 
> runtime module need to provide professional advice, which is what we need to 
> focus on.
>
>
>
>
>
>
>
>
>
>
> --
>
> Best regards,
> Mang Zhang
>
>
>
>
>
> At 2022-04-29 17:00:03, "yuxia"  wrote:
> >Thanks for for driving this work, it's to be a useful feature.
> >About the flip-218, I have some questions.
> >
> >1: Does our CTAS syntax support specify target table's schema including 
> >column name and data type? I think it maybe a useful fature in case we want 
> >to change the data types in target table instead of always copy the source 
> >table's schema. It'll be more flexible with this feature.
> >Btw, MySQL's "CREATE TABLE ... SELECT Statement"[1] support this feature.
> >
> >2: Seems it'll requre sink to implement an public interface to drop table, 
> >so what's the interface will look like?
> >
> >[1] https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html
> >
> >Best regards,
> >Yuxia
> >
> >- 原始邮件 -
> >发件人: "Mang Zhang" 
> >收件人: "dev" 
> >发送时间: 星期四, 2022年 4 月 28日 下午 4:57:24
> >主题: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
> >
> >Hi, everyone
> >
> >
> >I would like to open a discussion for support select clause in CREATE 
> >TABLE(CTAS),
> >With the development of business and the enhancement of flink sql 
> >capabilities, queries become more and more complex.
> >Now the user needs to use the Create Table statement to create the target 
> >table first, and then execute the insert statement.
> >However, the target table may have many columns, which will bring a lot of 
> >work outside the business logic to the user.
> >At the same time, ensure that the schema of the created target table is 
> >consistent with the schema of the query result.
> >Using a CTAS syntax like Hive/Spark can greatly facilitate the user.
> >
> >
> >
> >You can find more details in FLIP-218[1]. Looking forward to your feedback.
> >
> >
> >
> >[1] 
> >https://cwiki.apache.org/confluence/display/FLINK/FLIP-218%3A+Support+SELECT+clause+in+CREATE+TABLE(CTAS)
> >
> >
> >
> >
> >--
> >
> >Best regards,
> >Mang Zhang
>
>


Re: [VOTE] FLIP-229: Introduces Join Hint for Flink SQL Batch Job

2022-05-17 Thread godfrey he
Thanks Xuyang for driving this. +1 (binding)

Best,
Godfrey

Xuyang  于2022年5月17日周二 10:21写道:
>
> Hi, everyone.
> Thanks for your feedback for FLIP-229: Introduces Join Hint for Flink SQL 
> Batch Job[1] on the discussion thread[2].
> I'd like to start a vote for it. The vote will be open for at least 72 hours 
> unless there is an objection or not enough votes.
>
> --
>
> Best!
> Xuyang
>
>
> [1] 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-229%3A+Introduces+Join+Hint+for+Flink+SQL+Batch+Job
> [2] https://lists.apache.org/thread/y668bxyjz66ggtjypfz9t571m0tyvv9h


Re: [VOTE] FLIP-91: Support SQL Gateway

2022-05-22 Thread godfrey he
+1

Best,
Godfrey

LuNing Wang  于2022年5月23日周一 13:06写道:
>
> +1 (non-binding)
>
> Best,
> LuNing Wang
>
> Nicholas Jiang  于2022年5月23日周一 12:57写道:
>
> > +1 (non-binding)
> >
> > Best,
> > Nicholas Jiang
> >
> > On 2022/05/20 02:38:39 Shengkai Fang wrote:
> > > Hi, everyone.
> > >
> > > Thanks for your feedback for FLIP-91: Support SQL Gateway[1] on the
> > > discussion thread[2]. I'd like to start a vote for it. The vote will be
> > > open for at least 72 hours unless there is an objection or not enough
> > votes.
> > >
> > > Best,
> > > Shengkai
> > >
> > > [1]
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Gateway
> > > [2]https://lists.apache.org/thread/gr7soo29z884r1scnz77r2hwr2xmd9b0
> > >
> >


Re: [DISCUSS] FLIP-231: Introduce SupportStatisticReport to support reporting statistics from source connectors

2022-05-23 Thread godfrey he
Hi, Jark

Thanks for the feedback.

> 1) All the ability interfaces begin with "Supports" instead of "Support".
+1

> The "connect" word should be "collect"?
Yes, it's a typo.

> CatalogStatistics
Yes, we should use TableStats.
I forgot that TableStats and ColumnStats have been ported to the API module.

> What's the difference between them?
table.optimizer.source.collect-statistics-enabled is used for all connectors,
while source.statistics-type is only for file-based connectors.
It may take a long time to gather the detailed statistics,
but maybe the file size (which will be introduced later) is enough.
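
For illustration, the two options could be combined as below (a sketch using
the option names from the current FLIP draft; both the names and the enum
values, e.g. 'DETAILED', are still under discussion and may change):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ReportStatisticsExample {
    public static void main(String[] args) {
        TableEnvironment tableEnv =
                TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // global switch: whether the planner asks sources for statistics at all
        tableEnv.getConfig().getConfiguration()
                .setBoolean("table.optimizer.source.collect-statistics-enabled", true);

        // per-table knob for file-based connectors: how expensive the reported
        // statistics may be ('DETAILED' is a placeholder enum value here)
        tableEnv.executeSql(
                "CREATE TABLE orders (id BIGINT, amount DOUBLE) WITH ("
                        + " 'connector' = 'filesystem',"
                        + " 'path' = '/tmp/orders',"
                        + " 'format' = 'parquet',"
                        + " 'source.statistics-type' = 'DETAILED'"
                        + ")");
    }
}
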

> IMO, we should also support Hive source as well in this FLIP.
+1

Best,
Godfrey

Jark Wu  于2022年5月20日周五 12:04写道:
>
> Hi Godfrey,
>
> I just left some comments here:
>
> 1) SupportStatisticReport => SupportsStatisticReport
> All the ability interfaces begin with "Supports" instead of "Support".
>
> 2) table.optimizer.source.connect-statistics-enabled
> The "connect" word should be "collect"?
>
> 3) CatalogStatistics
> I was a little confused when I first saw the name. I thought it reports
> stats for a catalog...
> Why not use "TableStats" which already wraps "ColumnStats" in it and is a
> public API as well?
>
> 4) source.statistics-type
> vs table.optimizer.source.collect-statistics-enabled
> What's the difference between them? It seems that they are both used to
> enable or disable reporting stats.
>
> 5) "Which connectors and formats will be supported by default?"
> IMO, we should also support Hive source as well in this FLIP.
> Hive source is more widely used than Filesystem connector.
>
> Best,
> Jark
>
>
>
>
> On Tue, 17 May 2022 at 10:52, Jingsong Li  wrote:
>
> > Hi Godfrey,
> >
> > Thanks for your reply.
> >
> > Sounds good to me.
> >
> > > I think we should also introduce a config option
> >
> > We can add this option to the FLIP. I prefer an option for the
> > FileSystemConnector, maybe an enum.
> >
> > Best,
> > Jingsong
> >
> > On Tue, May 17, 2022 at 10:31 AM godfrey he  wrote:
> >
> > > Hi Jingsong,
> > >
> > > Thanks for the feedback.
> > >
> > >
> > > >One concern I have is that we read the footer for each file, and this
> > may
> > > >be a bit costly in some cases. Is it possible for us to have some
> > > > hierarchical way
> > > yes, if there are thousands of orc/parquet files, it may take a long
> > time.
> > > So we can introduce a config option to let the user choose the
> > > granularity of the statistics.
> > > But the SIZE will not be introduced, because the planner does not use
> > > the file size statistics now.
> > > We can introduce it once file size statistics are introduced in the future.
> > > I think we should also introduce a config option to enable/disable
> > > SupportStatisticReport,
> > > because it's a heavy operation for some connectors in some cases.
> > >
> > > > is the filter pushdown already happening at
> > > > this time?
> > > That's a good point. Currently, the filter push down is after partition
> > > pruning
> > > to prevent the filter push down rule from consuming the partition
> > > predicates.
> > > The statistics will be set to unknown if filter is pushed down now.
> > > To combine them all, we can create an optimization program after filter
> > > push
> > > down program to collect the statistics. This could avoid collecting
> > > statistics multiple times.
> > >
> > >
> > > Best,
> > > Godfrey
> > >
> > > Jingsong Li  于2022年5月13日周五 22:44写道:
> > > >
> > > > Thank Godfrey for driving.
> > > >
> > > > Looks very good~ This will undoubtedly greatly enhance the various
> > batch
> > > > mode connectors.
> > > >
> > > > I left some comments:
> > > >
> > > > ## FileBasedStatisticsReportableDecodingFormat
> > > >
> > > > One concern I have is that we read the footer for each file, and this
> > may
> > > > be a bit costly in some cases. Is it possible for us to have some
> > > > hierarchical way, e.g.
> > > > - No statistics are collected for files by default.
> > > > - SIZE: Generate statistics based on file Size, get the size of the
> > file
> > > > only with access to the master of the FileSys

Re: [DISCUSS] FLIP-222: Support full query lifecycle statements in SQL client

2022-05-23 Thread godfrey he
tJobs, the
> >> same with Flink CLI. I think it’s okay to have non-SQL jobs listed in SQL
> >> client, because
> >> these jobs can be managed via SQL client too.
> >>
> >> WRT finished time, I think you’re right. Adding it to the FLIP. But I’m a
> >> bit afraid that the
> >> rows would be too long.
> >>
> >> WRT ‘DROP QUERY’,
> >>> What's the behavior for batch jobs and the non-running jobs?
> >>
> >>
> >> In general, the behavior would be aligned with Flink CLI. Triggering a
> >> savepoint for
> >> a non-running job would cause errors, and the error message would be
> >> printed to
> >> the SQL client. Triggering a savepoint for batch(unbounded) jobs in
> >> streaming
> >> execution mode would be the same with streaming jobs. However, for batch
> >> jobs in
> >> batch execution mode, I think there would be an error, because batch
> >> execution
> >> doesn’t support checkpoints currently (please correct me if I’m wrong).
> >>
> >> WRT ’SHOW SAVEPOINTS’, I’ve thought about it, but Flink clusterClient/
> >> jobClient doesn’t have such a functionality at the moment, neither do
> >> Flink CLI.
> >> Maybe we could make it a follow-up FLIP, which includes the modifications
> >> to
> >> clusterClient/jobClient and Flink CLI. WDYT?
> >>
> >> Best,
> >> Paul Lam
> >>
> >>> 2022年5月17日 20:34,godfrey he  写道:
> >>>
> >>> Godfrey
> >>
> >>
>


Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

2022-05-23 Thread godfrey he
Hi Jark,

> "Table#createTableAs(tablePath)" seems a
>little strange to me.

`Table#createTableAs` is a bit misleading; I lean toward Table#saveAs(tablePath).
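
For illustration, usage could look like the following sketch (Table#saveAs
does not exist yet; the exact method name, signature and return type are what
is being discussed here):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

public class CtasTableApiExample {
    public static void main(String[] args) {
        TableEnvironment tableEnv =
                TableEnvironment.create(EnvironmentSettings.inBatchMode());

        Table query = tableEnv.sqlQuery(
                "SELECT id, name, amount FROM source_table WHERE amount > 0");

        // hypothetical: create `target_table` with the query's schema in the
        // current catalog/database and write the query result into it
        query.saveAs("target_table");
    }
}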

Best,
Godfrey

Jark Wu  于2022年5月18日周三 23:09写道:
>
> Hi Godfrey,
>
> Regarding Table API for CTAS, "Table#createTableAs(tablePath)" seems a
> little strange to me.
> Usually, the parameter after AS should be the query, but the query is in
> front of AS.
> I slightly prefer a method on TableEnvironment besides "createTable" (i.e.
> a special createTable with writing data).
>
> For example:
> void createTableAs(String path, TableDescriptor descriptor, Table query);
>
> Usage:
> tableEnv.createTableAs(
> "T1",
> TableDescriptor.forConnector("hive")
> .option("format", "parquet")
> .build(),
> query);
>
>
> Best,
> Jark
>
> On Wed, 18 May 2022 at 22:53, Jark Wu  wrote:
>
> > Hi Mang,
> >
> > Thanks for proposing this, CTAS is a very important API for batch users.
> >
> > I think the key problem of this FLIP is the ACID semantics of the CTAS
> > operation.
> > We care most about two parts of the semantics:
> > 1) Atomicity: the created table should be rolled back if the write is
> > failed.
> > 2) Isolation: the created table shouldn't be visible before the write is
> > successful (read uncommitted).
> >
> > From your investigation, it seems that:
> > - Flink (your FLIP): none of them.   ==> LEVEL-1
> > - Spark DataSource v1: is atomic (can roll back), but is not isolated. ==>
> > LEVEL-2
> > - Spark DataSource v2: guarantees both of them.  ==> LEVEL-3
> > - Hive MR: guarantees both of them. ==> LEVEL-3
> >
> > In order to support higher ACID semantics, I agree with Godfrey that we
> > need some hooks in JM
> > which can be called when the job is finished or failed/canceled. It might
> > look like
> > `StreamExecutionEnvironment#registerJobListener(JobListener)`,
> > but JobListener is called on the
> > client side. What we need is an interface called on the JM side, because
> > the job can be submitted in
> > detached mode.
> >
> > With this interface, we can easily support LEVEL-2 semantics by calling
> > `Catalog#dropTable` in the
> > `JobListener#onJobFailed`. We can also support LEVEL-3 by introducing
> > `StagingTableCatalog` like Spark,
> > calling `StagedTable#commitStagedChanges()` in `JobListener#onJobFinished`
> > and
> > calling StagedTable#abortStagedChanges() in `JobListener#onJobFailed`.
> >
> > Best,
> > Jark
> >
> >
> > On Wed, 18 May 2022 at 12:29, godfrey he  wrote:
> >
> >> Hi Mang,
> >>
> >> Thanks for driving this FLIP.
> >>
> >> Please follow the FLIP template[1] style, and the `Syntax ` is part of
> >> the `Public API Changes` section.
> >> ‘Program research’ and 'Implementation Plan' are part of the `Proposed
> >> Changes` section,
> >> or move ‘Program research’ to the appendix.
> >>
> >> > Providing methods that are used to execute CTAS for Table API users.
> >> We should introduce `createTable` in `Table` instead of
> >> `TableEnvironment`.
> >> Because all table operations are defined in `Table`, see:
> >> Table#executeInsert,
> >> Table#insertInto, etc.
> >> About the method name, I prefer to use `createTableAs`.
> >>
> >> > TableSink needs to provide the CleanUp API, developers implement as
> >> needed.
> >> I think it's hard for TableSink to implement a clean up operation. For
> >> file system sink,
> >> the data can be written to a temporary directory, but for key/value
> >> sinks, it's hard to
> >> remove the written keys, unless the sink records all written keys.
> >>
> >> > Do not do drop table operations in the framework, drop table is
> >> implemented in
> >> TableSink according to the needs of specific TableSink
> >> The TM process may crash at any time, and the drop operation will not
> >> be executed any more.
> >>
> >> How about we do the drop table operation and cleanup data action in the
> >> catalog?
> >> Where to execute the drop operation: one approach is in the client, the
> >> other is in the JM.
> >> 1. in client: this requires the client to stay alive until the job is
> >> finished or failed.
> >> 2. in JM: this requires the JM co

Re: [DISCUSS] FLIP-231: Introduce SupportStatisticReport to support reporting statistics from source connectors

2022-05-27 Thread godfrey he
Hi, everyone.

Thanks for all the inputs.
If there is no more feedback, I think we can start the vote next Monday.

Best,
Godfrey

Martijn Visser  于2022年5月25日周三 19:46写道:
>
> Hi Godfrey,
>
> Thanks for creating the FLIP. I have no comments.
>
> Best regards,
>
> Martijn
>
>
> On Tue, 17 May 2022 at 04:52, Jingsong Li  wrote:
>
> > Hi Godfrey,
> >
> > Thanks for your reply.
> >
> > Sounds good to me.
> >
> > > I think we should also introduce a config option
> >
> > We can add this option to the FLIP. I prefer a option for
> > FileSystemConnector, maybe a enum.
> >
> > Best,
> > Jingsong
> >
> > On Tue, May 17, 2022 at 10:31 AM godfrey he  wrote:
> >
> > > Hi Jingsong,
> > >
> > > Thanks for the feedback.
> > >
> > >
> > > >One concern I have is that we read the footer for each file, and this
> > may
> > > >be a bit costly in some cases. Is it possible for us to have some
> > > > hierarchical way
> > > yes, if there are thousands of orc/parquet files, it may take a long
> > time.
> > > So we can introduce a config option to let the user choose the
> > > granularity of the statistics.
> > > But the SIZE will not be introduced, because the planner does not use
> > > the file size statistics now.
> > > We can introduce once file size statistics is introduce in the future.
> > > I think we should also introduce a config option to enable/disable
> > > SupportStatisticReport,
> > > because it's a heavy operation for some connectors in some cases.
> > >
> > > > is the filter pushdown already happening at
> > > > this time?
> > > That's a good point. Currently, the filter push down is after partition
> > > pruning
> > > to prevent the filter push down rule from consuming the partition
> > > predicates.
> > > The statistics will be set to unknown if filter is pushed down now.
> > > To combine them all, we can create an optimization program after filter
> > > push
> > > down program to collect the statistics. This could avoid collecting
> > > statistics multiple times.
> > >
> > >
> > > Best,
> > > Godfrey
> > >
> > > Jingsong Li  于2022年5月13日周五 22:44写道:
> > > >
> > > > Thank Godfrey for driving.
> > > >
> > > > Looks very good~ This will undoubtedly greatly enhance the various
> > batch
> > > > mode connectors.
> > > >
> > > > I left some comments:
> > > >
> > > > ## FileBasedStatisticsReportableDecodingFormat
> > > >
> > > > One concern I have is that we read the footer for each file, and this
> > may
> > > > be a bit costly in some cases. Is it possible for us to have some
> > > > hierarchical way, e.g.
> > > > - No statistics are collected for files by default.
> > > > - SIZE: Generate statistics based on file Size, get the size of the
> > file
> > > > only with access to the master of the FileSystem.
> > > > - DETAILED: Get the complete statistics by format, possibly by
> > accessing
> > > > the footer of the file.
> > > >
> > > > ## When use the statistics reported by connector
> > > >
> > > > > When partitions are pruned by PushPartitionIntoTableSourceScanRule,
> > the
> > > > statistics should also be updated.
> > > >
> > > > I understand that we definitely need to use reporter after the
> > partition
> > > > prune, but another question: is the filter pushdown already happening
> > at
> > > > this time?
> > > > Can we make sure that in the following three cases, both the filter
> > > > pushdown and the partition prune happen before the stats reporting.
> > > > - only partition prune happens
> > > > - only filter pushdown happens
> > > > - both filter pushdown and partition prune happen
> > > >
> > > > Best,
> > > > Jingsong
> > > >
> > > > On Fri, May 13, 2022 at 6:57 PM godfrey he 
> > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I would like to open a discussion on FLIP-231:  Introduce
> > > > > SupportStatisticReport
> > > > > to support reporting statistics from source connectors.
> > > > >
> > > > > Statistics are on

Re: [DISCUSS] FLIP-231: Introduce SupportStatisticReport to support reporting statistics from source connectors

2022-05-31 Thread godfrey he
Hi Jark and Jing,

+1 to use "report" instead of "collect".

>  // only filter push down (*the description means
> partitionPushDownSpec == null but misses the case of
> partitionPushDownSpec != null*)

`if (partitionPushDownSpec != null && filterPushDownSpec == null)`
this branch covers only the case where a partition is pushed down
but no filter is pushed down. The planner will collect the statistics
from the catalog first, and then try to collect the statistics from the
connectors if the catalog statistics are unknown.

`else if (filterPushDownSpec != null)` this branch means that, no matter
whether partitionPushDownSpec is null or not, the planner will collect
statistics from the connectors only, because the catalog does not support
getting statistics with filters.

`else if (collectStatEnabled
&& (table.getStatistic().getTableStats() == TableStats.UNKNOWN)
&& tableSource instanceof SupportStatisticReport)`
this branch means no partition and no filter are pushed down.

or we can change the pseudocode to:

if (filterPushDownSpec != null) {
    // filter push down, no matter whether a partition was pushed down or not
} else {
    if (partitionPushDownSpec != null) {
        // partition push down, while filter not
    } else {
        // no partition and no filter push down
    }
}

Best,
Godfrey

Jing Ge  于2022年5月29日周日 08:17写道:
>
> Hi Godfrey,
>
> Thanks for driving this FLIP.  It looks really good! Looking forward to it!
>
> If I am not mistaken, partition pruning could also happen in the following
> pseudocode condition block:
>
> else if (filterPushDownSpec != null) {
> // only filter push down (*the description means
> partitionPushDownSpec == null but misses the case of
> partitionPushDownSpec != null*)
>
> // the catalog do not support get statistics with filters,
> // so only call reportStatistics method if needed
> if (collectStatEnabled && tableSource instanceof
> SupportStatisticReport) {
> newTableStat = ((SupportStatisticReport)
> tableSource).reportStatistics();
> }
>
>
> Best regards,
>
> Jing
>
>
> On Sat, May 28, 2022 at 5:09 PM Jark Wu  wrote:
>
> > Hi Godfrey,
> >
> > It seems that the "SupportStatisticReport" interface name and
> >  "table.optimizer.source.connect-statistics-enabled" option name is not
> > updated in the FLIP.
> >
> > Besides, in the terms of the option name, the meaning of
> > "source.statistics-type"
> > is not very straightforward and clean to me. Maybe
> > "source.report-statistics" = "none/all/file-size"
> > would be better.
> >
> > We can also change "table.optimizer.source.connect-statistics-enabled" to
> > "table.optimizer.source.report-statistics-enabled" for alignment and
> >  it's clear that one for fine-grained and one for coarse-grained.
> >
> >
> > Best,
> > Jark
> >
> > On Fri, 27 May 2022 at 22:58, godfrey he  wrote:
> >
> > > Hi, everyone.
> > >
> > > Thanks for all the inputs.
> > > If there is no more feedback, I think we can start the vote next monday.
> > >
> > > Best,
> > > Godfrey
> > >
> > > Martijn Visser  于2022年5月25日周三 19:46写道:
> > > >
> > > > Hi Godfrey,
> > > >
> > > > Thanks for creating the FLIP. I have no comments.
> > > >
> > > > Best regards,
> > > >
> > > > Martijn
> > > >
> > > >
> > > > On Tue, 17 May 2022 at 04:52, Jingsong Li 
> > > wrote:
> > > >
> > > > > Hi Godfrey,
> > > > >
> > > > > Thanks for your reply.
> > > > >
> > > > > Sounds good to me.
> > > > >
> > > > > > I think we should also introduce a config option
> > > > >
> > > > > We can add this option to the FLIP. I prefer a option for
> > > > > FileSystemConnector, maybe a enum.
> > > > >
> > > > > Best,
> > > > > Jingsong
> > > > >
> > > > > On Tue, May 17, 2022 at 10:31 AM godfrey he 
> > > wrote:
> > > > >
> > > > > > Hi Jingsong,
> > > > > >
> > > > > > Thanks for the feedback.
> > > > > >
> > > > > >
> > > > > > >One concern I have is that we read the footer for each file, and
> > > this
> > > > > may
> > > > > > >b

Re: [DISCUSS] FLIP-231: Introduce SupportStatisticReport to support reporting statistics from source connectors

2022-05-31 Thread godfrey he
Hi, Jing.

Thanks for the suggestion, I have updated the doc and
will continue to optimize the code in subsequent PRs.

Best,
Godfrey

Jing Ge  于2022年6月1日周三 04:41写道:
>
> Hi Godfrey,
>
> Thanks for clarifying it. I personally prefer the new change you suggested.
>
> Would you please help to understand one more thing? The else if
> (filterPushDownSpec != null) branch is the only branch that doesn't have to
> check if the newTableStat has been calculated previously. The reason might
> be, after the filter has been pushed down to the table source,
> ((SupportStatisticReport) tableSource).reportStatistics() will return a new
> TableStats, which turns out the newTableStat has to be re-computed for each
> filter push down. In this case, it might be good to add it into the FLIP
> description. Otherwise, We could also add optimization for this branch to
> avoid re-computing the table statistics.
>
> NIT: There are many conditions in the nested if-else statements. In order
> to improve the readability (and maintainability in the future), we could
> consider moving up some common checks like collectStatEnabled or
> tableSource instanceof SupportStatisticReport, e.g.:
>
>  private LogicalTableScan collectStatistics(LogicalTableScan scan) {
>   ..
>  FlinkStatistic newStatistic = FlinkStatistic.builder()
> .statistic(table.getStatistic()) .tableStats(refreshTableStat(...))
> .build();
>  return new LogicalTableScan( scan.getCluster(), scan.getTraitSet(),
> scan.getHints(), table.copy(newStatistic));
> }
>
> private TableStats refreshTableStat(boolean collectStatEnabled,
> TableSourceTable table , PartitionPushDownSpec partitionPushDownSpec,
> FilterPushDownSpec filterPushDownSpec) {
>
>  if (!collectStatEnabled)   return null;
>
>  if (!(table.tableSource() instanceof SupportStatisticReport)) return
> null;
>
>  SupportStatisticReport tableSource =
> (SupportStatisticReport)table.tableSource();
>
>   if (filterPushDownSpec != null) {
>  // filter push down, no matter  partition push down or not
>  return  tableSource.reportStatistics();
>  } else {
>  if (partitionPushDownSpec != null) {
> // partition push down, while filter not
>  } else {
>  // no partition and filter push down
>  return table.getStatistic().getTableStats() ==
> TableStats.UNKNOWN ? tableSource.reportStatistics() :
> table.getStatistic().getTableStats();
> }
>  }
> }
>
> This just improved a little bit without introducing some kind of
> Action/Performer interface with many subclasses and factory class to get
> rid of some if-else statements, which could optionally be the next step
> improvement in the future.
>
> Best regards,
> Jing
>
> On Tue, May 31, 2022 at 3:42 PM godfrey he  wrote:
>
> > Hi Jark and Jing,
> >
> > +1 to use "report" instead of "collect".
> >
> > >  // only filter push down (*the description means
> > > partitionPushDownSpec == null but misses the case of
> > > partitionPushDownSpec != null*)
> >
> > `if (partitionPushDownSpec != null && filterPushDownSpec == null)`
> > this branch is only consider that the partition is partition is pushed
> > down,
> > but no filter is push down. The planner will collect the statistics
> > from catalog first,
> > and then try to collect the statistics from collectors if the catalog
> > statistics is unknown.
> >
> > `else if (filterPushDownSpec != null)` this branch means  whether
> > the partitionPushDownSpec is null or not, the planner will collect
> > statistics from
> > collectors only, because the catalog do not support get statistics with
> > filters.
> >
> > `else if (collectStatEnabled
> > && (table.getStatistic().getTableStats() ==
> > TableStats.UNKNOWN)
> > && tableSource instanceof SupportStatisticReport)`
> > this branch means no partition and no filter are pushed down.
> >
> > or we can change the pseudocode to:
> >  if (filterPushDownSpec != null) {
> >// filter push down, no mater  partition push down or not
> > } else {
> > if (partitionPushDownSpec != null) {
> > // partition push down, while filter not
> >  } else {
> >  // no partition and filter push down
> >  }
> > }
> >
> > Best,
> > Godfrey
> >
> > Jing Ge  于2022年5月29日周日 08:17写道:
> > >
> > > Hi Godfrey,
> > >
> > > Thanks for driving this FLIP.  It looks really good! Looking forward to
> > it!

Re: [DISCUSS] FLIP-223: Support HiveServer2 Endpoint

2022-06-05 Thread godfrey he
Hi Shengkai,

Thanks for driving this.

I have a few comments:

Could you give an overall architecture of the ecosystem around HiveServer2
and the SqlGateway, such as the JDBC driver, Beeline, etc.?
That would make it clearer for users.

> Considering the different users may have different requirements to connect to 
> different meta stores,
> they can use the DDL to register the HiveCatalog that satisfies their 
> requirements.
 Could you give some examples to explain it more?

> How To Use
Could you give a complete example describing an end-to-end case?

Is streaming SQL supported? What's the behavior if I submit a streaming query
or change the dialect to 'default'?

Best,
Godfrey

Shengkai Fang  于2022年6月1日周三 21:13写道:
>
> Hi, Jingsong.
>
> Thanks for your feedback.
>
> > I've read the FLIP and it's not quite clear what the specific unsupported
> items are
>
> Yes. I have added a section named Difference with HiveServer2 and listed the
> differences between the SQL Gateway with HiveServer2 endpoint and
> HiveServer2.
>
> > Support multiple metastore clients in one gateway?
>
> Yes. It may cause class conflicts when using the different versions of Hive
> Catalog at the same time. I added a section named "How to use" to remind
> users not to use HiveCatalog with different versions together.
>
> >  Hive versions and setup
>
> Considering the HiveServer2 endpoint binds to the HiveCatalog, we will not
> introduce a new module about the HiveServer2 endpoint. The current
> dependencies in the hive connector should be enough for the HiveServer2
> Endpoint except for the hive-service-RPC(it contains the HiveServer2
> interface). In this way, the hive connector jar will contain an endpoint. I
> add a section named "Merge HiveServer2 Endpoint into Hive Connector
> Module".
>
> For usage, the user can just add the hive connector jar into the classpath
> and use the sql-gateway.sh to start the SQL Gateway with the hiveserver2
> endpoint.  You can refer to the section "How to use" for more details.
>
> Best,
> Shengkai
>
> Jingsong Li  于2022年6月1日周三 15:04写道:
>
> > Hi Shengkai,
> >
> > Thanks for driving.
> >
> > I have a few comments:
> >
> > ## Unsupported features
> >
> > I've read the FLIP and it's not quite clear what the specific unsupported
> > items are?
> > - For example, security related, is it not supported.
> > - For example, is there a loss of precision for types
> > - For example, the FetchResults are not the same
> >
> > ## Support multiple metastore clients in one gateway?
> >
> > > During the setup, the HiveServer2 tires to load the config in the
> > hive-site.xml to initialize the Hive metastore client. In the Flink, we use
> > the Catalog interface to connect to the Hive Metastore, which is allowed to
> > communicate with different Hive Metastore[1]. Therefore, we allows the user
> > to specify the path of the hive-site.xml as the endpoint parameters, which
> > will used to create the default HiveCatalog in the Flink. Considering the
> > different users may have different requirements to connect to different
> > meta stores, they can use the DDL to register the HiveCatalog that
> > satisfies their requirements.
> >
> > I understand it is difficult. You really want to support?
> >
> > ## Hive versions and setup
> >
> > I saw jark also commented, but FLIP does not seem to have been modified,
> > how should the user setup, which jar to add, which hive metastore version
> > to support? How to setup to support?
> >
> > Best,
> > Jingsong
> >
> > On Tue, May 24, 2022 at 11:57 AM Shengkai Fang  wrote:
> >
> > > Hi, all.
> > >
> > > Considering we start to vote for FLIP-91 for a while, I think we can
> > > restart the discussion about the FLIP-223.
> > >
> > > I am glad that you can give some feedback about FLIP-223.
> > >
> > > Best,
> > > Shengkai
> > >
> > >
> > > Martijn Visser  于2022年5月6日周五 19:10写道:
> > >
> > > > Hi Shengkai,
> > > >
> > > > Thanks for clarifying.
> > > >
> > > > Best regards,
> > > >
> > > > Martijn
> > > >
> > > > On Fri, 6 May 2022 at 08:40, Shengkai Fang  wrote:
> > > >
> > > > > Hi Martijn.
> > > > >
> > > > > > So this implementation would not rely in any way on Hive, only on
> > > > Thrift?
> > > > >
> > > > > Yes.  The dependency is light. We also can just copy the iface file
> > > from
> > > > > the Hive repo and maintain by ourselves.
> > > > >
> > > > > Best,
> > > > > Shengkai
> > > > >
> > > > > Martijn Visser  于2022年5月4日周三 21:44写道:
> > > > >
> > > > > > Hi Shengkai,
> > > > > >
> > > > > > > Actually we will only rely on the API in the Hive, which only
> > > > contains
> > > > > > the thrift file and the generated code
> > > > > >
> > > > > > So this implementation would not rely in any way on Hive, only on
> > > > Thrift?
> > > > > >
> > > > > > Best regards,
> > > > > >
> > > > > > Martijn Visser
> > > > > > https://twitter.com/MartijnVisser82
> > > > > > https://github.com/MartijnVisser
> > > > > >
> > > > > >
> > > > > > On Fri, 29 Apr 2022 at 05:16, Shengkai Fang 
> > > wrote:
> > > > > 

Re: [DISCUSS] FLIP-231: Introduce SupportStatisticReport to support reporting statistics from source connectors

2022-06-06 Thread godfrey he
Hi, everyone.

Thanks for all the inputs.
Since there is no more feedback, I will start the vote tomorrow.

Best,
Godfrey

godfrey he  于2022年6月1日周三 13:30写道:
>
> Hi, Jing.
>
> Thanks for the suggestion, I have updated the doc and
> will continue to optimize the code in subsequent PR.
>
> Best,
> Godfrey
>
> Jing Ge  于2022年6月1日周三 04:41写道:
> >
> > Hi Godfrey,
> >
> > Thanks for clarifying it. I personally prefer the new change you suggested.
> >
> > Would you please help to understand one more thing? The else if
> > (filterPushDownSpec != null) branch is the only branch that doesn't have to
> > check if the newTableStat has been calculated previously. The reason might
> > be, after the filter has been pushed down to the table source,
> > ((SupportStatisticReport) tableSource).reportStatistics() will return a new
> > TableStats, which turns out the newTableStat has to be re-computed for each
> > filter push down. In this case, it might be good to add it into the FLIP
> > description. Otherwise, We could also add optimization for this branch to
> > avoid re-computing the table statistics.
> >
> > NIT: There are many conditions in the nested if-else statements. In order
> > to improve the readability (and maintainability in the future), we could
> > consider moving up some common checks like collectStatEnabled or
> > tableSource instanceof SupportStatisticReport, e.g.:
> >
> >  private LogicalTableScan collectStatistics(LogicalTableScan scan) {
> >   ..
> >  FlinkStatistic newStatistic = FlinkStatistic.builder()
> > .statistic(table.getStatistic()) .tableStats(refreshTableStat(...))
> > .build();
> >  return new LogicalTableScan( scan.getCluster(), scan.getTraitSet(),
> > scan.getHints(), table.copy(newStatistic));
> > }
> >
> > private TableStats refreshTableStat(boolean collectStatEnabled,
> > TableSourceTable table , PartitionPushDownSpec partitionPushDownSpec,
> > FilterPushDownSpec filterPushDownSpec) {
> >
> >  if (!collectStatEnabled)   return null;
> >
> >  if (!(table.tableSource() instanceof SupportStatisticReport)) return
> > null;
> >
> >  SupportStatisticReport tableSource =
> > (SupportStatisticReport)table.tableSource();
> >
> >   if (filterPushDownSpec != null) {
> >  // filter push down, no matter  partition push down or not
> >  return  tableSource.reportStatistics();
> >  } else {
> >  if (partitionPushDownSpec != null) {
> > // partition push down, while filter not
> >  } else {
> >  // no partition and filter push down
> >  return table.getStatistic().getTableStats() ==
> > TableStats.UNKNOWN ? tableSource.reportStatistics() :
> > table.getStatistic().getTableStats();
> >     }
> >  }
> > }
> >
> > This just improved a little bit without introducing some kind of
> > Action/Performer interface with many subclasses and factory class to get
> > rid of some if-else statements, which could optionally be the next step
> > provement in the future.
> >
> > Best regards,
> > Jing
> >
> > On Tue, May 31, 2022 at 3:42 PM godfrey he  wrote:
> >
> > > Hi Jark and Jing,
> > >
> > > +1 to use "report" instead of "collect".
> > >
> > > >  // only filter push down (*the description means
> > > > partitionPushDownSpec == null but misses the case of
> > > > partitionPushDownSpec != null*)
> > >
> > > `if (partitionPushDownSpec != null && filterPushDownSpec == null)`
> > > this branch is only consider that the partition is partition is pushed
> > > down,
> > > but no filter is push down. The planner will collect the statistics
> > > from catalog first,
> > > and then try to collect the statistics from collectors if the catalog
> > > statistics is unknown.
> > >
> > > `else if (filterPushDownSpec != null)` this branch means  whether
> > > the partitionPushDownSpec is null or not, the planner will collect
> > > statistics from
> > > collectors only, because the catalog do not support get statistics with
> > > filters.
> > >
> > > `else if (collectStatEnabled
> > > && (table.getStatistic().getTableStats() ==
> > > TableStats.UNKNOWN)
> > > && tableSource instanceof SupportStatisticReport)`
> > > this branch means no partition and no filter are pushed dow

Re: [DISCUSS] FLIP-223: Support HiveServer2 Endpoint

2022-06-06 Thread godfrey he
Hi, Shengkai.

Thanks for the update, LGTM now.

Best,
Godfrey


Shengkai Fang  于2022年6月6日周一 16:47写道:
>
> Hi. Godfrey.
>
> Nice to hear the comments from you.
>
> > Could you give a whole architecture about the Ecosystem of HiveServers
> > and the SqlGateway, such as JDBC driver, Beeline, etc.
> > Which is more clear for users.
>
> Yes. I have updated the FLIP and added the architecture of the Gateway with
> the HiveServer2 endpoint.
>
> > How To Use
> >> Could you give a complete example to describe an end-to-end case?
>
> Yes. I have updated the FLIP. The beeline users can just use the connect
> command to connect to the SQLGateway with the HiveServer2 endpoint.
> For example, users just input "!connect
> jdbc:hive2://<host>:<port>/<db>;auth=noSasl
> hiveuser pass" into the terminal to connect to the SQLGateway.
>
> > Is the streaming SQL supported? What's the behavior if I submit a
> streaming query or I change the dialect to 'default'?
> Yes. We don't limit the usage here. Users can switch to the streaming mode
> or use the default dialect.  But we don't suggest users use the hive
> dialect in the streaming mode. As far as I know, it has some problems that
> are not fixed yet, e.g. you may get errors for SQL that works in the batch
> mode. I added a section to mention this.
>
> > Considering the different users may have different requirements to
> connect to different meta stores,
> > they can use the DDL to register the HiveCatalog that satisfies their
> requirements.
> >> Could you give some examples to explain it more?
>
> Hive supports setting multiple metastore addresses via the config option
> "hive.metastore.urls". Here I just mean users can switch to connect to
> different metastore instances using the CREATE CATALOG DDL. I updated the
> FLIP to make it more clear.
>
> Best,
> Shengkai
>
> godfrey he  于2022年6月6日周一 13:45写道:
>
> > Hi Shengkai,
> >
> > Thanks for driving this.
> >
> > I have a few comments:
> >
> > Could you give a whole architecture about the Ecosystem of HiveServers
> > and the SqlGateway, such as JDBC driver, Beeline, etc.
> > Which is more clear for users.
> >
> > > Considering the different users may have different requirements to
> > connect to different meta stores,
> > > they can use the DDL to register the HiveCatalog that satisfies their
> > requirements.
> >  Could you give some examples to explain it more?
> >
> > > How To Use
> > Could you a complete example to describe an end-to-end case?
> >
> > Is the streaming sql supported? What's the behavior if I submit streaming
> > query
> > or I change the dialect to 'default'?
> >
> > Best,
> > Godfrey
> >
> > Shengkai Fang  于2022年6月1日周三 21:13写道:
> > >
> > > Hi, Jingsong.
> > >
> > > Thanks for your feedback.
> > >
> > > > I've read the FLIP and it's not quite clear what the specific
> > unsupported
> > > items are
> > >
> > > Yes. I have added a section named Difference with HiveServer2 and list
> > the
> > > difference between the SQL Gateway with HiveServer2 endpoint and
> > > HiveServer2.
> > >
> > > > Support multiple metastore clients in one gateway?
> > >
> > > Yes. It may cause class conflicts when using the different versions of
> > Hive
> > > Catalog at the same time. I add a section named "How to use" to remind
> > the
> > > users don't use HiveCatalog with different versions together.
> > >
> > > >  Hive versions and setup
> > >
> > > Considering the HiveServer2 endpoint binds to the HiveCatalog, we will
> > not
> > > introduce a new module about the HiveServer2 endpoint. The current
> > > dependencies in the hive connector should be enough for the HiveServer2
> > > Endpoint except for the hive-service-RPC(it contains the HiveServer2
> > > interface). In this way, the hive connector jar will contain an
> > endpoint. I
> > > add a section named "Merge HiveServer2 Endpoint into Hive Connector
> > > Module".
> > >
> > > For usage, the user can just add the hive connector jar into the
> > classpath
> > > and use the sql-gateway.sh to start the SQL Gateway with the hiveserver2
> > > endpoint.  You can refer to the section "How to use" for more details.
> > >
> > > Best,
> > > Shengkai
> > >
> > > Jingsong Li  于2022年6月1日周三 15:04

[VOTE] FLIP-231: Introduce SupportStatisticReport to support reporting statistics from source connectors

2022-06-06 Thread godfrey he
Hi everyone,

Thanks for all the feedback so far. Based on the discussion[1] we seem
to have consensus, so I would like to start a vote on FLIP-231 for
which the FLIP has now also been updated[2].

The vote will last for at least 72 hours (Jun 10th 12:00 GMT) unless
there is an objection or insufficient votes.

[1] https://lists.apache.org/thread/88kxk7lh8bq2s2c2qrf06f3pnf9fkxj2
[2] 
https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=211883860&draftShareId=eda17eaa-43f9-4dc1-9a7d-3a9b5a4bae00&;

Best,
Godfrey


Re: [VOTE] FLIP-234: Support Retryable Lookup Join To Solve Delayed Updates Issue In External Systems

2022-06-08 Thread godfrey he
+1

Best,
Godfrey

Jingsong Li  于2022年6月9日周四 10:26写道:
>
> +1 (binding)
>
> Best,
> Jingsong
>
> On Tue, Jun 7, 2022 at 5:21 PM Jark Wu  wrote:
> >
> > +1 (binding)
> >
> > Best,
> > Jark
> >
> > On Tue, 7 Jun 2022 at 12:17, Lincoln Lee  wrote:
> >
> > > Dear Flink developers,
> > >
> > > Thanks for all your feedback for FLIP-234: Support Retryable Lookup Join 
> > > To
> > > Solve Delayed Updates Issue In External Systems[1] on the discussion
> > > thread[2].
> > >
> > > I'd like to start a vote for it. The vote will be open for at least 72
> > > hours unless there is an objection or not enough votes.
> > >
> > > [1]
> > >
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > > [2] https://lists.apache.org/thread/9k1sl2519kh2n3yttwqc00p07xdfns3h
> > >
> > > Best,
> > > Lincoln Lee
> > >


Re: [VOTE] FLIP-223: Support HiveServer2 Endpoint

2022-06-08 Thread godfrey he
+1

Best,
Godfrey

Jark Wu  于2022年6月7日周二 17:21写道:
>
> +1 (binding)
>
> Best,
> Jark
>
> On Tue, 7 Jun 2022 at 13:32, Shengkai Fang  wrote:
>
> > Hi, everyone.
> >
> > Thanks for all feedback for FLIP-223: Support HiveServer2 Endpoint[1] on
> > the discussion thread[2]. I'd like to start a vote for it. The vote will be
> > open for at least 72 hours unless there is an objection or not enough
> > votes.
> >
> > Best,
> > Shengkai
> >
> >
> > [1]
> >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-223%3A+Support+HiveServer2+Endpoint
> > [2] https://lists.apache.org/thread/9r1j7ho2m8zbqy3tl7vvj9gnocggwr6x
> >


Re: [DISCUSS] FLIP-222: Support full query lifecycle statements in SQL client

2022-06-08 Thread godfrey he
t;>>> Speaking of ETL DAG, we might want to see the lineage. Is it possible to 
>>>> support syntax like:
>>>>
>>>> SHOW JOBTREE <job_id>  // shows the downstream DAG from the given job_id
>>>> SHOW JOBTREE <job_id> FULL // shows the whole DAG that contains the given 
>>>> job_id
>>>> SHOW JOBTREES // shows all DAGs
>>>> SHOW ANCESTORS <job_id>  // shows all parents of the given job_id
>>>>
>>>> 3)
>>>> Could we also support Savepoint housekeeping syntax? We ran into this 
>>>> issue that a lot of savepoints have been created by customers (via their 
>>>> apps). It will take extra (hacking) effort to clean it.
>>>>
>>>> RELEASE SAVEPOINT ALL
>>>>
>>>> Best regards,
>>>> Jing
>>>>
>>>> On Tue, Jun 7, 2022 at 2:35 PM Martijn Visser  
>>>> wrote:
>>>>>
>>>>> Hi Paul,
>>>>>
>>>>> I'm still doubting the keyword for the SQL applications. SHOW QUERIES 
>>>>> could
>>>>> imply that this will actually show the query, but we're returning IDs of
>>>>> the running application. At first I was also not very much in favour of
>>>>> SHOW JOBS since I prefer calling it 'Flink applications' and not 'Flink
>>>>> jobs', but the glossary [1] made me reconsider. I would +1 SHOW/STOP JOBS
>>>>>
>>>>> Also +1 for the CREATE/SHOW/DROP SAVEPOINT syntax.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Martijn
>>>>>
>>>>> [1]
>>>>> https://nightlies.apache.org/flink/flink-docs-master/docs/concepts/glossary
>>>>>
>>>>> Op za 4 jun. 2022 om 10:38 schreef Paul Lam :
>>>>>
>>>>> > Hi Godfrey,
>>>>> >
>>>>> > Sorry for the late reply, I was on vacation.
>>>>> >
>>>>> > It looks like we have a variety of preferences on the syntax, how about 
>>>>> > we
>>>>> > choose the most acceptable one?
>>>>> >
>>>>> > WRT keyword for SQL jobs, we use JOBS, thus the statements related to 
>>>>> > jobs
>>>>> > would be:
>>>>> >
>>>>> > - SHOW JOBS
>>>>> > - STOP JOBS <job_id> (with options `table.job.stop-with-savepoint` and
>>>>> > `table.job.stop-with-drain`)
>>>>> >
>>>>> > WRT savepoint for SQL jobs, we use the `CREATE/DROP` pattern with `FOR
>>>>> > JOB`:
>>>>> >
>>>>> > - CREATE SAVEPOINT <savepoint_path> FOR JOB <job_id>
>>>>> > - SHOW SAVEPOINTS FOR JOB <job_id> (show savepoints the current job
>>>>> > manager remembers)
>>>>> > - DROP SAVEPOINT <savepoint_path>
>>>>> >
>>>>> > cc @Jark @ShengKai @Martijn @Timo .
>>>>> >
>>>>> > Best,
>>>>> > Paul Lam
>>>>> >
>>>>> >
>>>>> > godfrey he  于2022年5月23日周一 21:34写道:
>>>>> >
>>>>> >> Hi Paul,
>>>>> >>
>>>>> >> Thanks for the update.
>>>>> >>
>>>>> >> >'SHOW QUERIES' lists all jobs in the cluster, no limit on APIs
>>>>> >> (DataStream or SQL) or
>>>>> >> clients (SQL client or CLI).
>>>>> >>
>>>>> >> Is DataStream job a QUERY? I think not.
>>>>> >> For a QUERY, the most important concept is the statement. But the
>>>>> >> result does not contain this info.
>>>>> >> If we need to contain all jobs in the cluster, I think the name should
>>>>> >> be JOB or PIPELINE.
>>>>> >> I lean toward SHOW PIPELINES and STOP PIPELINE [IF RUNNING] id.
>>>>> >>
>>>>> >> > SHOW SAVEPOINTS
>>>>> >> To list the savepoint for a specific job, we need to specify a
>>>>> >> specific pipeline,
>>>>> >> the syntax should be SHOW SAVEPOINTS FOR PIPELINE id
>>>>> >>
>>>>> >> Best,
>>>>> >> Godfrey
>>>>> >>
>>>>> >> Paul Lam  于2022年5月20日周五 11:25写道:
>>>>> >> >
>>>>> >> > Hi Jark,
>>>>> >> >

[RESULT][VOTE] FLIP-231: Introduce SupportsStatisticReport to support reporting statistics from source connectors

2022-06-09 Thread godfrey he
Hi, everyone.

FLIP-231: Introduce SupportsStatisticReport to support reporting
statistics from source connectors[1] has been accepted.

There are 5 binding votes and 1 non-binding vote[2].
- Jing Ge(non-binding)
- Jark Wu(binding)
- Jingsong Li(binding)
- Martijn Visser(binding)
- Jing Zhang(binding)
- Leonard Xu(binding)

None against.

Thanks again to everyone who commented on this FLIP.


[1]
https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=211883860&draftShareId=eda17eaa-43f9-4dc1-9a7d-3a9b5a4bae00&;
[2] https://lists.apache.org/thread/j1mqblpbp60hgwg2fnhp44cktfp76zd2


Best,
Godfrey


[DISCUSS] FLIP-240: Introduce "ANALYZE TABLE" Syntax

2022-06-10 Thread godfrey he
Hi all,

I would like to open a discussion on FLIP-240:  Introduce "ANALYZE
TABLE" Syntax.

As FLIP-231 mentioned, statistics are one of the most important inputs
to the optimizer. Accurate and complete statistics allow the
optimizer to be more powerful. The "ANALYZE TABLE" syntax is a common
and effective approach to gathering statistics, and it has already been
introduced by many compute engines and databases.

The main purpose of this discussion is to introduce the "ANALYZE TABLE" syntax
for Flink SQL.
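
To give a flavor of the proposal, the statements could look like the sketch
below (based on the FLIP draft; the exact grammar is what this discussion
should settle):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class AnalyzeTableExample {
    public static void main(String[] args) {
        TableEnvironment tableEnv =
                TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // table-level statistics only (e.g. row count) for the whole table
        tableEnv.executeSql("ANALYZE TABLE orders COMPUTE STATISTICS");

        // restrict the scan to given partitions and also gather column-level
        // statistics (ndv, null count, max/min, ...) for selected columns
        tableEnv.executeSql(
                "ANALYZE TABLE orders PARTITION (ds='2022-06-01') "
                        + "COMPUTE STATISTICS FOR COLUMNS amount, user_id");

        // column-level statistics for all columns
        tableEnv.executeSql(
                "ANALYZE TABLE orders COMPUTE STATISTICS FOR ALL COLUMNS");
    }
}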

You can find more details in FLIP-240 document[1]. Looking forward to
your feedback.

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217386481
[2] POC: https://github.com/godfreyhe/flink/tree/FLIP-240


Best,
Godfrey


Re: [DISCUSS] FLIP-240: Introduce "ANALYZE TABLE" Syntax

2022-06-12 Thread godfrey he
Hi cao,

Thanks for the feedback.
AFAIK, unlike databases, many big data compute engines do not collect
statistics automatically when data is written.
FLIP-231[1] has introduced the SupportsStatisticReport interface, through
which the planner collects statistics from the connector when the statistics
from the catalog are unknown.
But the statistics from the connector usually contain only partial
information; typically, the number of distinct values is not included.
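
For reference, the reporting interface itself is tiny. A sketch along the
lines of the FLIP-231 design (details may still shift before release):

import org.apache.flink.table.plan.stats.TableStats;

// Ability interface for sources that can estimate and report statistics
// (row count plus per-column stats) for the data they would read.
public interface SupportsStatisticReport {
    // return TableStats.UNKNOWN if an estimate is unavailable or too costly
    TableStats reportStatistics();
}
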
`ANALYZE TABLE` provides a way of updating complete statistical
information manually.
This is also provided by many big data compute engines and databases.

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-231%3A+Introduce+SupportsStatisticReport+to+support+reporting+statistics+from+source+connectors

Best,
Godfrey

cao zou  于2022年6月10日周五 16:49写道:
>
> Hi godfrey, thanks for driving this meaningful topic.
> I think statistics are essential and meaningful for the optimizer; I'm just
> wondering in which situations this is needed. From the user's side, the
> optimizer is run by the framework, and users may not want to think too much
> about it. Could you share more situations where 'ANALYZE TABLE' would be
> used from the user's side?
>
> nit: There may be a mistake in Examples#partition table
> the partition info should be
>
> Partition1: (ds='2022-06-01', hr=1)
>
> Partition2: (ds='2022-06-01', hr=2)
>
> Partition3: (ds='2022-06-02', hr=1)
>
> Partition4: (ds='2022-06-02', hr=2)
>
> best
>  zoucao
>
>
> godfrey he  于2022年6月10日周五 15:54写道:
>
> > Hi all,
> >
> > I would like to open a discussion on FLIP-240:  Introduce "ANALYZE
> > TABLE" Syntax.
> >
> > As FLIP-231 mentioned, statistics are one of the most important inputs
> > to the optimizer. Accurate and complete statistics allows the
> > optimizer to be more powerful. "ANALYZE TABLE" syntax is a very common
> > but effective approach to gather statistics, which is already
> > introduced by many compute engines and databases.
> >
> > The main purpose of  discussion is to introduce "ANALYZE TABLE" syntax
> > for Flink sql.
> >
> > You can find more details in FLIP-240 document[1]. Looking forward to
> > your feedback.
> >
> > [1]
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217386481
> > [2] POC: https://github.com/godfreyhe/flink/tree/FLIP-240
> >
> >
> > Best,
> > Godfrey
> >


Re: [DISCUSS] FLIP-240: Introduce "ANALYZE TABLE" Syntax

2022-06-12 Thread godfrey he
Hi Ingo,

Thanks for the inputs.

I think converting `ANALYZE TABLE` to a `SELECT` statement is the
more generic approach. Because query plan optimization is generic,
we can provide optimization rules that optimize not only the `SELECT` statements
converted from `ANALYZE TABLE` but also the `SELECT` statements written by users.
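
For intuition, the rewrite could look roughly like the sketch below (the exact
aggregate functions used for column statistics are an implementation detail;
`tableEnv` is assumed to be a batch TableEnvironment):

import org.apache.flink.table.api.Table;

// "ANALYZE TABLE orders COMPUTE STATISTICS FOR COLUMNS amount" might be
// planned as a query like the following; its single result row would then be
// converted into catalog statistics and written back via
// Catalog#alterTableStatistics / Catalog#alterTableColumnStatistics.
Table stats = tableEnv.sqlQuery(
        "SELECT"
                + " COUNT(1) AS row_count,"
                + " APPROX_COUNT_DISTINCT(amount) AS amount_ndv,"
                + " COUNT(1) - COUNT(amount) AS amount_null_count,"
                + " MAX(amount) AS amount_max,"
                + " MIN(amount) AS amount_min"
                + " FROM orders");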

> JDBC connector can get a row count estimate without performing a
> SELECT COUNT(1)
To optimize such cases, we can implement a rule to push aggregate into
table source.
Currently, there is a similar ability interface, SupportsAggregatePushDown,
which for now supports pushing only the local aggregate into the source.


Best,
Godfrey

Ingo Bürk  于2022年6月10日周五 17:15写道:
>
> Hi Godfrey,
>
> compared to the solution proposed in the FLIP (using a SELECT
> statement), I wonder if you have considered adding APIs to catalogs /
> connectors to perform this task as an alternative?
> I could imagine that for many connectors, statistics could be
> implemented in a less expensive way by leveraging the underlying system
> (e.g. a JDBC connector can get a row count estimate without performing a
> SELECT COUNT(1)).
>
>
> Best
> Ingo
>
>
> On 10.06.22 09:53, godfrey he wrote:
> > Hi all,
> >
> > I would like to open a discussion on FLIP-240:  Introduce "ANALYZE
> > TABLE" Syntax.
> >
> > As FLIP-231 mentioned, statistics are one of the most important inputs
> > to the optimizer. Accurate and complete statistics allows the
> > optimizer to be more powerful. "ANALYZE TABLE" syntax is a very common
> > but effective approach to gather statistics, which is already
> > introduced by many compute engines and databases.
> >
> > The main purpose of  discussion is to introduce "ANALYZE TABLE" syntax
> > for Flink sql.
> >
> > You can find more details in FLIP-240 document[1]. Looking forward to
> > your feedback.
> >
> > [1] 
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217386481
> > [2] POC: https://github.com/godfreyhe/flink/tree/FLIP-240
> >
> >
> > Best,
> > Godfrey


Re: [DISCUSS] FLIP-240: Introduce "ANALYZE TABLE" Syntax

2022-06-12 Thread godfrey he
Hi Ingo,

The semantics do not distinguish between batch and streaming:
it works for both, but the result for
unbounded sources is meaningless.
Currently, I throw an exception in streaming mode,
and we can support streaming mode with bounded sources
in the future.
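
A minimal sketch of that validation (the error message wording is
hypothetical; the check just inspects the configured runtime mode):

import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.configuration.ExecutionOptions;
import org.apache.flink.table.api.TableException;

// inside the planner, before translating an ANALYZE TABLE statement
if (tableEnv.getConfig().getConfiguration().get(ExecutionOptions.RUNTIME_MODE)
        == RuntimeExecutionMode.STREAMING) {
    throw new TableException(
            "ANALYZE TABLE statement is not supported in streaming mode.");
}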

Best,
Godfrey

Ingo Bürk  于2022年6月13日周一 14:17写道:
>
> Hi Godfrey,
>
> thank you for the explanation. A SELECT is definitely more generic and
> will work for all connectors automatically. As such I think it's a good
> baseline solution regardless.
>
> We can also think about allowing connector-specific optimizations in the
> future, but I do like your idea of letting the optimizer rules perform a
> lot of the work here already by leveraging existing optimizations.
> Similarly things like non-null counts of non-nullable columns would (or
> at least could) be handled by the optimizer rules already.
>
> So as far as that point goes, +1 to the generic approach.
>
> One more point, though: In general we should avoid supporting features
> only in specific modes as it breaks the unification promise. Given that
> ANALYZE is a manual and completely optional operation I'm OK with doing
> that here in principle. However, I wonder what will happen in the
> streaming / unbounded case. Do you plan to throw an error? Or do we
> complete the command as successful but without doing anything?
>
>
> Best
> Ingo
>
> On 13.06.22 05:50, godfrey he wrote:
> > Hi Ingo,
> >
> > Thanks for the inputs.
> >
> > I think converting `ANALYZE TABLE` to `SELECT` statement is
> > more generic approach. Because query plan optimization is more generic,
> >   we can provide more optimization rules to optimize not only `SELECT` 
> > statement
> > converted from `ANALYZE TABLE` but also the `SELECT` statement written by 
> > users.
> >
> >> JDBC connector can get a row count estimate without performing a
> >> SELECT COUNT(1)
> > To optimize such cases, we can implement a rule to push aggregate into
> > table source.
> > Currently, there is a similar rule: SupportsAggregatePushDown, which
> > supports only pushing
> > local aggregate into source now.
> >
> >
> > Best,
> > Godfrey
> >
> > Ingo Bürk  于2022年6月10日周五 17:15写道:
> >>
> >> Hi Godfrey,
> >>
> >> compared to the solution proposed in the FLIP (using a SELECT
> >> statement), I wonder if you have considered adding APIs to catalogs /
> >> connectors to perform this task as an alternative?
> >> I could imagine that for many connectors, statistics could be
> >> implemented in a less expensive way by leveraging the underlying system
> >> (e.g. a JDBC connector can get a row count estimate without performing a
> >> SELECT COUNT(1)).
> >>
> >>
> >> Best
> >> Ingo
> >>
> >>
> >> On 10.06.22 09:53, godfrey he wrote:
> >>> Hi all,
> >>>
> >>> I would like to open a discussion on FLIP-240:  Introduce "ANALYZE
> >>> TABLE" Syntax.
> >>>
> >>> As FLIP-231 mentioned, statistics are one of the most important inputs
> >>> to the optimizer. Accurate and complete statistics allows the
> >>> optimizer to be more powerful. "ANALYZE TABLE" syntax is a very common
> >>> but effective approach to gather statistics, which is already
> >>> introduced by many compute engines and databases.
> >>>
> >>> The main purpose of  discussion is to introduce "ANALYZE TABLE" syntax
> >>> for Flink sql.
> >>>
> >>> You can find more details in FLIP-240 document[1]. Looking forward to
> >>> your feedback.
> >>>
> >>> [1] 
> >>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217386481
> >>> [2] POC: https://github.com/godfreyhe/flink/tree/FLIP-240
> >>>
> >>>
> >>> Best,
> >>> Godfrey


Re: [ANNOUNCE] New Apache Flink PMC Member - Jingsong Lee

2022-06-13 Thread godfrey he
Congratulations, Jingsong!

Best,
Godfrey

Shuo Cheng  于2022年6月13日周一 16:43写道:
>
> Congratulations, Jingsong!
>
> On 6/13/22, Paul Lam  wrote:
> > Congrats, Jingsong! Well deserved!
> >
> > Best,
> > Paul Lam
> >
> >> 2022年6月13日 16:31,Lincoln Lee  写道:
> >>
> >> Congratulations, Jingsong!
> >>
> >> Best,
> >> Lincoln Lee
> >>
> >>
> >> Jark Wu  于2022年6月13日周一 16:29写道:
> >>
> >>> Congrats, Jingsong!
> >>>
> >>> Cheers,
> >>> Jark
> >>>
> >>> On Mon, 13 Jun 2022 at 16:16, Jiangang Liu 
> >>> wrote:
> >>>
>  Congratulations, Jingsong!
> 
>  Best,
>  Jiangang Liu
> 
>  Martijn Visser  于2022年6月13日周一 16:06写道:
> 
> > Like everyone has mentioned, this is very well deserved.
> >>> Congratulations!
> >
> > Op ma 13 jun. 2022 om 09:57 schreef Benchao Li :
> >
> >> Congratulations, Jingsong!  Well deserved.
> >>
> >> Rui Fan <1996fan...@gmail.com> 于2022年6月13日周一 15:53写道:
> >>
> >>> Congratulations, Jingsong!
> >>>
> >>> Best,
> >>> Rui Fan
> >>>
> >>> On Mon, Jun 13, 2022 at 3:40 PM LuNing Wang  
> >> wrote:
> >>>
>  Congratulations, Jingsong!
> 
>  Best,
>  LuNing Wang
> 
>  Ingo Bürk  于2022年6月13日周一 15:36写道:
> 
> > Congrats, Jingsong!
> >
> > On 13.06.22 08:58, Becket Qin wrote:
> >> Hi all,
> >>
> >> I'm very happy to announce that Jingsong Lee has joined the
>  Flink
> >>> PMC!
> >>
> >> Jingsong became a Flink committer in Feb 2020 and has been
> >>> continuously
> >> contributing to the project since then, mainly in Flink SQL.
> >>> He
> > has
>  been
> >> quite active in the mailing list, fixing bugs, helping
>  verifying
> > releases,
> >> reviewing patches and FLIPs. Jingsong is also devoted to
>  pushing
> >>> Flink
> > SQL
> >> to new use cases. He spent a lot of time in implementing the
> > Flink
> >> connectors for Apache Iceberg. Jingsong is also the primary
> > driver
>  behind
> >> the effort of flink-table-store, which aims to provide a
> >> stream-batch
> >> unified storage for Flink dynamic tables.
> >>
> >> Congratulations and welcome, Jingsong!
> >>
> >> Cheers,
> >>
> >> Jiangjie (Becket) Qin
> >> (On behalf of the Apache Flink PMC)
> >>
> >
> 
> >>>
> >>
> >>
> >> --
> >>
> >> Best,
> >> Benchao Li
> >>
> >
> 
> >>>
> >
> >


Re: [DISCUSS] FLIP-190: Support Version Upgrades for Table API & SQL Programs

2021-11-22 Thread godfrey he
Hi Timo,

Thanks for driving this discussion; upgrade compatibility for SQL jobs has
always been a big pain point. We completed some groundwork in the last release,
and this FLIP will make the whole upgrade story possible.

I have a few comments:
1)  "EXPLAIN PLAN EXECUTE STATEMENT SET BEGIN ... END" is missing.
It's better we can add this syntax and make the API more complete.

2) about the annotation of the ExecNode, it's hard to maintain the supported
versions for "supportedPlanChanges" and "supportedSavepointChanges".
Imagine that, when we are upgrading Flink from 1.15 to 1.16, most ExecNodes are
not changed (high probability scenarios), but we need add supported
version (1.16)
to most (even all) ExecNodes manually. Considering that the supported
versions are
continuous, we only need annotate the start version (when the ExecNode
is introduced)
and the end version (when the change is compatible and a new ExecNode
with new version
needs to be introduced) for supportedPlanChanges and supportedSavepointChanges.
e.g. supportedSavepointChanges ={start=1_15, end=1_16}
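
To illustrate the suggestion, such an annotation could look like the sketch
below (purely illustrative, not the FLIP's current proposal; the FlinkVersion
enum and its OPEN_ENDED marker are stand-ins invented for this sketch):

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// stub enum just for this sketch; the real thing would enumerate releases
enum FlinkVersion { v1_15, v1_16, OPEN_ENDED }

@Target(ElementType.TYPE)
@Retention(RetentionPolicy.RUNTIME)
@interface ExecNodeMetadata {
    String name();
    int version();

    // first Flink version in which this ExecNode (in this version) exists
    FlinkVersion planChangesStart();
    // set only once an incompatible plan change forces a new node version
    FlinkVersion planChangesEnd() default FlinkVersion.OPEN_ENDED;

    FlinkVersion savepointChangesStart();
    FlinkVersion savepointChangesEnd() default FlinkVersion.OPEN_ENDED;
}

// usage on a node that has been stable since 1.15:
// @ExecNodeMetadata(name = "stream-exec-calc", version = 1,
//         planChangesStart = FlinkVersion.v1_15,
//         savepointChangesStart = FlinkVersion.v1_15)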

Best,
Godfrey

wenlong.lwl  于2021年11月22日周一 下午9:07写道:
>
> Hi, Timo, thanks for driving the discussion and the preparation on the
> FLIP. This is a pain point of Flink SQL complaining by our users badly. I
> have  seen many cases where our users suffer while trying to upgrade the
> flink  version in order to take advantage of the bug fixes and performance
> improvements on the new version. It often takes a long time verifying the
> new plan,  reoptimizing the config, recomputing the state,  waiting for a
> safe point to make the new job active in production, etc. There are many
> times that new problems show up in upgrading.
>
> I have a question on COMPILE AND EXECUTE. It doesn't look so good that we
> just execute the plan and ignore the statement when the plan already
> exists, but the plan and SQL are not matched. The result would be quite
> confusing if we still execute the plan directly, we may need to add a
> validation. Personally I would prefer not to provide such a shortcut, let
> users use  COMPILE PLAN IF NOT EXISTS and EXECUTE explicitly, which can be
> understood by new users even without referring to the docs.
>
> Best,
> Wenlong


Re: [VOTE][FLIP-195] Improve the name and structure of vertex and operator name for job

2021-11-23 Thread godfrey he
+1 (binding)

Best,
Godfrey

Jark Wu  于2021年11月24日周三 下午12:02写道:
>
> +1 (binding)
>
> Btw, @JingZhang I think your vote can be counted as binding now.
>
> Best,
> Jark
>
> On Tue, 23 Nov 2021 at 20:19, Jing Zhang  wrote:
>
> > +1 (non-binding)
> >
> > Best,
> > Jing Zhang
> >
> > Martijn Visser  于2021年11月23日周二 下午7:42写道:
> >
> > > +1 (non-binding)
> > >
> > > On Tue, 23 Nov 2021 at 12:13, Aitozi  wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > Best,
> > > > Aitozi
> > > >
> > > > wenlong.lwl  于2021年11月23日周二 下午4:00写道:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > Based on the discussion[1], we seem to have consensus, so I would
> > like
> > > to
> > > > > start a vote on FLIP-195 [2].
> > > > > Thanks for all of your feedback.
> > > > >
> > > > > The vote will last for at least 72 hours (Nov 26th 16:00 GMT) unless
> > > > > there is an objection or insufficient votes.
> > > > >
> > > > > [1] https://lists.apache.org/thread/kvdxr8db0l5s6wk7hwlt0go5fms99b8t
> > > > > [2]
> > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-195%3A+Improve+the+name+and+structure+of+vertex+and+operator+name+for+job
> > > > >
> > > > > Best,
> > > > > Wenlong Lyu
> > > > >
> > > >
> > >
> >


Re: [VOTE] FLIP-188 Introduce Built-in Dynamic Table Storage

2021-11-30 Thread godfrey he
+1 (binding)

Best,
Godfrey

Jark Wu  于2021年11月30日周二 下午5:47写道:

>
> Thanks for the great discussion and the updates.
> Still +1 from my side.
>
> Best,
> Jark
>
> On Tue, 30 Nov 2021 at 17:27, Kurt Young  wrote:
>
> > +1 from my side.
> >
> > Best,
> > Kurt
> >
> >
> > On Tue, Nov 30, 2021 at 5:12 PM Jingsong Li 
> > wrote:
> >
> > > Hi everyone,
> > >
> > > Many thanks to Stephan and Timo, this makes the design of the FLIP
> > > much clearer and more reliable.
> > >
> > > Please take another look at the updated FLIP and respond directly if
> > > you have feedback.
> > >
> > > (I will contact binding voters directly to confirm)
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Tue, Nov 30, 2021 at 4:32 PM Timo Walther  wrote:
> > > >
> > > > Thanks for the healthy discussion. Also +1 from my side for this FLIP.
> > > >
> > > > Thanks,
> > > > Timo
> > > >
> > > > On 24.11.21 19:05, Stephan Ewen wrote:
> > > > > Thanks for all the details and explanation.
> > > > >
> > > > > With the conclusion of the discussion, also +1 from my side for this
> > > FLIP
> > > > >
> > > > > On Sat, Nov 13, 2021 at 12:23 PM Jingsong Li  > >
> > > wrote:
> > > > >
> > > > >> Thanks Stephan and Timo, I have had a rough look at your replies.
> > > > >> They all contain valuable opinions. I will take time to discuss,
> > > > >> explain, and improve the design accordingly.
> > > > >>
> > > > >> Hi Timo,
> > > > >>> At least a final "I will start the vote soon. Last call for
> > > > >>> comments." would have been nice.
> > > > >>
> > > > >> I replied in the DISCUSS thread that we had begun to vote. If
> > > > >> there are supplementary comments, or someone replies "please pause
> > > > >> the voting, I will reply later", we can suspend or cancel the vote
> > > > >> at any time.
> > > > >> I understand why a FLIP vote must stay open for three days: so
> > > > >> that more people can see it and put forward their opinions.
> > > > >>
> > > > >> Best,
> > > > >> Jingsong
> > > > >>
> > > > >> On Sat, Nov 13, 2021 at 1:27 AM Timo Walther 
> > > wrote:
> > > > >>>
> > > > >>> Hi everyone,
> > > > >>>
> > > > >>> Even though the DISCUSS thread was open for 2 weeks, I have the
> > > > >>> feeling that the VOTE was initiated too quickly. At least a final
> > > > >>> "I will start the vote soon. Last call for comments." would have
> > > > >>> been nice.
> > > > >>>
> > > > >>> I also added some comments in the DISCUSS thread. Let's hope we can
> > > > >>> resolve those soon.
> > > > >>>
> > > > >>> Regards,
> > > > >>> Timo
> > > > >>>
> > > > >>> On 12.11.21 16:36, Stephan Ewen wrote:
> > > >  Hi all!
> > > > 
> > > >  I have a few questions on the design still, posted those in the
> > > > >> [DISCUSS]
> > > >  thread.
> > > >  It would be great to clarify those first before concluding this
> > > vote.
> > > > 
> > > >  Thanks,
> > > >  Stephan
> > > > 
> > > > 
> > > >  On Fri, Nov 12, 2021 at 7:22 AM Jark Wu  wrote:
> > > > 
> > > > > +1 (binding)
> > > > >
> > > > > Thanks for the great work Jingsong!
> > > > >
> > > > > Best,
> > > > > Jark
> > > > >
> > > > > On Thu, 11 Nov 2021 at 19:41, JING ZHANG 
> > > > >> wrote:
> > > > >
> > > > >> +1 (non-binding)
> > > > >>
> > > > >> A small suggestion:
> > > > >> The message queue is currently used to store the middle-layer data
> > > > >> of the streaming data warehouse. We hope to use the built-in
> > > > >> dynamic table storage to store those middle layers.
> > > > >> But the middle-layer data of the streaming data warehouse is often
> > > > >> provided to all business teams in a company, and some teams do not
> > > > >> use Apache Flink as their compute engine yet. In order to keep
> > > > >> serving those teams, the data in the built-in dynamic table storage
> > > > >> may need to be copied to a message queue again.
> > > > >> If *the built-in storage could provide the same consumer API as the
> > > > >> commonly used message queues*, this data copying could be avoided,
> > > > >> and the built-in dynamic table storage could be adopted faster in
> > > > >> the streaming data warehouse business.
> > > > >>
> > > > >> Best regards,
> > > > >> Jing Zhang
> > > > >>
> > > > >> Yufei Zhang  于2021年11月11日周四 上午9:34写道:
> > > > >>
> > > > >>> Hi,
> > > > >>>
> > > > >>> +1 (non-binding)
> > > > >>>
> > > > >>> Very interesting design. I saw a lot of discussion on the
> > generic
> > > > >>> interface design, good to know it will address extensibility.
> > > > >>>
> > > > >>> Cheers,
> > > > >>> Yufei
> > > > >>>
> > > > >>>
> > > > >>> On 2021/11/10 02:51:55 Jingsong Li wrote:
> > > >  Hi everyone,
> > > > 
> > > >  Thanks for all the feedback so far. Based on the discussion[1]
> > > we
> > > > > seem
> > > 

Re: [DISCUSS] FLIP-190: Support Version Upgrades for Table API & SQL Programs

2021-12-06 Thread godfrey he
Hi, Timo,

Thanks for the detailed explanation.

> We change an operator state of B in Flink 1.16. We perform the change in
> the operator of B in a way that supports both state layouts. Thus, no need
> for a new ExecNode version.

I think this design makes things more complex.
1. If there are multiple state layouts, which layout should the ExecNode use?
It increases the cost of understanding for developers (especially for
newcomers to Flink), making them prone to mistakes.
2. `supportedPlanChanges` and `supportedSavepointChanges` are a bit obscure.

The purpose of the ExecNode annotations is not only to support powerful
validation, but more importantly to make the mechanism easy for developers
to understand, ensuring that every modification is straightforward and
state compatible.

I would prefer that once the state layout changes, the ExecNode version
must also be updated, which keeps things simple. How about renaming
`supportedPlanChanges` to `planCompatibleVersion`
(meaning the plan is compatible with the plan generated by the node of
the given version)
and renaming `supportedSavepointChanges` to `savepointCompatibleVersion`
(meaning the state is compatible with the state generated by the node of
the given version)?
The names also indicate that only one version value can be set.
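
As a hypothetical sketch, the renamed annotation might look as follows; the
names and shapes are illustrative only, not the actual FLIP API.

// Illustrative only: a single compatible version per attribute.
@interface ExecNodeMetadata {
    String name();
    String version();                     // Flink version introducing this ExecNode version
    String planCompatibleVersion();       // plans generated by this version can still be loaded
    String savepointCompatibleVersion();  // state generated by this version can still be restored
}

// A node changed in 1.16 whose plan and state stay compatible with 1.15:
@ExecNodeMetadata(
        name = "stream-exec-group-aggregate",
        version = "1_16",
        planCompatibleVersion = "1_15",
        savepointCompatibleVersion = "1_15")
class StreamExecGroupAggregateSketch { }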

WDYT?

Best,
Godfrey

Timo Walther  于2021年12月2日周四 下午11:42写道:
>
> Response to Marios's feedback:
>
>  > there should be some good logging in place when the upgrade is taking
> place
>
> Yes, I agree. I added this part to the FLIP.
>
>  > config option instead that doesn't provide the flexibility to
> overwrite certain plans
>
> One can also set the config option around sections of a multi-statement
> SQL script.
>
> SET 'table.plan.force-recompile'='true';
>
> COMPILE ...
>
> SET 'table.plan.force-recompile'='false';
>
> But the question is why a user wants to run COMPILE multiple times. If
> it is during development, then running EXECUTE (or just the statement
> itself) without calling COMPILE should be sufficient. The file can also
> manually be deleted if necessary.
>
> What do you think?
>
> Regards,
> Timo
>
>
>
> On 02.12.21 16:09, Timo Walther wrote:
> > Hi Till,
> >
> > Yes, you might have to. But not a new plan from the SQL query but a
> > migration from the old plan to the new plan. This will not happen often.
> > But we need a way to evolve the format of the JSON plan itself.
> >
> > Maybe this is a bit confusing, so let me clarify again: mostly, ExecNode
> > versions and operator state layouts will evolve. Not the plan files,
> > those will be pretty stable. But also not infinitely.
> >
> > Regards,
> > Timo
> >
> >
> > On 02.12.21 16:01, Till Rohrmann wrote:
> >> Then for migrating from Flink 1.10 to 1.12, I might have to create a new
> >> plan using Flink 1.11 in order to migrate from Flink 1.11 to 1.12, right?
> >>
> >> Cheers,
> >> Till
> >>
> >> On Thu, Dec 2, 2021 at 3:39 PM Timo Walther  wrote:
> >>
> >>> Response to Till's feedback:
> >>>
> >>>   > compiled plan won't be changed after being written initially
> >>>
> >>> This is not entirely correct. We give guarantees for keeping the query
> >>> up and running. We reserve the right to force plan migrations. In
> >>> this case, the plan might not be created from the SQL statement but from
> >>> the old plan. I have added an example in section 10.1.1. In general,
> >>> both persisted entities "plan" and "savepoint" can evolve independently
> >>> from each other.
> >>>
> >>> Thanks,
> >>> Timo
> >>>
> >>> On 02.12.21 15:10, Timo Walther wrote:
>  Response to Godfrey's feedback:
> 
>    > "EXPLAIN PLAN EXECUTE STATEMENT SET BEGIN ... END" is missing.
> 
>  Thanks for the hint. I added a dedicated section 7.1.3.
> 
> 
>    > it's hard to maintain the supported versions for
>  "supportedPlanChanges" and "supportedSavepointChanges"
> 
>  Actually, I think we are mostly on the same page.
> 
>  The annotation does not need to be updated for every Flink version. As
>  the name suggests it is about "Changes" (in other words:
>  incompatibilities) that require some kind of migration. Either plan
>  migration (= PlanChanges) or savepoint migration (=SavepointChanges,
>  using operator migration or savepoint migration).
> 
>  Let's assume we introduced two ExecNodes A and B in Flink 1.15.
> 
>  The annotations are:
> 
>  @ExecNodeMetadata(name=A, supportedPlanChanges=1.15,
>  supportedSavepointChanges=1.15)
> 
>  @ExecNodeMetadata(name=B, supportedPlanChanges=1.15,
>  supportedSavepointChanges=1.15)
> 
>  We change an operator state of B in Flink 1.16.
> 
>  We perform the change in the operator of B in a way that supports both
>  state layouts. Thus, no need for a new ExecNode version.
> 
>  The annotations in 1.16 are:
> 
>  @ExecNodeMetadata(name=A, supportedPlanChanges=1.15,
>  supportedSavepointChanges=1.15)
> 
>  @ExecNodeMetadata(name=B, supp

Re: [DISCUSS] FLIP-190: Support Version Upgrades for Table API & SQL Programs

2021-12-08 Thread godfrey he
more complex. But operator migration is
> >>>>> way easier than ExecNode migration at a later point in time for code
> >>>>> maintenance. We know that ExecNodes can become pretty complex. Even
> >>>>> though we have put a lot of code into `CommonXXExecNode`, it will be a
> >>>>> lot of work to maintain multiple versions of ExecNodes. If we can avoid
> >>>>> this with operator state migration, this should always be preferred
> >> over
> >>>>> a new ExecNode version.
> >>>>>
> >>>>> I'm aware that operator state migration might only be important for
> >>>>> roughly 10 % of all changes. A new ExecNode version will be used for
> >> 90%
> >>>>> of all changes.
> >>>>>
> >>>>>> If there are multiple state layouts, which layout should the
> >>>>>> ExecNode use?
> >>>>>
> >>>>> It is not the responsibility of the ExecNode to decide this but the
> >>>>> operator. Something like:
> >>>>>
> >>>>> class X extends ProcessFunction {
> >>>>>  ValueState oldStateLayout;
> >>>>>  ValueState newStateLayout;
> >>>>>
> >>>>>  open() {
> >>>>>if (oldStateLayout.get() != null) {
> >>>>>  performOperatorMigration();
> >>>>>}
> >>>>>useNewStateLayout();
> >>>>>  }
> >>>>> }
> >>>>>
> >>>>> Operator migration is meant for smaller "more local" changes without
> >>>>> touching the ExecNode layer. The CEP library and DataStream API sources
> >>>>> have been performing operator migration for years already.
> >>>>>
> >>>>>
> >>>>>> `supportedPlanChanges` and `supportedSavepointChanges` are a bit
> >>>>>> obscure.
> >>>>>
> >>>>> Let me try to come up with more examples of why I think both
> >>>>> annotations make sense and are especially important *for test
> >>>>> coverage*.
> >>>>>
> >>>>> supportedPlanChanges:
> >>>>>
> >>>>> Let's assume we have some JSON in Flink 1.15:
> >>>>>
> >>>>> {
> >>>>>  some-prop: 42
> >>>>> }
> >>>>>
> >>>>> And we want to extend the JSON in Flink 1.16:
> >>>>>
> >>>>> {
> >>>>>  some-prop: 42,
> >>>>>  some-flag: false
> >>>>> }
> >>>>>
> >>>>> Maybe we don't need to increase the ExecNode version but only ensure
> >>>>> that the flag is set to `false` by default for the older versions.
> >>>>>
> >>>>> We need a location to track changes and document the changelog. With
> >> the
> >>>>> help of the annotation supportedPlanChanges = [1.15, 1.16] we can
> >> verify
> >>>>> that we have tests for both JSON formats.
> >>>>>
> >>>>> And once we decide to drop the 1.15 format, we enforce plan migration
> >>>>> and fill-in the default value `false` into the old plans and bump their
> >>>>> JSON plan version to 1.16 or higher.
> >>>>>
> >>>>>
> >>>>>
> >>>>>> once the state layout is changed, the ExecNode version also needs
> >>>>>> to be updated
> >>>>>
> >>>>> This will still be the majority of cases. But if we can avoid this, we
> >>>>> should, to avoid having too much duplicate code to maintain.
> >>>>>
> >>>>>
> >>>>>
> >>>>> Thanks,
> >>>>> Timo
> >>>>>
> >>>>>
> >>>>> On 06.12.21 09:58, godfrey he wrote:
> >>>>>> Hi, Timo,
> >>>>>>
> >>>>>> Thanks for the detailed explanation.
> >>>>>>
> >>>>>> We change an operator state of B in Flink 1.16. We perform the change
> >>>>>> in the operator of B in a way that supports both state layouts. Thus,
> >>>>>> no need for a new ExecNode versio
