Re: [DISCUSS] FLIP 295: Support persistence of Catalog configuration and asynchronous registration

2023-06-04 Thread Feng Jin
Hi Samrat,

Thanks for your advice.

> 1. The createCatalog method does not mention any exceptions being thrown.

createCatalog will throw a CatalogException, just like registerCatalog. Since
CatalogException is a RuntimeException, CatalogManager and TableEnvironment
do not declare it explicitly.
To avoid misunderstandings, I have added a "throws CatalogException" clause
to the createCatalog method definition of CatalogStore.
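
A minimal sketch of the point above, assuming simplified stand-in types: the
CatalogException, CatalogStore and InMemoryCatalogStore below are hypothetical
and are not the real Flink classes. It only shows that an unchecked exception
may be listed in a throws clause for documentation while callers are not
forced to catch it.

```java
import java.util.HashMap;
import java.util.Map;

class UncheckedDeclarationDemo {
    // Returns true if storing the same catalog twice fails with CatalogException.
    static boolean duplicateIsRejected() {
        CatalogStore store = new InMemoryCatalogStore();
        // No try/catch is required here: CatalogException is unchecked.
        store.createCatalog("my_catalog", Map.of("type", "generic_in_memory"));
        try {
            store.createCatalog("my_catalog", Map.of("type", "generic_in_memory"));
            return false;
        } catch (CatalogException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(duplicateIsRejected());
    }
}

// Hypothetical stand-in for the exception type discussed in the thread.
class CatalogException extends RuntimeException {
    CatalogException(String message) {
        super(message);
    }
}

// Hypothetical, reduced interface: only the method under discussion.
interface CatalogStore {
    // The throws clause is optional for the compiler because the exception is
    // unchecked, but it documents the failure mode for implementers and callers.
    void createCatalog(String name, Map<String, String> options) throws CatalogException;
}

class InMemoryCatalogStore implements CatalogStore {
    private final Map<String, Map<String, String>> catalogs = new HashMap<>();

    @Override
    public void createCatalog(String name, Map<String, String> options) {
        if (catalogs.containsKey(name)) {
            throw new CatalogException("Catalog " + name + " already exists.");
        }
        catalogs.put(name, new HashMap<>(options));
    }
}
```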

> 2. Could you please provide an exhaustive list of the supported kinds?

Sure, the documentation now includes the configuration of the built-in
CatalogStore as well as how to configure a custom CatalogStore.


Best,
Feng


On Sun, Jun 4, 2023 at 4:23 AM Samrat Deb  wrote:

> Hi Feng,
>
> Thank you for providing the proposal. I believe this feature will be highly
> valuable.
>
> I have a couple of inquiries:
>
> 1. According to the documentation [1], the createCatalog method does not
> mention any exceptions being thrown. However, I would like to confirm if it
> is always true that there will be no failures in all scenarios. Please let
> me know if there is any additional information I may have missed.
>
> 2. Regarding the registration process using the `table.catalog-store.kind`
> configuration, could you please provide an exhaustive list of the supported
> kinds?
> It would be great to have a comprehensive understanding of the options
> available.
>
> Bests,
> Samrat
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-295%3A+Support+lazy+initialization+of+catalogs+and+persistence+of+catalog+configurations
>
> On Sat, Jun 3, 2023 at 5:54 PM Feng Jin  wrote:
>
> > Hi Hang and Jark
> >
> > Thank you very much for your reply.
> >
> > @Jark
> > > 1. Could you move the CatalogStore registration API to the "Public
> > Interface" section?
> >
> > Indeed, it is more reasonable.
> >
> > > 2. We should prefix "table." for the CatalogStore configuration.
> >
> > Sure, the names that you provided are indeed more logical and easier to
> > understand.
> >
> > > 3. About the open/close method in CatalogStoreFactory.
> >
> > I also agree with the proposed requirement scenario and design. I have
> > already made modifications to the interface.
> >
> > @Hang
> >
> > > 1. The `CatalogStore` needs the `open`/`close` methods to init or close
> > the resource.
> >
> > I have already added the missing method.
> >
> > > 2. The `getCatalog` is misspelled as `optionalDescriptor`.
> >
> > Sorry for the misspelling, I updated this part.
> >
> > > 3. Then the `CatalogStoreFactory` may need the `open`/`close` methods
> > to init or close its resource
> >
> > This is indeed a scenario that needs to be considered, and I have added
> > relevant methods accordingly.
> > Additionally, I have included some relevant description in the
> > documentation to help others understand why the open/close methods need
> > to be added.
> >
> >
> > Best,
> > Feng
> >
> > On Sat, Jun 3, 2023 at 12:35 PM Jark Wu  wrote:
> >
> > > Hi Jing,
> > >
> > > Thank you for the update.
> > >
> > > 1. Could you move the CatalogStore registration API to the "Public
> > > Interface" section?
> > > "Proposed Changes" is more like a place to describe the implementation
> > > details.
> > >
> > > 2. We should prefix "table." for the CatalogStore configuration. Besides,
> > > the config key name should be hierarchical[1]. Therefore, it may be
> > > better to use:
> > > "table.catalog-store.kind"
> > > "table.catalog-store.file.path"
> > >
> > > 3. I think Hang's suggestions make sense.
> > >
> > > Others look good to me.
> > >
> > > Best,
> > > Jark
> > >
> > > On Fri, 2 Jun 2023 at 17:28, Hang Ruan  wrote:
> > >
> > > > Hi, Feng.
> > > >
> > > > Thanks for the update.
> > > > The current design looks good to me. I have some minor comments.
> > > >
> > > > 1. The `CatalogStore` needs the `open`/`close` methods to init or close
> > > > the resource. For example, when storing the information in MySQL, the
> > > > store needs to open and close the connections.
> > > >
> > > > 2. The `getCatalog` is misspelled as `optionalDescriptor`.
> > > >
> > > > 3. About the usage in the SQL gateway.
> > > > Considering the usage in the SQL gateway, the gateway may create a
> > > > CatalogStore for each session.
> > > > If we are using the MySqlCatalogStore, there would be many connections.
> > > > How can we reuse the connections among these sessions?
> > > > I think the SQL gateway needs to maintain a connection pool in
> > > > the CatalogStoreFactory, and each session gets its own connection from
> > > > the pool when it is created.
> > > > Then the `CatalogStoreFactory` may need the `open`/`close` methods to
> > > > init or close its resource.
> > > > Or is there a better way?
> > > >
> > > > Best,
> > > > Hang
> > > >
> > > > On Fri, Jun 2, 2023 at 14:45, Feng Jin wrote:
> > > >
> > > > > Thanks Jingsong.
> > > > >
> > > > > > Just naming, maybe `createCatalog` in TableEnv
> > > > >
> > > > > +1 

[jira] [Created] (FLINK-32249) A Java string should be used instead of a Calcite NlsString to construct the table and column comment attributes of CatalogTable

2023-06-04 Thread lincoln lee (Jira)
lincoln lee created FLINK-32249:
---

 Summary: A Java string should be used instead of a Calcite 
NlsString to construct the table and column comment attributes of CatalogTable
 Key: FLINK-32249
 URL: https://issues.apache.org/jira/browse/FLINK-32249
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.17.1
Reporter: lincoln lee
 Fix For: 1.18.0


When Flink interacts with CatalogTable, it passes Calcite's NlsString
comment directly as a string to the comment attribute of the schema and
column. In theory, a Java string should be passed here; otherwise,
CatalogTable implementers may encounter special character encoding issues,
e.g., this issue in Apache Paimon:
[https://github.com/apache/incubator-paimon/issues/1262]

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-311: Support Call Stored Procedure

2023-06-04 Thread yuxia
Hi, Jane.
Thanks for your input. I think we can add the auxiliary command SHOW PROCEDURES
in this FLIP,
following the syntax for SHOW FUNCTIONS proposed in FLIP-297 [1].
The syntax will be
SHOW PROCEDURES [ ( FROM | IN ) [catalog_name.]database_name ] [ [NOT] (LIKE |
ILIKE) <sql_like_pattern> ].
I have updated the FLIP accordingly.

The other auxiliary commands may not be suitable at the moment, or need
further, dedicated discussion. Let's keep this FLIP focused.

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-297%3A+Improve+Auxiliary+Sql+Statements

Best regards,
Yuxia

- Original Message -
From: "Jane Chan"
To: "dev"
Sent: Saturday, June 3, 2023 7:04:39 PM
Subject: Re: [DISCUSS] FLIP-311: Support Call Stored Procedure

Hi Yuxia,

Thanks for bringing this to the discussion. The call procedure is a widely
used feature and will be very useful for users.

I just have one question regarding the usage. The FLIP mentioned that

> Flink will allow connector developers to develop their own built-in stored
> procedures, and then enables users to call these predefined stored
> procedures.
>
> In this FLIP, we don't intend to allow users to customize their own stored
> procedure for we don't want to expose too much to users too early.


If I understand correctly, we might need to provide some auxiliary commands
to inform users what built-in procedures are provided and how to use them.
For example, Snowflake provides commands like [1] [2], and MySQL provides
commands like [3] [4].

[1] SHOW PROCEDURES,
https://docs.snowflake.com/en/sql-reference/sql/show-procedures
[2] DESCRIBE PROCEDURE ,
https://docs.snowflake.com/en/sql-reference/sql/desc-procedure
[3] SHOW PROCEDURE CODE,
https://dev.mysql.com/doc/refman/5.7/en/show-procedure-code.html
[4] SHOW PROCEDURE STATUS,
https://dev.mysql.com/doc/refman/5.7/en/show-procedure-status.html

Best,
Jane

On Sat, Jun 3, 2023 at 3:20 PM Benchao Li  wrote:

> Thanks Yuxia for the explanation, it makes sense to me. It would be great
> if you also add this to the FLIP doc.
>
> On Thu, Jun 1, 2023 at 17:11, yuxia wrote:
>
> > Hi, Benchao.
> > Thanks for your attention.
> >
> > Initially, I also wanted to pass `TableEnvironment` to the procedure. But
> > according to my investigation and offline discussion with Jingsong, what
> > really matters to procedure devs is the ability to build a Flink
> > datastream. From `TableEnvironment` alone we can't get the
> > `StreamExecutionEnvironment`, which is the entry point for building a
> > datastream. That is to say, we would lose the ability to build a
> > datastream if we only passed `TableEnvironment`.
> >
> > Of course, we could also pass `TableEnvironment` along with
> > `StreamExecutionEnvironment` to the procedure. But I intend to be cautious
> > about exposing too much too early to procedure devs. If someday we find we
> > need `TableEnvironment` to customize a procedure, we can then add a
> > method like `getTableEnvironment()` to `ProcedureContext`.
> >
> > Best regards,
> > Yuxia
> >
> > - Original Message -
> > From: "Benchao Li"
> > To: "dev"
> > Sent: Thursday, June 1, 2023 12:58:08 PM
> > Subject: Re: [DISCUSS] FLIP-311: Support Call Stored Procedure
> >
> > Thanks Yuxia for opening this discussion,
> >
> > The general idea looks good to me. I only have one question about
> > `ProcedureContext#getExecutionEnvironment`: why are you proposing to return
> > a `StreamExecutionEnvironment` instead of a `TableEnvironment`? Could you
> > elaborate a little more on this?
> >
> > On Tue, May 30, 2023 at 17:58, Jingsong Li wrote:
> >
> > > Thanks for your explanation.
> > >
> > > We can support Iterable in future. Current design looks good to me.
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Tue, May 30, 2023 at 4:56 PM yuxia wrote:
> > > >
> > > > Hi, Jingsong.
> > > > Thanks for your feedback.
> > > >
> > > > > Does this need to be a function call? Do you have some example?
> > > > I think it'll be useful to support function calls when users call a
> > > > procedure.
> > > > The following example is from Iceberg [1]:
> > > > CALL catalog_name.system.migrate('spark_catalog.db.sample', map('foo', 'bar'));
> > > >
> > > > It allows users to use `map('foo', 'bar')` to pass map data to the
> > > > procedure.
> > > >
> > > > Another case that I can imagine is rolling back a table to the snapshot
> > > > of one week ago.
> > > > Then, with a function call, users may call `rollback(table_name, now() -
> > > > INTERVAL '7' DAY)` to achieve that.
> > > >
> > > > Although the argument can be a function call, the parameter eventually
> > > > received by the procedure will always be the evaluated literal.
> > > >
> > > >
> > > > > Procedure looks like a TableFunction, do you consider using Collector
> > > > > something like TableFunction? (Supports large amount of data)
> > > >
> > > > Yes, I had considered it. But returning T[] is for simplicity.
> > > >
> > > > First, regarding how to return the calling result of a procedure, it
> > > > looks more intuitive to me to use the return value of the `call` method
> > > > instead of calling something like collector#collect.
> > > > I

Re: [DISCUSS] FLIP-315: Support Operator Fusion Codegen for Flink SQL

2023-06-04 Thread liu ron
Hi, Jark

Thanks for your feedback. According to my initial assessment, the work
effort is relatively large.

Moreover, I will add the test results of all queries to the FLIP.

Best,
Ron

On Thu, Jun 1, 2023 at 20:45, Jark Wu wrote:

> Hi Ron,
>
> Thanks a lot for the great proposal. The FLIP looks good to me in general.
> It does not look like easy work, but the performance sounds promising, so I
> think it's worth doing.
>
> Besides, if there is a complete test graph with all TPC-DS queries, the
> effect of this FLIP will be more intuitive.
>
> Best,
> Jark
>
>
>
> On Wed, 31 May 2023 at 14:27, liu ron  wrote:
>
> > Hi, Jingsong
> >
> > Thanks for your valuable suggestions.
> >
> > Best,
> > Ron
> >
> > On Tue, May 30, 2023 at 13:22, Jingsong Li wrote:
> >
> > > Thanks Ron for your information.
> > >
> > > I suggest that it can be written in the Motivation of FLIP.
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Tue, May 30, 2023 at 9:57 AM liu ron  wrote:
> > > >
> > > > Hi, Jingsong
> > > >
> > > > Thanks for your review. We have tested it on TPC-DS and got a 12%
> > > > gain overall when supporting only the Calc, HashJoin and HashAgg
> > > > operators. In some queries, we even get more than a 30% gain, so it
> > > > looks like an effective way.
> > > >
> > > > Best,
> > > > Ron
> > > >
> > > > On Mon, May 29, 2023 at 14:33, Jingsong Li wrote:
> > > >
> > > > > Thanks Ron for the proposal.
> > > > >
> > > > > Do you have some benchmark results for the performance improvement?
> > > > > I am more concerned about the improvement on Flink than the data in
> > > > > other papers.
> > > > >
> > > > > Best,
> > > > > Jingsong
> > > > >
> > > > > On Mon, May 29, 2023 at 2:16 PM liu ron wrote:
> > > > > >
> > > > > > Hi, dev
> > > > > >
> > > > > > I'd like to start a discussion about FLIP-315: Support Operator
> > > > > > Fusion Codegen for Flink SQL[1]
> > > > > >
> > > > > > As main memory grows, query performance is more and more determined
> > > > > > by the raw CPU costs of query processing itself. This is because
> > > > > > query processing techniques based on interpreted execution show poor
> > > > > > performance on modern CPUs due to lack of locality and frequent
> > > > > > instruction mis-prediction. Therefore, the industry is also
> > > > > > researching how to improve engine performance by increasing operator
> > > > > > execution efficiency. In addition, while optimizing Flink's
> > > > > > performance for TPC-DS queries, we found that a significant amount of
> > > > > > CPU time was spent on virtual function calls, framework collector
> > > > > > calls, and invalid calculations, which can be optimized to improve
> > > > > > the overall engine performance. After some investigation, we found
> > > > > > that Operator Fusion Codegen, proposed by Thomas Neumann in the
> > > > > > paper[2], can address these problems. I have finished a PoC[3] to
> > > > > > verify its feasibility and validity.
> > > > > >
> > > > > > Looking forward to your feedback.
> > > > > >
> > > > > > [1]: https://cwiki.apache.org/confluence/display/FLINK/FLIP-315+Support+Operator+Fusion+Codegen+for+Flink+SQL
> > > > > > [2]: http://www.vldb.org/pvldb/vol4/p539-neumann.pdf
> > > > > > [3]: https://github.com/lsyldliu/flink/tree/OFCG
> > > > > >
> > > > > > Best,
> > > > > > Ron


Re: [DISCUSS] FLIP-315: Support Operator Fusion Codegen for Flink SQL

2023-06-04 Thread Lincoln Lee
Hi Ron

OFCG looks like an exciting optimization, looking forward to its completion
in Flink!
A small question: do we consider adding a benchmark for the operators, to
intuitively understand the gain brought by each improvement?
In addition, regarding the implementation plan, the FLIP mentions that 1.18
will support Calc, HashJoin and HashAgg. What will be the next step,
and which operators do we ultimately expect to cover (all or specific ones)?

Best,
Lincoln Lee




Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-06-04 Thread Aitozi
Hi Jing,

> what is the difference between the RPC call or query you mentioned
> and the lookup in a very general way

I think the RPC call or query service is quite similar to the lookup join.
But the lookup join should work with a `LookupTableSource`.

Let's see how we can perform an async RPC call with lookup join:

(1) Implement an AsyncTableFunction with RPC call logic.
(2) Implement a `LookupTableSource` connector that runs the async UDTF
defined in (1).
(3) Then define the DDL of this lookup table in SQL:

CREATE TEMPORARY TABLE Customers (
  id INT,
  name STRING,
  country STRING,
  zip STRING
) WITH (
  'connector' = 'custom'
);

(4) Run with the query as below:

SELECT o.order_id, o.total, c.country, c.zip
FROM Orders AS o
  JOIN Customers FOR SYSTEM_TIME AS OF o.proc_time AS c
ON o.customer_id = c.id;

This example is from the doc. You can imagine the lookup process as an
async RPC call process.

Let's see how we can perform an async RPC call with lateral join:

(1) Implement an AsyncTableFunction with RPC call logic.
(2) Run the query with:

CREATE FUNCTION f1 AS '...';

SELECT o.order_id, o.total, T.country, T.zip
FROM Orders AS o, LATERAL TABLE (f1(o.order_id)) AS T(...);

As you can see, the lateral join version is simpler and more intuitive for
users. Users do not have to wrap a
LookupTableSource just to use an async UDTF.
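
For step (1), a rough sketch of the "RPC call logic" could look like the
following. It deliberately does not extend Flink's AsyncTableFunction, so it
compiles on its own; the class name, the lookupCustomer helper and the
List<String> row type are assumptions made up for illustration. It only
mirrors the idea of an eval method that completes a CompletableFuture instead
of returning rows synchronously.

```java
import java.util.Collection;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class AsyncLateralSketch {
    // Hypothetical stand-in for an RPC client; a real one would do network I/O.
    static CompletableFuture<List<String>> lookupCustomer(int orderId) {
        return CompletableFuture.supplyAsync(
                () -> List.of("country-" + orderId, "zip-" + orderId));
    }

    // The caller hands in a future; the function completes it once the
    // asynchronous call returns, or fails it if the call failed.
    static void eval(CompletableFuture<Collection<List<String>>> result, int orderId) {
        lookupCustomer(orderId).whenComplete((row, error) -> {
            if (error != null) {
                result.completeExceptionally(error);
            } else {
                result.complete(List.of(row));
            }
        });
    }

    // Test helper: invokes eval and blocks until the rows are available.
    static Collection<List<String>> callAndWait(int orderId) {
        CompletableFuture<Collection<List<String>>> result = new CompletableFuture<>();
        eval(result, orderId);
        return result.join();
    }

    public static void main(String[] args) {
        for (List<String> row : callAndWait(42)) {
            System.out.println(row);
        }
    }
}
```

In an actual job the runtime, not the caller, would wait on the future, so
callAndWait exists here only to make the sketch runnable.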

In the end, we can also see the user-defined async table function as an
enhancement of the current lateral table join,
which only supports sync lateral join now.

Best,
Aitozi.


On Fri, Jun 2, 2023 at 19:37, Jing Ge wrote:

> Hi Aitozi,
>
> Thanks for the update. Just out of curiosity, what is the difference
> between the RPC call or query you mentioned and the lookup in a very
> general way? Since lateral join is used in the FLIP, is there any special
> thought behind that? Sorry for asking so many questions. The FLIP contains
> limited information to understand the motivation.
>
> Best regards,
> Jing
>
> On Fri, Jun 2, 2023 at 3:48 AM Aitozi  wrote:
>
> > Hi Jing,
> > I have updated the proposed changes in the FLIP. IMO, lookup has its
> > clear async call requirement because it is an IO-heavy operator. In our
> > usage, SQL users have logic that makes RPC calls or queries third-party
> > services, which is also IO intensive.
> > In these cases, we'd like to leverage the async function to improve the
> > throughput.
> >
> > Thanks,
> > Aitozi.
> >
> > On Thu, Jun 1, 2023 at 22:55, Jing Ge wrote:
> >
> > > Hi Aitozi,
> > >
> > > Sorry for the late reply. Would you like to update the proposed changes
> > > with more details in the FLIP too?
> > > I got your point. It looks like a rational idea. However, since lookup
> > > has its clear async call requirement, are there any real use cases that
> > > need this change? This will help us understand the motivation. After
> > > all, lateral join and temporal lookup join[1] are quite different.
> > >
> > > Best regards,
> > > Jing
> > >
> > >
> > > [1]
> > > https://github.com/apache/flink/blob/d90a72da2fd601ca4e2a46700e91ec5b348de2ad/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/AsyncTableFunction.java#L54
> > >
> > > On Wed, May 31, 2023 at 8:53 AM Aitozi  wrote:
> > >
> > > > Hi Jing,
> > > > What do you think about it? Can we move forward with this feature?
> > > >
> > > > Thanks,
> > > > Aitozi.
> > > >
> > > > On Mon, May 29, 2023 at 09:56, Aitozi wrote:
> > > >
> > > > > Hi Jing,
> > > > > > "Do you mean to support the AsyncTableFunction beyond the
> > > > > > LookupTableSource?"
> > > > > Yes, I mean to support the AsyncTableFunction beyond the
> > > > > LookupTableSource.
> > > > >
> > > > > The "AsyncTableFunction" is a function with the ability to be executed
> > > > > asynchronously (with AsyncWaitOperator).
> > > > > The async lookup join is one usage of it. So, we don't have to bind
> > > > > the AsyncTableFunction to LookupTableSource.
> > > > > If user-defined AsyncTableFunction is supported, users can directly use
> > > > > the lateral table syntax to perform async operations.
> > > > >
> > > > > > "It would be better if you could elaborate the proposed changes wrt
> > > > > > the CorrelatedCodeGenerator with more details"
> > > > >
> > > > > In the proposal, we use the lateral table syntax to support the async
> > > > > table function. So the planner will also translate this statement to a
> > > > > CommonExecCorrelate node, and the runtime code should be generated in
> > > > > CorrelatedCodeGenerator.
> > > > > In CorrelatedCodeGenerator, we will know whether the TableFunction's
> > > > > kind is `FunctionKind.TABLE` or `FunctionKind.ASYNC_TABLE`.
> > > > > For `FunctionKind.ASYNC_TABLE` we can generate an AsyncWaitOperator to
> > > > > execute the async table function.
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Aitozi.
> > > > >
> > > > >
> > > > > On Mon, May 29, 2023 at 03:22, Jing Ge wrote:

[jira] [Created] (FLINK-32250) Revert the Field name of BUFFER_TIMEOUT to improve compatibility

2023-06-04 Thread Rui Fan (Jira)
Rui Fan created FLINK-32250:
---

 Summary: Revert the Field name of BUFFER_TIMEOUT to improve 
compatibility
 Key: FLINK-32250
 URL: https://issues.apache.org/jira/browse/FLINK-32250
 Project: Flink
  Issue Type: Bug
  Components: API / DataStream
Affects Versions: 1.18.0
Reporter: Rui Fan
Assignee: Rui Fan
 Fix For: 1.18.0


FLINK-32023 changed `ExecutionOptions.BUFFER_TIMEOUT` to
`ExecutionOptions.BUFFER_TIMEOUT_INTERVAL`; the field name should be reverted.

Because `ExecutionOptions` is a public evolving API, some Flink users are
using `ExecutionOptions.BUFFER_TIMEOUT` in their code. If we rename it, their
code cannot be upgraded to 1.18 directly.

 

BTW, the option name was changed from `execution.buffer-timeout` to
`execution.buffer-timeout.interval`. However, we marked
`execution.buffer-timeout` as a deprecated key, so 1.18 is compatible with the
old option name.
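
To illustrate the compatibility argument, here is a minimal sketch under
stated assumptions: the Option class and the read helper are hypothetical and
much simpler than Flink's real ConfigOption. The Java field keeps its old
name, the option key is the new one, and the old key is still honored as a
deprecated fallback.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

class DeprecatedKeyFallbackSketch {
    // Hypothetical, simplified option descriptor (not Flink's ConfigOption).
    static final class Option {
        final String key;
        final List<String> deprecatedKeys;

        Option(String key, String... deprecatedKeys) {
            this.key = key;
            this.deprecatedKeys = List.of(deprecatedKeys);
        }
    }

    // The field name stays BUFFER_TIMEOUT; only the option key changed.
    static final Option BUFFER_TIMEOUT =
            new Option("execution.buffer-timeout.interval", "execution.buffer-timeout");

    // Looks up the current key first, then falls back to deprecated keys in order.
    static Optional<String> read(Map<String, String> conf, Option option) {
        if (conf.containsKey(option.key)) {
            return Optional.of(conf.get(option.key));
        }
        for (String oldKey : option.deprecatedKeys) {
            if (conf.containsKey(oldKey)) {
                return Optional.of(conf.get(oldKey));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> oldStyle = Map.of("execution.buffer-timeout", "100 ms");
        System.out.println(read(oldStyle, BUFFER_TIMEOUT).orElse("<unset>"));
    }
}
```

With this shape, user code that references the field by its old name keeps
compiling, and configuration files that use the old key keep working.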

 





Re: [DISCUSS] FLIP-315: Support Operator Fusion Codegen for Flink SQL

2023-06-04 Thread Yun Tang
Hi Ron,

I think this FLIP would help to improve the performance, looking forward to its
completion in Flink!

For the state compatibility section, it seems that the checkpoint compatibility
would be broken just like [1] did. Could FLIP-190 [2] still be helpful in this
case for SQL version upgrades?


[1] 
https://docs.google.com/document/d/1qKVohV12qn-bM51cBZ8Hcgp31ntwClxjoiNBUOqVHsI/edit#heading=h.fri5rtpte0si
[2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191336489

Best
Yun Tang




Re: [DISCUSS] FLIP-315: Support Operator Fusion Codegen for Flink SQL

2023-06-04 Thread Jingsong Li
> For the state compatibility section, it seems that the checkpoint
> compatibility would be broken just like [1] did. Could FLIP-190 [2] still be
> helpful in this case for SQL version upgrades?

I guess this is only for batch processing. Streaming should be another story?

Best,
Jingsong

On Mon, Jun 5, 2023 at 2:07 PM Yun Tang  wrote:
>
> Hi Ron,
>
> I think this FLIP would help to improve the performance, looking forward to 
> its completion in Flink!
>
> For the state compatibility session, it seems that the checkpoint 
> compatibility would be broken just like [1] did. Could FLIP-190 [2] still be 
> helpful in this case for SQL version upgrades?
>
>
> [1] 
> https://docs.google.com/document/d/1qKVohV12qn-bM51cBZ8Hcgp31ntwClxjoiNBUOqVHsI/edit#heading=h.fri5rtpte0si
> [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191336489
>
> Best
> Yun Tang
>
> 
> From: Lincoln Lee 
> Sent: Monday, June 5, 2023 10:56
> To: dev@flink.apache.org 
> Subject: Re: [DISCUSS] FLIP-315: Support Operator Fusion Codegen for Flink SQL
>
> Hi Ron
>
> OFGC looks like an exciting optimization, looking forward to its completion
> in Flink!
> A small question, do we consider adding a benchmark for the operators to
> intuitively understand the improvement brought by each improvement?
> In addition, for the implementation plan, mentioned in the FLIP that 1.18
> will support Calc, HashJoin and HashAgg, then what will be the next step?
> and which operators do we ultimately expect to cover (all or specific ones)?
>
> Best,
> Lincoln Lee
>
>
> liu ron  于2023年6月5日周一 09:40写道:
>
> > Hi, Jark
> >
> > Thanks for your feedback, according to my initial assessment, the work
> > effort is relatively large.
> >
> > Moreover, I will add a test result of all queries to the FLIP.
> >
> > Best,
> > Ron
> >
> > On Thu, Jun 1, 2023 at 8:45 PM Jark Wu wrote:
> >
> > > Hi Ron,
> > >
> > > Thanks a lot for the great proposal. The FLIP looks good to me in general.
> > > It does not look like easy work, but the performance sounds promising, so
> > > I think it's worth doing.
> > >
> > > Besides, if there is a complete test graph with all TPC-DS queries, the
> > > effect of this FLIP will be more intuitive.
> > >
> > > Best,
> > > Jark
> > >
> > >
> > >
> > > On Wed, 31 May 2023 at 14:27, liu ron  wrote:
> > >
> > > > > Hi, Jingsong
> > > >
> > > > Thanks for your valuable suggestions.
> > > >
> > > > Best,
> > > > Ron
> > > >
> > > > > On Tue, May 30, 2023 at 1:22 PM Jingsong Li wrote:
> > > >
> > > > > Thanks Ron for your information.
> > > > >
> > > > > I suggest that it can be written in the Motivation of FLIP.
> > > > >
> > > > > Best,
> > > > > Jingsong
> > > > >
> > > > > On Tue, May 30, 2023 at 9:57 AM liu ron  wrote:
> > > > > >
> > > > > > Hi, Jingsong
> > > > > >
> > > > > > > Thanks for your review. We have tested it on TPC-DS and got a 12%
> > > > > > > gain overall with only the Calc, HashJoin and HashAgg operators
> > > > > > > supported. In some queries we even get more than a 30% gain, so it
> > > > > > > looks like an effective approach.
> > > > > >
> > > > > > Best,
> > > > > > Ron
> > > > > >
> > > > > > > On Mon, May 29, 2023 at 2:33 PM Jingsong Li wrote:
> > > > > >
> > > > > > > Thanks Ron for the proposal.
> > > > > > >
> > > > > > > > Do you have some benchmark results for the performance
> > > > > > > > improvement? I am more concerned about the improvement on Flink
> > > > > > > > than the data in other papers.
> > > > > > >
> > > > > > > Best,
> > > > > > > Jingsong
> > > > > > >
> > > > > > > > On Mon, May 29, 2023 at 2:16 PM liu ron wrote:
> > > > > > > >
> > > > > > > > Hi, dev
> > > > > > > >
> > > > > > > > > I'd like to start a discussion about FLIP-315: Support Operator
> > > > > > > > > Fusion Codegen for Flink SQL[1]
> > > > > > > > >
> > > > > > > > > As main memory grows, query performance is increasingly
> > > > > > > > > determined by the raw CPU cost of query processing itself,
> > > > > > > > > because query processing techniques based on interpreted
> > > > > > > > > execution perform poorly on modern CPUs due to a lack of
> > > > > > > > > locality and frequent instruction mispredictions. Therefore,
> > > > > > > > > the industry is also researching how to improve engine
> > > > > > > > > performance by increasing operator execution efficiency. In
> > > > > > > > > addition, while optimizing Flink's performance for TPC-DS
> > > > > > > > > queries, we found that a significant amount of CPU time was
> > > > > > > > > spent on virtual function calls, framework collector calls, and
> > > > > > > > > invalid calculations, which can be optimized to improve the
> > > > > > > > > overall engine performance. After some investigation, we found
> > > > > > > > > that Operator Fusion Codegen, which is proposed by Thomas
> > > > > > > > > Neumann in the paper[2], can address these
> > > > > > > > > proble

[jira] [Created] (FLINK-32251) flink jdbc catalog cannot add parameters to url

2023-06-04 Thread melin (Jira)
melin created FLINK-32251:
-

 Summary: flink jdbc catalog cannot add parameters to url
 Key: FLINK-32251
 URL: https://issues.apache.org/jira/browse/FLINK-32251
 Project: Flink
  Issue Type: New Feature
Reporter: melin


For example, add an encoding parameter to the MySQL URL:

jdbc:mysql://host:port/database?useUnicode=true&characterEncoding=UTF-8
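
A minimal sketch of the requested behavior, assuming the catalog builds its connection URL by joining a base URL and a database name. The function name `build_jdbc_url` and its arguments are hypothetical and are not part of the Flink JDBC catalog API; Python is used only for illustration, and port 3306 stands in for the `port` placeholder above.

```python
from typing import Optional
from urllib.parse import urlencode


def build_jdbc_url(base_url: str, database: str,
                   params: Optional[dict] = None) -> str:
    """Join a base URL and a database name, then append optional parameters.

    Hypothetical helper: it only illustrates where the query string would
    have to go if the URL is assembled from separate parts.
    """
    # Avoid a double slash when the base URL already ends with "/".
    url = f"{base_url.rstrip('/')}/{database}"
    if params:
        # urlencode produces "key=value" pairs joined with "&".
        url = f"{url}?{urlencode(params)}"
    return url


if __name__ == "__main__":
    print(build_jdbc_url(
        "jdbc:mysql://host:3306",
        "database",
        {"useUnicode": "true", "characterEncoding": "UTF-8"},
    ))
```

The point of the sketch is that the parameters have to be appended after the database name, so a catalog that only accepts a base URL and concatenates the database name itself leaves no place to pass them.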



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-315: Support Operator Fusion Codegen for Flink SQL

2023-06-04 Thread Benchao Li
Thanks Ron for initiating this discussion, OFCG sounds like a cool
optimization.

I also agree with the above comments about benchmark results; they are
important for performance improvements.

And I think Yun and Jingsong have raised a very interesting topic about the
support for streaming. It's worth mentioning that FLINK-19621[1] also aimed
to support both batch and streaming according to its Jira description and
design doc[2], but it was closed with only batch supported.

I agree that those optimizations have higher priority for batch, since batch
has standard benchmarks such as TPC-DS/TPC-H, which make the value of a
performance improvement much easier to show. For the streaming part, I think
it would also be great to have if the optimization fits streaming as well,
because Flink is a unified streaming and batch engine. Hence, we had better
set the goal clearly, such as supporting batch only or both streaming and
batch, and write it down in the FLIP.

[1] https://issues.apache.org/jira/browse/FLINK-19621
[2]
https://docs.google.com/document/d/1qKVohV12qn-bM51cBZ8Hcgp31ntwClxjoiNBUOqVHsI/edit#


On Mon, Jun 5, 2023 at 2:15 PM Jingsong Li wrote:

> > For the state compatibility section, it seems that the checkpoint
> > compatibility would be broken just like [1] did. Could FLIP-190 [2] still
> > be helpful in this case for SQL version upgrades?
>
> I guess this is only for batch processing. Streaming should be another
> story?
>
> Best,
> Jingsong
>
> On Mon, Jun 5, 2023 at 2:07 PM Yun Tang  wrote:
> >
> > Hi Ron,
> >
> > I think this FLIP would help to improve the performance, looking forward
> > to its completion in Flink!
> >
> > For the state compatibility section, it seems that the checkpoint
> > compatibility would be broken just like [1] did. Could FLIP-190 [2] still
> > be helpful in this case for SQL version upgrades?
> >
> >
> > [1]
> > https://docs.google.com/document/d/1qKVohV12qn-bM51cBZ8Hcgp31ntwClxjoiNBUOqVHsI/edit#heading=h.fri5rtpte0si
> > [2]
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191336489
> >
> > Best
> > Yun Tang
> >
> > 
> > From: Lincoln Lee 
> > Sent: Monday, June 5, 2023 10:56
> > To: dev@flink.apache.org 
> > Subject: Re: [DISCUSS] FLIP-315: Support Operator Fusion Codegen for Flink SQL
> >
> > Hi Ron
> >
> > OFCG looks like an exciting optimization, looking forward to its
> > completion in Flink!
> > A small question: have we considered adding a benchmark for the operators,
> > to make the gain from each improvement easy to see?
> > In addition, the implementation plan in the FLIP says that 1.18 will
> > support Calc, HashJoin and HashAgg. What will be the next step, and which
> > operators do we ultimately expect to cover (all or specific ones)?
> >
> > Best,
> > Lincoln Lee
> >
> >
> > On Mon, Jun 5, 2023 at 9:40 AM liu ron wrote:
> >
> > > Hi, Jark
> > >
> > > Thanks for your feedback. According to my initial assessment, the work
> > > effort is relatively large.
> > >
> > > I will also add test results for all queries to the FLIP.
> > >
> > > Best,
> > > Ron
> > >
> > > On Thu, Jun 1, 2023 at 8:45 PM Jark Wu wrote:
> > >
> > > > Hi Ron,
> > > >
> > > > Thanks a lot for the great proposal. The FLIP looks good to me in
> > > > general. It does not look like easy work, but the performance sounds
> > > > promising, so I think it's worth doing.
> > > >
> > > > Besides, if there is a complete test graph with all TPC-DS queries, the
> > > > effect of this FLIP will be more intuitive.
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > >
> > > >
> > > > On Wed, 31 May 2023 at 14:27, liu ron  wrote:
> > > >
> > > > > > Hi, Jingsong
> > > > >
> > > > > Thanks for your valuable suggestions.
> > > > >
> > > > > Best,
> > > > > Ron
> > > > >
> > > > > > On Tue, May 30, 2023 at 1:22 PM Jingsong Li wrote:
> > > > >
> > > > > > Thanks Ron for your information.
> > > > > >
> > > > > > I suggest that it can be written in the Motivation of FLIP.
> > > > > >
> > > > > > Best,
> > > > > > Jingsong
> > > > > >
> > > > > > > On Tue, May 30, 2023 at 9:57 AM liu ron wrote:
> > > > > > >
> > > > > > > Hi, Jingsong
> > > > > > >
> > > > > > > > Thanks for your review. We have tested it on TPC-DS and got a
> > > > > > > > 12% gain overall with only the Calc, HashJoin and HashAgg
> > > > > > > > operators supported. In some queries we even get more than a 30%
> > > > > > > > gain, so it looks like an effective approach.
> > > > > > >
> > > > > > > Best,
> > > > > > > Ron
> > > > > > >
> > > > > > > > On Mon, May 29, 2023 at 2:33 PM Jingsong Li wrote:
> > > > > > >
> > > > > > > > Thanks Ron for the proposal.
> > > > > > > >
> > > > > > > > > Do you have some benchmark results for the performance
> > > > > > > > > improvement? I am more concerned about the improvement on Flink
> > > > > > > > > than the data in other papers.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Jingsong
> > > > > > > >
> > > > > > > > > On Mon, May 29, 2023 at 2:16 PM liu ron wrote:
> > > >