Re: FLINK-20767 - Support for nested fields filter push down

2023-08-06 Thread yh z
Hi, Venkatakrishnan,
I think this is a very useful feature. I have been focusing on the
development of the flink-table-planner module recently, so if you need some
help, I can assist you in completing the development of some sub-tasks or
code review.

Returning to the design itself, I think it's necessary to modify
FieldReferenceExpression or re-implement a NestedFieldReferenceExpression.
As for modifying the interface of SupportsProjectionPushDown, I think we
need to make some trade-offs. As a connector developer, the stability of
the interface is very important. If there are no unresolved bugs, I
personally do not recommend modifying the interface. However, when I first
read the code of SupportsProjectionPushDown, the design of int[][] was very
confusing for me, and it took me a long time to understand it by running
specific UT tests. Therefore, in terms of the design of this interface and
the consistency between different interfaces, there is indeed room for
improvement.
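
For readers unfamiliar with the int[][] format mentioned above, here is a
small self-contained sketch (not Flink code; the schema and names are
illustrative) of how such index paths address nested fields:

```java
/** A toy illustration (not Flink's implementation) of int[][] index paths. */
public class ProjectionPathDemo {

    /** Follows one int[] index path into a nested row modeled as Object[]. */
    static Object resolve(Object[] row, int[] path) {
        Object current = row;
        for (int idx : path) {
            current = ((Object[]) current)[idx];
        }
        return current;
    }

    public static void main(String[] args) {
        // Row for an assumed schema ROW<id INT, user ROW<name STRING, address ROW<city STRING>>>
        Object[] row = {42, new Object[] {"alice", new Object[] {"Berlin"}}};
        int[][] projectedFields = {
            {0},       // top-level field "id"
            {1, 0},    // nested field "user.name"
            {1, 1, 0}  // nested field "user.address.city"
        };
        for (int[] path : projectedFields) {
            System.out.println(resolve(row, path));
        }
        // prints 42, alice and Berlin on separate lines
    }
}
```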

Thanks,
Yunhong Zheng (Swuferhong)




On Thu, Aug 3, 2023 at 07:44, Becket Qin wrote:

> Hi Jark,
>
> If the FieldReferenceExpression contains an int[] to support a nested field
> reference, List<FieldReferenceExpression> (or FieldReferenceExpression[])
> and int[][] are actually equivalent. If we are designing this from scratch,
> personally I prefer using List<FieldReferenceExpression> for consistency,
> i.e. always resolving everything to expressions for users. Projection is a
> simpler case, but should not be a special case. This avoids doing the same
> thing in different ways which is also a confusion to the users. To me, the
> int[][] format would become kind of a technical debt after we extend the
> FieldReferenceExpression. Although we don't have to address it right away
> in the same FLIP, this kind of debt accumulates over time and makes the
> project harder to learn and maintain. So, personally I prefer to address
> these technical debts as soon as possible.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Wed, Aug 2, 2023 at 8:19 PM Jark Wu  wrote:
>
> > Hi,
> >
> > I agree with Becket that we may need to extend FieldReferenceExpression
> to
> > support nested field access (or maybe a new
> > NestedFieldReferenceExpression).
> > But I have some concerns about evolving the
> > SupportsProjectionPushDown.applyProjection.
> > A projection is much simpler than a filter expression; it only needs to
> > represent the field indexes.
> > If we evolve `applyProjection` to accept `List<FieldReferenceExpression>
> > projectedFields`,
> > users have to convert the `List<FieldReferenceExpression>` back to
> int[][]
> > which is an overhead for users.
> > Field indexes (int[][]) are required to project schemas with the
> > utility org.apache.flink.table.connector.Projection.
> >
> >
> > Best,
> > Jark
> >
> >
> >
> > On Wed, 2 Aug 2023 at 07:40, Venkatakrishnan Sowrirajan <
> vsowr...@asu.edu>
> > wrote:
> >
> > > Thanks Becket for the suggestion. That makes sense. Let me try it out
> and
> > > get back to you.
> > >
> > > Regards
> > > Venkata krishnan
> > >
> > >
> > > On Tue, Aug 1, 2023 at 9:04 AM Becket Qin 
> wrote:
> > >
> > > > This is a very useful feature in practice.
> > > >
> > > > It looks to me that the key issue here is that Flink
> ResolvedExpression
> > > > does not have the necessary abstraction for nested field access. So the
> > > Calcite
> > > > RexFieldAccess does not have a counterpart in the ResolvedExpression.
> > The
> > > > FieldReferenceExpression only supports direct access to the fields,
> not
> > > > nested access.
> > > >
> > > > Theoretically speaking, this nested field reference is also required
> by
> > > > projection pushdown. However, we addressed that by using an int[][]
> in
> > > the
> > > > SupportsProjectionPushDown interface. Maybe we can do the following:
> > > >
> > > > 1. Extend the FieldReferenceExpression to include an int[] for nested
> > > field
> > > > access,
> > > > 2. By doing (1),
> > > > SupportsFilterPushDown#applyFilters(List<ResolvedExpression>) can
> > support
> > > > nested field access.
> > > > 3. Evolve the SupportsProjectionPushDown.applyProjection(int[][]
> > > > projectedFields, DataType producedDataType) to
> > > > applyProjection(List<FieldReferenceExpression> projectedFields,
> > DataType
> > > > producedDataType)
> > > >
> > > > This will need a FLIP.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Tue, Aug 1, 2023 at 11:42 PM Venkatakrishnan Sowrirajan <
> > > > vsowr...@asu.edu>
> > > > wrote:
> > > >
> > > > > Thanks for the response. Looking forward to your pointers. In the
> > > > > meanwhile, let me figure out how we can implement it. Will keep you
> > > > posted.
> > > > >
> > > > > On Mon, Jul 31, 2023, 11:43 PM liu ron  wrote:
> > > > >
> > > > > > Hi, Venkata
> > > > > >
> > > > > > Thanks for reporting this issue. Currently, Flink doesn't support
> > > > nested
> > > > > > filter pushdown. I also think that this optimization would be
> > useful,
> > > > > > especially for jobs, which may need to read a lot of data from
> the
> > > > > parquet
> > > > > > or orc file. We didn't move forward wi

Re: [ANNOUNCE] New Apache Flink Committer - Weihua Hu

2023-08-06 Thread yh z
Congratulations, Weihua!

Best,
Yunhong Zheng (Swuferhong)

On Sat, Aug 5, 2023 at 21:34, Runkang He wrote:

> Congratulations, Weihua!
>
> Best,
> Runkang He
>
> On Fri, Aug 4, 2023 at 18:21, Kelu Tao wrote:
>
> > Congratulations!
> >
> > On 2023/08/04 08:35:49 Danny Cranmer wrote:
> > > Congrats and welcome to the team, Weihua!
> > >
> > > Thanks,
> > > Danny
> > >
> > > On Fri, Aug 4, 2023 at 9:30 AM Feng Jin  wrote:
> > >
> > > > Congratulations Weihua!
> > > >
> > > > Best regards,
> > > >
> > > > Feng
> > > >
> > > > On Fri, Aug 4, 2023 at 4:28 PM weijie guo  >
> > > > wrote:
> > > >
> > > > > Congratulations Weihua!
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Weijie
> > > > >
> > > > >
> > > > > On Fri, Aug 4, 2023 at 15:28, Lijie Wang wrote:
> > > > >
> > > > > > Congratulations, Weihua!
> > > > > >
> > > > > > Best,
> > > > > > Lijie
> > > > > >
> > > > > > On Fri, Aug 4, 2023 at 15:14, yuxia wrote:
> > > > > >
> > > > > > > Congratulations, Weihua!
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Yuxia
> > > > > > >
> > > > > > > - Original Message -
> > > > > > > From: "Yun Tang" 
> > > > > > > To: "dev" 
> > > > > > > Sent: Friday, August 4, 2023, 3:05:30 PM
> > > > > > > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Weihua Hu
> > > > > > >
> > > > > > > Congratulations, Weihua!
> > > > > > >
> > > > > > >
> > > > > > > Best
> > > > > > > Yun Tang
> > > > > > > 
> > > > > > > From: Jark Wu 
> > > > > > > Sent: Friday, August 4, 2023 15:00
> > > > > > > To: dev@flink.apache.org 
> > > > > > > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Weihua Hu
> > > > > > >
> > > > > > > Congratulations, Weihua!
> > > > > > >
> > > > > > > Best,
> > > > > > > Jark
> > > > > > >
> > > > > > > On Fri, 4 Aug 2023 at 14:48, Yuxin Tan  >
> > > > wrote:
> > > > > > >
> > > > > > > > Congratulations Weihua!
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yuxin
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Aug 4, 2023 at 14:28, Junrui Lee wrote:
> > > > > > > >
> > > > > > > > > Congrats, Weihua!
> > > > > > > > > Best,
> > > > > > > > > Junrui
> > > > > > > > >
> > > > > > > > > On Fri, Aug 4, 2023 at 14:25, Geng Biao wrote:
> > > > > > > > >
> > > > > > > > > > Congrats, Weihua!
> > > > > > > > > > Best,
> > > > > > > > > > Biao Geng
> > > > > > > > > >
> > > > > > > > > > Sent from Outlook for iOS
> > > > > > > > > > 
> > > > > > > > > > From: 周仁祥 
> > > > > > > > > > Sent: Friday, August 4, 2023 2:23:42 PM
> > > > > > > > > > To: dev@flink.apache.org 
> > > > > > > > > > Cc: Weihua Hu 
> > > > > > > > > > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Weihua Hu
> > > > > > > > > >
> > > > > > > > > > Congratulations, Weihua~
> > > > > > > > > >
> > > > > > > > > > > On Aug 4, 2023, at 14:21, Sergey Nuyanzin
> > wrote:
> > > > > > > > > > >
> > > > > > > > > > > Congratulations, Weihua!
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Aug 4, 2023 at 8:03 AM Chen Zhanghao <
> > > > > > > > > zhanghao.c...@outlook.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > >> Congratulations, Weihua!
> > > > > > > > > > >>
> > > > > > > > > > >> Best,
> > > > > > > > > > >> Zhanghao Chen
> > > > > > > > > > >> 
> > > > > > > > > > >> From: Xintong Song 
> > > > > > > > > > >> Sent: August 4, 2023, 11:18
> > > > > > > > > > >> To: dev 
> > > > > > > > > > >> Cc: Weihua Hu 
> > > > > > > > > > >> Subject: [ANNOUNCE] New Apache Flink Committer - Weihua Hu
> > > > > > > > > > >>
> > > > > > > > > > >> Hi everyone,
> > > > > > > > > > >>
> > > > > > > > > > >> On behalf of the PMC, I'm very happy to announce
> Weihua
> > Hu
> > > > as
> > > > > a
> > > > > > > new
> > > > > > > > > > Flink
> > > > > > > > > > >> Committer!
> > > > > > > > > > >>
> > > > > > > > > > >> Weihua has been consistently contributing to the
> project
> > > > since
> > > > > > May
> > > > > > > > > > 2022. He
> > > > > > > > > > >> mainly works in Flink's distributed coordination
> areas.
> > He
> > > > is
> > > > > > the
> > > > > > > > main
> > > > > > > > > > >> contributor of FLIP-298 and many other improvements in
> > > > > > large-scale
> > > > > > > > job
> > > > > > > > > > >> scheduling. He is also quite active
> in
> > > > > mailing
> > > > > > > > lists,
> > > > > > > > > > >> participating in discussions and answering user
> questions.
> > > > > > > > > > >>
> > > > > > > > > > >> Please join me in congratulating Weihua!
> > > > > > > > > > >>
> > > > > > > > > > >> Best,
> > > > > > > > > > >>
> > > > > > > > > > >> Xintong (on behalf of the Apache Flink PMC)
> > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Best regards,
> > > > > > > > > > > Sergey
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: FLINK-20767 - Support for nested fields filter push down

2023-08-07 Thread yh z
Hi Venkatakrishnan,
Sorry for the late reply. I have looked at the code and feel like you need
to modify the logic of the
ExpressionConverter.visit(FieldReferenceExpression expression) method to
support nested types,
which are not supported in the current code.
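
To visualize the kind of conversion described above (a nested reference
lowered into a chain of field accesses), here is a small stand-alone sketch;
it only mimics the idea with strings and is not the actual
ExpressionConverter/Calcite code, and all names are illustrative:

```java
/** A toy sketch of lowering an index path into a chained field access. */
public class NestedAccessDemo {

    /**
     * Renders an index path as a dotted access chain, similar in spirit to
     * wrapping a column reference in successive field-access nodes.
     * namesAtDepth[d] lists the field names assumed available at depth d.
     */
    static String toFieldAccessChain(String[][] namesAtDepth, int[] path) {
        StringBuilder sb = new StringBuilder(namesAtDepth[0][path[0]]);
        for (int depth = 1; depth < path.length; depth++) {
            sb.append('.').append(namesAtDepth[depth][path[depth]]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[][] names = {
            {"id", "user"},      // top-level columns
            {"name", "address"}, // fields inside "user"
            {"city"}             // fields inside "address"
        };
        System.out.println(toFieldAccessChain(names, new int[] {1, 1, 0}));
        // prints user.address.city
    }
}
```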

Regards,
Yunhong Zheng (Swuferhong)

On Mon, Aug 7, 2023 at 13:30, Venkatakrishnan Sowrirajan wrote:

> (Sorry, I pressed send too early)
>
> Thanks for the help @zhengyunhon...@gmail.com.
>
> Agreed on not changing the API as much as possible, and also on eventually
> simplifying projection pushdown with nested fields.
>
> In terms of the code itself, currently I am trying to leverage the
> FieldReferenceExpression to also handle nested fields for filter push down.
> But where I am currently struggling to make progress is, once the filters
> are pushed to the table source itself, in
> PushFilterIntoSourceScanRuleBase#resolveFiltersAndCreateTableSourceTable
> there is a conversion from List<ResolvedExpression> (containing the
> FieldReferenceExpression) to the List<RexNode> itself.
>
> If you have some pointers for that, please let me know. Thanks.
>
> Regards
> Venkata krishnan
>
>
> On Sun, Aug 6, 2023 at 10:23 PM Venkatakrishnan Sowrirajan <
> vsowr...@asu.edu>
> wrote:
>
> > Thanks @zhengyunhon...@gmail.com
> > Regards
> > Venkata krishnan
> >
> >
> > On Sun, Aug 6, 2023 at 6:16 PM yh z  wrote:
> >
> >> Hi, Venkatakrishnan,
> >> I think this is a very useful feature. I have been focusing on the
> >> development of the flink-table-planner module recently, so if you need
> >> some
> >> help, I can assist you in completing the development of some sub-tasks
> or
> >> code review.
> >>
> >> Returning to the design itself, I think it's necessary to modify
> >> FieldReferenceExpression or re-implement a
> NestedFieldReferenceExpression.
> >> As for modifying the interface of SupportsProjectionPushDown, I think we
> >> need to make some trade-offs. As a connector developer, the stability of
> >> the interface is very important. If there are no unresolved bugs, I
> >> personally do not recommend modifying the interface. However, when I
> first
> >> read the code of SupportsProjectionPushDown, the design of int[][] was
> >> very
> >> confusing for me, and it took me a long time to understand it by running
> >> specific UT tests. Therefore, in terms of the design of this interface
> and
> >> the consistency between different interfaces, there is indeed room for
> >> improvement.
> >>
> >> Thanks,
> >> Yunhong Zheng (Swuferhong)
> >>
> >>
> >>
> >>
> >> On Thu, Aug 3, 2023 at 07:44, Becket Qin wrote:
> >>
> >> > Hi Jark,
> >> >
> >> > If the FieldReferenceExpression contains an int[] to support a nested
> >> field
> >> > reference, List<FieldReferenceExpression> (or
> >> FieldReferenceExpression[])
> >> > and int[][] are actually equivalent. If we are designing this from
> >> scratch,
> >> > personally I prefer using List<FieldReferenceExpression> for
> >> consistency,
> >> > i.e. always resolving everything to expressions for users. Projection
> >> is a
> >> > simpler case, but should not be a special case. This avoids doing the
> >> same
> >> > thing in different ways which is also a confusion to the users. To me,
> >> the
> >> > int[][] format would become kind of a technical debt after we extend
> the
> >> > FieldReferenceExpression. Although we don't have to address it right
> >> away
> >> > in the same FLIP, this kind of debt accumulates over time and makes
> the
> >> > project harder to learn and maintain. So, personally I prefer to
> address
> >> > these technical debts as soon as possible.
> >> >
> >> > Thanks,
> >> >
> >> > Jiangjie (Becket) Qin
> >> >
> >> > On Wed, Aug 2, 2023 at 8:19 PM Jark Wu  wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > I agree with Becket that we may need to extend
> >> FieldReferenceExpression
> >> > to
> >> > > support nested field access (or maybe a new
> >> > > NestedFieldReferenceExpression).
> >> > > But I have some concerns about evolving the
> >> > > SupportsProjectionPushDown.applyProjection.
> >> > > A projection is much simpler than a filter expression; it only needs
> >> to
> >> > > represent the field indexes.
> >> > > If 

Re: [ANNOUNCE] New Apache Flink Committer - Hangxiang Yu

2023-08-07 Thread yh z
Congratulations, Hangxiang !


Best,
Yunhong Zheng (Swuferhong)

On Tue, Aug 8, 2023 at 09:20, yuxia wrote:

> Congratulations, Hangxiang !
>
> Best regards,
> Yuxia
>
> - Original Message -
> From: "Wencong Liu" 
> To: "dev" 
> Sent: Monday, August 7, 2023, 11:55:24 PM
> Subject: Re: [ANNOUNCE] New Apache Flink Committer - Hangxiang Yu
>
> Congratulations, Hangxiang !
>
>
> Best,
> Wencong
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> At 2023-08-07 14:57:49, "Yuan Mei"  wrote:
> >On behalf of the PMC, I'm happy to announce Hangxiang Yu as a new Flink
> >Committer.
> >
> >Hangxiang has been active in the Flink community for more than 1.5 years
> >and has played an important role in developing and maintaining State and
> >Checkpoint related features/components, including Generic Incremental
> >Checkpoints (taking great effort to make the feature prod-ready). Hangxiang
> >is also the main driver of FLIP-263: Resolving schema compatibility.
> >
> >Hangxiang is passionate about the Flink community. Besides the technical
> >contribution above, he is also actively promoting Flink: giving talks about
> Generic
> >Incremental Checkpoints at Flink Forward and meetups. Hangxiang also spent
> >a good amount of time supporting users, participating in Jira/mailing list
> >discussions, and reviewing code.
> >
> >Please join me in congratulating Hangxiang for becoming a Flink Committer!
> >
> >Thanks,
> >Yuan Mei (on behalf of the Flink PMC)
>


Re: [ANNOUNCE] New Apache Flink PMC Member - Matthias Pohl

2023-08-07 Thread yh z
Congratulations, Matthias!

Best,
Yunhong Zheng (Swuferhong)

On Mon, Aug 7, 2023 at 21:39, Ryan Skraba wrote:

> Congratulations Matthias -- very well-deserved, the community is lucky to
> have you <3
>
> All my best, Ryan
>
> On Mon, Aug 7, 2023 at 3:04 PM Lincoln Lee  wrote:
>
> > Congratulations!
> >
> > Best,
> > Lincoln Lee
> >
> >
> > On Mon, Aug 7, 2023 at 20:13, Feifan Wang wrote:
> >
> > > Congrats Matthias!
> > >
> > >
> > >
> > > ——
> > > Name: Feifan Wang
> > > Email: zoltar9...@163.com
> > >
> > >
> > >  Replied Message 
> > > | From | Matthias Pohl |
> > > | Date | 08/7/2023 16:16 |
> > > | To |  |
> > > | Subject | Re: [ANNOUNCE] New Apache Flink PMC Member - Matthias Pohl
> |
> > > Thanks everyone. :)
> > >
> > > On Mon, Aug 7, 2023 at 3:18 AM Andriy Redko  wrote:
> > >
> > > Congrats Matthias, well deserved!!
> > >
> > > DC> Congrats Matthias!
> > >
> > > DC> Very well deserved, thankyou for your continuous, consistent
> > > contributions.
> > > DC> Welcome.
> > >
> > > DC> Thanks,
> > > DC> Danny
> > >
> > > DC> On Fri, Aug 4, 2023 at 9:30 AM Feng Jin 
> > wrote:
> > >
> > > Congratulations, Matthias!
> > >
> > > Best regards
> > >
> > > Feng
> > >
> > > On Fri, Aug 4, 2023 at 4:29 PM weijie guo 
> > > wrote:
> > >
> > > Congratulations, Matthias!
> > >
> > > Best regards,
> > >
> > > Weijie
> > >
> > >
> > > On Fri, Aug 4, 2023 at 15:50, Wencong Liu wrote:
> > >
> > > Congratulations, Matthias!
> > >
> > > Best,
> > > Wencong Liu
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > At 2023-08-04 11:18:00, "Xintong Song" 
> > > wrote:
> > > Hi everyone,
> > >
> > > On behalf of the PMC, I'm very happy to announce that Matthias Pohl
> > > has
> > > joined the Flink PMC!
> > >
> > > Matthias has been consistently contributing to the project since
> > > Sep
> > > 2020,
> > > and became a committer in Dec 2021. He mainly works in Flink's
> > > distributed
> > > coordination and high availability areas. He has worked on many
> > > FLIPs
> > > including FLIP-195/270/285. He helped a lot with the release
> > > management,
> > > being one of the Flink 1.17 release managers and also very active
> > > in
> > > Flink
> > > 1.18 / 2.0 efforts. He also contributed a lot to improving the
> > > build
> > > stability.
> > >
> > > Please join me in congratulating Matthias!
> > >
> > > Best,
> > >
> > > Xintong (on behalf of the Apache Flink PMC)
> > >
> > >
> > >
> > >
> > >
> > >
> >
>


Re: [DISCUSS] FLIP-356: Support Nested Fields Filter Pushdown

2023-08-24 Thread yh z
Hi, Venkat,

Thanks for the FLIP, it sounds good to support nested fields filter
pushdown. Based on the design of flip and the above options, I would like
to make a few suggestions:

1. At present, introducing NestedFieldReferenceExpression looks like a
better solution, which can fully meet our requirements while reducing
modifications to the base class FieldReferenceExpression. In the long run, I
tend to abstract a basic class for NestedFieldReferenceExpression and
FieldReferenceExpression, as you suggested.

2. Personally, I don't recommend introducing supportsNestedFilters() in
SupportsFilterPushDown. We just need to better declare the return value of
the applyFilters method.

3. Finally, I think we need to look at the costs and benefits of unifying
the SupportsFilterPushDown and SupportsProjectionPushDown (or others) from
the perspective of interface implementers. A stable API can reduce user
development and change costs; if the current API can fully meet the
functional requirements at the framework level, I personally suggest reducing
the impact on connector developers.
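
To make the class design in point 1 concrete, here is a minimal sketch of
the suggested hierarchy; all names are illustrative and not the final Flink
API:

```java
/** Illustrative sketch of the hierarchy discussed above (not the Flink API). */
abstract class ReferenceExpression {}

/** Reference to a top-level column by index. */
final class FieldReferenceExpression extends ReferenceExpression {
    final int fieldIndex;

    FieldReferenceExpression(int fieldIndex) {
        this.fieldIndex = fieldIndex;
    }
}

/** Reference to a nested field by an index path from the top-level column. */
final class NestedFieldReferenceExpression extends ReferenceExpression {
    final int[] fieldIndices;

    NestedFieldReferenceExpression(int[] fieldIndices) {
        this.fieldIndices = fieldIndices;
    }
}

public class ReferenceExpressionDemo {
    public static void main(String[] args) {
        // Both kinds of reference can travel through one List<ReferenceExpression>,
        // which is what would make a unified applyProjections(...) signature possible.
        java.util.List<ReferenceExpression> refs = java.util.List.of(
                new FieldReferenceExpression(0),
                new NestedFieldReferenceExpression(new int[] {1, 1, 0}));
        System.out.println(refs.size()); // prints 2
    }
}
```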

Regards,
Yunhong Zheng (Swuferhong)


On Fri, Aug 25, 2023 at 01:25, Venkatakrishnan Sowrirajan wrote:

> To keep it backwards compatible, introduce another API applyAggregates
> with List<ReferenceExpression> when nested field support is added and
> deprecate the current API. This will by default throw an exception. In the
> Flink planner, call applyAggregates with nested fields, and if it throws an
> exception, fall back to applyAggregates without nested fields.
>
> Regards
> Venkata krishnan
>
>
> On Thu, Aug 24, 2023 at 10:13 AM Venkatakrishnan Sowrirajan <
> vsowr...@asu.edu> wrote:
>
> > Jark,
> >
> > How about having a separate NestedFieldReferenceExpression, and
> >> abstracting a common base class "ReferenceExpression" for
> >> NestedFieldReferenceExpression and FieldReferenceExpression? This makes
> >> unifying expressions in
> >> "SupportsProjectionPushdown#applyProjections(List<ReferenceExpression>
> >> ...)"
> >> possible.
> >
> > This should be fine for *SupportsProjectionPushDown* and
> > *SupportsFilterPushDown*. One concern in the case of
> > *SupportsAggregatePushDown* with nested fields support (to be added in
> > the future), with this proposal, the API will become backwards
> incompatible
> > as the args for the aggregate function is
> List<FieldReferenceExpression>,
> > which needs to change to List<ReferenceExpression>.
> >
> > Regards
> > Venkata krishnan
> >
> >
> > On Thu, Aug 24, 2023 at 1:18 AM Jark Wu  wrote:
> >
> >> Hi Becket,
> >>
> >> I think it is the second case, that a FieldReferenceExpression is
> >> constructed
> >> by the framework and passed to the connector (interfaces listed by
> >> Venkata[1]
> >> and Catalog#listPartitionsByFilter). Besides, understanding the nested
> >> field
> >> is optional for users/connectors (just treat it as an unknown expression
> >> if
> >> the
> >> connector doesn't want to support it).
> >>
> >> If we extend FieldReferenceExpression, in the case of "where col.nested
> >
> >> 10",
> >> for the connectors already supported filter/delete pushdown, they may
> >> wrongly
> >> pushdown "col > 10" instead of "nested > 10" because they still treat
> >> FieldReferenceExpression as a top-level column. This problem can be
> >> resolved
> >> by introducing an additional "supportedNestedPushdown" for each
> interface,
> >> but that method is not elegant and is hard to remove in the future, and
> >> this could
> >> be avoided if we have a separate NestedFieldReferenceExpression.
> >>
> >> If we want to extend FieldReferenceExpression, we have to add
> protections
> >> for every related API in one shot. Besides, FieldReferenceExpression is
> a
> >> fundamental class in the planner, we have to go through all the code
> that
> >> is using it to make sure it properly handling it if it is a nested field
> >> which
> >> is a big effort for the community.
> >>
> >> If we were designing this API on day 1, I fully support merging them in
> a
> >> FieldReferenceExpression. But in this case, I'm thinking about how to
> >> provide
> >> users with a smooth migration path, and allow the community to gradually
> >> put efforts into evolving the API, and not block the "Nested Fields
> Filter
> >> Pushdown"
> >> requirement.
> >>
> >> How about having a separate NestedFieldReferenceExpression, and
> >> abstracting a common base class "ReferenceExpression" for
> >> NestedFieldReferenceExpression and FieldReferenceExpression? This makes
> >> unifying expressions in
> >> "SupportsProjectionPushdown#applyProjections(List<ReferenceExpression>
> >> ...)"
> >> possible.
> >>
> >> Best,
> >> Jark
> >>
> >> On Thu, 24 Aug 2023 at 07:00, Venkatakrishnan Sowrirajan <
> >> vsowr...@asu.edu>
> >> wrote:
> >>
> >> > Becket and Jark,
> >> >
> >> >  Deprecate all the other
> >> > > methods except tryApplyFilters() and tryApplyProjections().
> >> >
> >> > For *SupportsProjectionPushDown*, we still need a
> >> > *supportsNestedProjections* API on the table source as some of the
> table
> >> > sources might not be able to handle nested fields and t

Re: [ANNOUNCE] New Apache Flink Committer - Ron Liu

2023-10-15 Thread yh z
Congratulations, Ron!

Best,
Yunhong (SwuferHong)

On Mon, Oct 16, 2023 at 11:12, Yuxin Tan wrote:

> Congratulations, Ron!
>
> Best,
> Yuxin
>
>
> > On Mon, Oct 16, 2023 at 10:24, Junrui Lee wrote:
>
> > Congratulations Ron !
> >
> > Best,
> > Junrui
> >
> > > On Mon, Oct 16, 2023 at 10:22, Yun Tang wrote:
> >
> > > Congratulations, Ron!
> > >
> > > Best
> > > Yun Tang
> > > 
> > > From: yu zelin 
> > > Sent: Monday, October 16, 2023 10:16
> > > To: dev@flink.apache.org 
> > > Cc: ron9@gmail.com 
> > > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Ron Liu
> > >
> > > Congratulations!
> > >
> > > Best,
> > > Yu Zelin
> > >
> > > > On Oct 16, 2023, at 09:56, Jark Wu wrote:
> > > >
> > > > Hi, everyone
> > > >
> > > > On behalf of the PMC, I'm very happy to announce Ron Liu as a new
> Flink
> > > > Committer.
> > > >
> > > > Ron has been continuously contributing to the Flink project for many
> > > years,
> > > > authored and reviewed a lot of code. He mainly works on Flink SQL
> > parts
> > > > and drove several important FLIPs, e.g., USING JAR (FLIP-214),
> Operator
> > > > Fusion CodeGen (FLIP-315), Runtime Filter (FLIP-324). He has a great
> > > > knowledge of the Batch SQL and improved a lot of batch performance in
> > the
> > > > past several releases. He is also quite active in mailing lists,
> > > > participating in discussions and answering user questions.
> > > >
> > > > Please join me in congratulating Ron Liu for becoming a Flink
> > Committer!
> > > >
> > > > Best,
> > > > Jark Wu (on behalf of the Flink PMC)
> > >
> > >
> >
>


Re: [ANNOUNCE] New Apache Flink Committer - Jane Chan

2023-10-15 Thread yh z
Congratulations Jane!

Best,
Yunhong (swuferHong)

On Mon, Oct 16, 2023 at 11:11, Yuxin Tan wrote:

> Congratulations Jane!
>
> Best,
> Yuxin
>
>
> > On Mon, Oct 16, 2023 at 10:27, xiangyu feng wrote:
>
> > Congratulations Jane!
> >
> > Best,
> > Xiangyu
> >
> > > On Mon, Oct 16, 2023 at 10:25, Xuannan Su wrote:
> >
> > > Congratulations Jane!
> > >
> > > Best,
> > > Xuannan
> > >
> > > On Mon, Oct 16, 2023 at 10:21 AM Yun Tang  wrote:
> > > >
> > > > Congratulations, Jane!
> > > >
> > > > Best
> > > > Yun Tang
> > > > 
> > > > From: Rui Fan <1996fan...@gmail.com>
> > > > Sent: Monday, October 16, 2023 10:16
> > > > To: dev@flink.apache.org 
> > > > Cc: qingyue@gmail.com 
> > > > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Jane Chan
> > > >
> > > > Congratulations Jane!
> > > >
> > > > Best,
> > > > Rui
> > > >
> > > > On Mon, Oct 16, 2023 at 10:15 AM yu zelin 
> > wrote:
> > > >
> > > > > Congratulations!
> > > > >
> > > > > Best,
> > > > > Yu Zelin
> > > > >
> > > > > > On Oct 16, 2023, at 09:58, Jark Wu wrote:
> > > > > >
> > > > > > Hi, everyone
> > > > > >
> > > > > > On behalf of the PMC, I'm very happy to announce Jane Chan as a
> new
> > > Flink
> > > > > > Committer.
> > > > > >
> > > > > > Jane started code contribution in Jan 2021 and has been active in
> > the
> > > > > Flink
> > > > > > community since. She authored more than 60 PRs and reviewed more
> > > than 40
> > > > > > PRs. Her contribution mainly revolves around Flink SQL, including
> > > Plan
> > > > > > Advice (FLIP-280), operator-level state TTL (FLIP-292), and ALTER
> > > TABLE
> > > > > > statements (FLINK-21634). Jane participated deeply in development
> > > > > > discussions and also helped answer user question emails. Jane was
> > > also a
> > > > > > core contributor of Flink Table Store (now Paimon) when the
> project
> > > was
> > > > > in
> > > > > > the early days.
> > > > > >
> > > > > > Please join me in congratulating Jane Chan for becoming a Flink
> > > > > Committer!
> > > > > >
> > > > > > Best,
> > > > > > Jark Wu (on behalf of the Flink PMC)
> > > > >
> > > > >
> > >
> >
>


Re: [PROPOSAL] Contribute Flink CDC Connectors project to Apache Flink

2023-12-11 Thread yh z
Excited to hear the news,
+1

Best,
Yunhong (Swuferhong)

On Mon, Dec 11, 2023 at 14:05, liu ron wrote:

> +1
>
> Best,
> Ron
>
> On Mon, Dec 11, 2023 at 12:01, Yunqing Mo wrote:
>
> > So cool, Big +1 for this exciting work.
> >
> > On 2023/12/07 03:24:59 Leonard Xu wrote:
> > > Dear Flink devs,
> > >
> > > As you may have heard, we at Alibaba (Ververica) are planning to donate
> > CDC Connectors for the Apache Flink project[1] to the Apache Flink
> > community.
> > >
> > > CDC Connectors for Apache Flink comprise a collection of source
> > connectors designed specifically for Apache Flink. These connectors[2]
> > enable the ingestion of changes from various databases using Change Data
> > Capture (CDC), most of these CDC connectors are powered by Debezium[3].
> > They support both the DataStream API and the Table/SQL API, facilitating
> > the reading of database snapshots and continuous reading of transaction
> > logs with exactly-once processing, even in the event of failures.
> > >
> > >
> > > Additionally, in the latest version 3.0, we have introduced many
> > long-awaited features. Starting from CDC version 3.0, we've built a
> > Streaming ELT Framework available for streaming data integration. This
> > framework allows users to write their data synchronization logic in a
> > simple YAML file, which will automatically be translated into a Flink
> > DataStreaming job. It emphasizes optimizing the task submission process
> and
> > offers advanced functionalities such as whole database synchronization,
> > merging sharded tables, and schema evolution[4].
> > >
> > >
> > > I believe this initiative is a perfect match for both sides. For the
> > Flink community, it presents an opportunity to enhance Flink's
> competitive
> > advantage in streaming data integration, promoting the healthy growth and
> > prosperity of the Apache Flink ecosystem. For the CDC Connectors project,
> > becoming a sub-project of Apache Flink means being part of a neutral
> > open-source community, which can attract a more diverse pool of
> > contributors.
> > >
> > > Please note that the aforementioned points represent only some of our
> > motivations and vision for this donation. Specific future operations need
> > to be further discussed in this thread. For example, the sub-project name
> > after the donation; we hope to name it Flink-CDC, aiming at streaming data
> > integration through Apache Flink, following the naming convention of
> > Flink-ML; And this project is managed by a total of 8 maintainers,
> > including 3 Flink PMC members and 1 Flink Committer. The remaining 4
> > maintainers are also highly active contributors to the Flink community,
> > donating this project to the Flink community implies that their
> permissions
> > might be reduced. Therefore, we may need to bring up this topic for
> further
> > discussion within the Flink PMC. Additionally, we need to discuss how to
> > migrate existing users and documents. We have a user group of nearly
> 10,000
> > people and a multi-version documentation site that need to be migrated. We also
> need
> > to plan for the migration of CI/CD processes and other specifics.
> > >
> > >
> > > While there are many intricate details that require implementation, we
> > are committed to progressing and finalizing this donation process.
> > >
> > >
> > > Besides being Flink’s most active ecosystem project (as evaluated by
> > GitHub metrics), it also boasts a significant user base. However, I
> believe
> > it's essential to commence discussions on future operations only after
> the
> > community reaches a consensus on whether they desire this donation.
> > >
> > >
> > > Really looking forward to hear what you think!
> > >
> > >
> > > Best,
> > > Leonard (on behalf of the Flink CDC Connectors project maintainers)
> > >
> > > [1] https://github.com/ververica/flink-cdc-connectors
> > > [2]
> >
> https://ververica.github.io/flink-cdc-connectors/master/content/overview/cdc-connectors.html
> > > [3] https://debezium.io
> > > [4]
> >
> https://ververica.github.io/flink-cdc-connectors/master/content/overview/cdc-pipeline.html
> >
>


Re: [DISCUSS] Release Flink 1.16.1

2022-12-22 Thread yh z
Hi Martijn,

+1
Thank you for bringing this up.  There is a serious bug in planner rule
#FlinkJoinToMultiJoinRule, which will generate a wrong join reorder plan
and will cause a data correctness issue:
https://issues.apache.org/jira/browse/FLINK-30270 (reviewing)

I hope this bug fix can be picked to 1.16.1 to avoid data correctness
problems that users cannot perceive.

Glad to hear your suggestions.

Best,
Yunhong Zheng

On Wed, Dec 21, 2022 at 21:44, Lincoln Lee wrote:

> Hi Martijn,
>
> Agree that we need to detail the specific case in the release notes. I will
> update the 'Release Note' related content of the three issues before the
> release.
> Also update the progress of the last issue FLINK-29849, which has been
> merged to master.
>
> If no one objects, FLINK-28988 and FLINK-29849 need to be tagged with
> 1.16.1 and picked to the 1.16 branch (I don't have permission to do this
> yet, would you like to help with this?)
>
> Best,
> Lincoln Lee
>
>
> > Martijn Visser  wrote on Wed, Dec 21, 2022 at 16:00:
>
> > Hi Lincoln,
> >
> > I'm +1 for also merging them back to 1.16.1. In the release notes we
> should
> > make it clear in which situations incompatibility issues will arise, so
> > that we inform the users correctly.
> >
> > Best regards,
> >
> > Martijn
> >
> > On Wed, Dec 21, 2022 at 8:41 AM godfrey he  wrote:
> >
> > > Hi Martijn,
> > >
> > > Thank you for bringing this up.
> > >
> > > About Lincoln mentioned 3 commits, +1 to pick them into 1.16.1.
> > > AFAIK, several users have encountered this kind of data correctness
> > > problem so far, they are waiting a fix release as soon as possible.
> > >
> > > Best,
> > > Godfrey
> > >
> > > ConradJam  wrote on Tue, Dec 20, 2022 at 15:08:
> > >
> > > > Hi Martijn,
> > > >
> > > > FLINK-30116 
> After
> > > > merge.Flink Web Ui Configuration Can't show it,I checked the data
> > > returned
> > > > by the back end and there is no problem, but there is an error in the
> > > front
> > > > end, as shown in the picture below, can someone take a look before
> > > release
> > > > 1.16.1 ?
> > > >
> > > > [image: Pasted Graphic.png]
> > > >
> > > > [image: Pasted Graphic 1.png]
> > > >
> > > > Martijn Visser  wrote on Fri, Dec 16, 2022 at 02:52:
> > > >
> > > >> Hi everyone,
> > > >>
> > > >> I would like to open a discussion about releasing Flink 1.16.1.
> We've
> > > >> released Flink 1.16 at the end of October, but we already have 58
> > fixes
> > > >> listed for 1.16.1, including a blocker [1] on the environment
> > variables
> > > >> and
> > > >> a number of critical issues. Some of the critical issues are related
> > to
> > > >> the
> > > >> bugs on the Sink API, on PyFlink and some correctness issues.
> > > >>
> > > >> There are also a number of open issues with a fixVersion set to
> > 1.16.1,
> > > so
> > > >> it would be good to understand what the community thinks of
> starting a
> > > >> release or if there are some fixes that should be included with
> > 1.16.1.
> > > >>
> > > >> Best regards,
> > > >>
> > > >> Martijn
> > > >>
> > > >> [1] https://issues.apache.org/jira/browse/FLINK-30116
> > > >>
> > > >
> > >
> >
>


[DISCUSS] Adding a option for planner to decide which join reorder rule to choose

2022-12-28 Thread yh z
Hi, devs,

I'd like to start a discussion about adding an option called
"table.oprimizer.busy-join-reorder-threshold" for the planner while we try
to introduce a new busy join reorder rule [1] into Flink.

This join reorder rule is based on dynamic programming [2]: it stores all
possible intermediate results, and the cost model is used to select the
optimal join reorder result. Compared with the existing Lopt join reorder
rule, the new rule can produce more candidate plans and more accurate
results. However, the search space of this rule grows very large as the
number of tables increases, so we should introduce an option to limit its
expansion: if the number of tables to be reordered is less than or equal to
the threshold, the new busy join reorder rule is used; otherwise, the Lopt
rule is used.

The default threshold is intended to be 12. One reason is that in the
TPC-DS benchmark, when the number of tables exceeds 12, the optimization
time becomes very long. The other reason is that it follows relevant
engines, like Spark, whose recommended setting is 12 [3].

Looking forward to your feedback.

[1]  https://issues.apache.org/jira/browse/FLINK-30376
[2]
https://courses.cs.duke.edu/compsci516/cps216/spring03/papers/selinger-etal-1979.pdf
[3]
https://spark.apache.org/docs/3.3.1/configuration.html#runtime-sql-configuration

Best regards,
Yunhong Zheng


Re: [DISCUSS] Adding a option for planner to decide which join reorder rule to choose

2023-01-02 Thread yh z
Hi Benchao,

Thanks for your reply.

Actually, I mistakenly wrote the name "bushy join reorder" as "busy join
reorder". I'm sorry for the trouble this brought you. "Bushy join reorder"
means we can build a bushy join tree based on the cost model, while
currently Flink can only build a left-deep tree using Calcite's
LoptOptimizeJoinRule. I hope my answers below address your questions:

For question #1: The biggest advantage of this "bushy join reorder"
strategy over the default Flink left-deep tree strategy is that it can
retain all possible join reorder plans and then select the optimal plan
according to the cost model. This means that the bushy join reorder
strategy can be better combined with the current cost model to get more
reasonable join reorder results. We verified it on the TPC-DS benchmark:
with the Spark plan as a reference, the new bushy join reorder strategy
adjusts more TPC-DS query plans to be consistent with the Spark plan, and
the execution time is significantly reduced. As for optimization latency,
that is the problem the option introduced in this discussion is meant to
solve. When many tables need to be reordered, the optimization latency
increases greatly; but when the number of tables is below the threshold,
the latency is the same as with LoptOptimizeJoinRule.

For question #2: According to my research, many compute engines and
database systems have "bushy join reorder" strategies based on dynamic
programming. For example, Spark and PostgreSQL use the same strategy, with
the threshold set to 12. Some papers, like [1] and [2], have also studied
this strategy, and [2] sets the threshold to 14.

For question #3: The implementation of Calcite's
MultiJoinOptimizeBushyRule is very simple and does not store intermediate
results at all. So Calcite's implementation cannot enumerate all possible
join orders, nor can it be combined with the current cost model to get
more reasonable join reorder results.


[1]
https://courses.cs.duke.edu/compsci516/cps216/spring03/papers/selinger-etal-1979.pdf
[2] https://db.in.tum.de/~radke/papers/hugejoins.pdf



Benchao Li  wrote on Tue, Jan 3, 2023 at 12:54:

> Hi Yunhong,
>
> Thanks for driving this~
>
> I haven't gone deep into the implementation details yet. Regarding the
> general description, I would ask a few questions firstly:
>
> #1, Is there any benchmark results about the optimization latency change
> compared to current approach? In OLAP scenario, query optimization latency
> is more crucial.
>
> #2, About the term "busy join reorder", is there any others systems which
> also use this term? I know Calcite has a rule[1] which uses the term "bushy
> join".
>
> #3, About the implementation, if this does the same work as Calcite
> MultiJoinOptimizeBushyRule, is it possible to use the Calcite version
> directly or extend it in some way?
>
> [1]
>
> https://github.com/apache/calcite/blob/9054682145727fbf8a13e3c79b3512be41574349/core/src/main/java/org/apache/calcite/rel/rules/MultiJoinOptimizeBushyRule.java#L78
>
> yh z  wrote on Thu, Dec 29, 2022 at 14:44:
>
> > Hi, devs,
> >
> > I'd like to start a discuss about adding an option called
> > "table.oprimizer.busy-join-reorder-threshold" for planner rule while we
> try
> > to introduce a new busy join reorder rule[1] into Flink.
> >
> > This join reorder rule is based on dynamic programing[2], which can store
> > all possible intermediate results, and the cost model can be used to
> select
> > the optimal join reorder result. Compare with the existing Lopt join
> > reorder rule, the new rule can give more possible results and the result
> > can be more accurate. However, the search space of this rule will become
> > very large as the number of tables increases. So we should introduce an
> > option to limit the expansion of search space, if the number of table can
> > be reordered less than the threshold, the new busy join reorder rule is
> > used. On the contrary, the Lopt rule is used.
> >
> > The default threshold intended to be set to 12. One reason is that in the
> > tpc-ds benchmark test, when the number of tables exceeds 12, the
> > optimization time will be very long. The other reason is that it refers
> to
> > relevant engines, like Spark, whose recommended setting is 12.[3]
> >
> > Looking forward to your feedback.
> >
> > [1]  https://issues.apache.org/jira/browse/FLINK-30376
> > [2]
> >
> >
> https://courses.cs.duke.edu/compsci516/cps216/spring03/papers/selinger-etal-1979.pdf
> > [3]
> >
> >
> https://spark.apache.org/docs/3.3.1/configuration.html#runtime-sql-configuration
> >
> > Best regards,
> > Yunhong Zheng
> >
>
>
> --
>
> Best,
> Benchao Li
>


Re: [DISCUSS] Adding a option for planner to decide which join reorder rule to choose

2023-01-03 Thread yh z
Hi Jark,

Thanks for your reply.

We are going to use the bushy join reorder rule and the Lopt join reorder
rule at the same time, controlled by the threshold
"table.optimizer.bushy-join-reorder-threshold": when the number of tables
to be reordered is less than or equal to this threshold, the bushy join
reorder rule is used; when it is greater, the Lopt join reorder rule is
used. Since for most queries the number of tables to be reordered is
within this threshold, the bushy join reorder rule can be regarded as the
default join reorder rule.

I'm really sorry: because I didn't carefully check the contents of the
first email, I wrote the wrong words there. I have made sure that the
correct word "bushy" is used in the PR [1], and the threshold name indeed
is "table.optimizer.bushy-join-reorder-threshold".

[1] https://github.com/apache/flink/pull/21530
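Once such an option exists, setting it should look like any other table option. A hypothetical usage sketch for the SQL client (the option name follows this thread; the exact name and default in a released version should be checked against the documentation):

```sql
-- Use the bushy join reorder rule for multi-joins of up to 12 tables,
-- falling back to the left-deep Lopt rule above that.
SET 'table.optimizer.bushy-join-reorder-threshold' = '12';
```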

Best regards,
Yunhong Zheng

Jark Wu  wrote on Tue, Jan 3, 2023 at 20:06:

> Hi Yuhong,
>
> Thanks for driving the feature.
>
> I just have one question. Is the bushy join reorder optimization enabled
> by default? Does the bushy join reorder will replace the existing Lopt join
> reorder rule?
>
> Besides, I guess the option "table.oprimizer.busy-join-reorder-threshold”
> should be "table.optimizer.bushy-join-reorder-threshold”?  (I guess they
> are just typos, as your last email said, but I just want to clarify as it
> is a public API).
>
> Best,
> Jark
>
>
> > On Jan 3, 2023, at 12:53, Benchao Li  wrote:
> >
> > Hi Yunhong,
> >
> > Thanks for driving this~
> >
> > I haven't gone deep into the implementation details yet. Regarding the
> > general description, I would ask a few questions firstly:
> >
> > #1, Is there any benchmark results about the optimization latency change
> > compared to current approach? In OLAP scenario, query optimization
> latency
> > is more crucial.
> >
> > #2, About the term "busy join reorder", is there any others systems which
> > also use this term? I know Calcite has a rule[1] which uses the term
> "bushy
> > join".
> >
> > #3, About the implementation, if this does the same work as Calcite
> > MultiJoinOptimizeBushyRule, is it possible to use the Calcite version
> > directly or extend it in some way?
> >
> > [1]
> >
> https://github.com/apache/calcite/blob/9054682145727fbf8a13e3c79b3512be41574349/core/src/main/java/org/apache/calcite/rel/rules/MultiJoinOptimizeBushyRule.java#L78
> >
> > yh z  wrote on Thu, Dec 29, 2022 at 14:44:
> >
> >> Hi, devs,
> >>
> >> I'd like to start a discuss about adding an option called
> >> "table.oprimizer.busy-join-reorder-threshold" for planner rule while we
> try
> >> to introduce a new busy join reorder rule[1] into Flink.
> >>
> >> This join reorder rule is based on dynamic programing[2], which can
> store
> >> all possible intermediate results, and the cost model can be used to
> select
> >> the optimal join reorder result. Compare with the existing Lopt join
> >> reorder rule, the new rule can give more possible results and the result
> >> can be more accurate. However, the search space of this rule will become
> >> very large as the number of tables increases. So we should introduce an
> >> option to limit the expansion of search space, if the number of table
> can
> >> be reordered less than the threshold, the new busy join reorder rule is
> >> used. On the contrary, the Lopt rule is used.
> >>
> >> The default threshold intended to be set to 12. One reason is that in
> the
> >> tpc-ds benchmark test, when the number of tables exceeds 12, the
> >> optimization time will be very long. The other reason is that it refers
> to
> >> relevant engines, like Spark, whose recommended setting is 12.[3]
> >>
> >> Looking forward to your feedback.
> >>
> >> [1]  https://issues.apache.org/jira/browse/FLINK-30376
> >> [2]
> >>
> >>
> https://courses.cs.duke.edu/compsci516/cps216/spring03/papers/selinger-etal-1979.pdf
> >> [3]
> >>
> >>
> https://spark.apache.org/docs/3.3.1/configuration.html#runtime-sql-configuration
> >>
> >> Best regards,
> >> Yunhong Zheng
> >>
> >
> >
> > --
> >
> > Best,
> > Benchao Li
>
>


Re: [DISCUSS] Adding a option for planner to decide which join reorder rule to choose

2023-01-04 Thread yh z
Hi Benchao,

Thanks for your reply.

Since our existing test results cover multiple performance optimization
points on the TPC-DS benchmark [1][2], we haven't separately tested the
performance improvement brought by the new bushy join reorder rule. I will
complete this test soon and follow up with the results in this thread.

I am very happy to contribute to Calcite. Later, I will submit a PR for
the bushy join reorder rule to Calcite.

[1] https://issues.apache.org/jira/browse/FLINK-27583
[2] https://issues.apache.org/jira/browse/FLINK-29942

Best regards,
Yunhong Zheng

Benchao Li  wrote on Wed, Jan 4, 2023 at 19:03:

> Hi Yunhong,
>
> Thanks for the updating. And introducing the new bushy join reorder
> algorithm would be great. And I also agree with the newly added config
> option "table.optimizer.bushy-join-reorder-threshold" and 12 as the default
> value.
>
>
> > As for optimization
> > latency, this is the problem to be solved by the parameters to be
> > introduced in this discussion. When there are many tables need to be
> > reordered, the optimization latency will increase greatly. But when the
> > table numbers less than the threshold, the latency is the same as the
> > LoptOptimizeJoinRule.
>
>
> This sounds great. If possible, could you share more numbers to us? E.g.,
> what's the latency of optimization when there are 11/12 tables for both
> approach?
>
>  For question #3: The implementation of Calcite MultiJoinOptimizeBushyRule
> > is very simple, and it will not store the intermediate results at all.
> So,
> > the implementation of Calcite cannot get all possible join reorder
> results
> > and it cannot combine with the current cost model to get more reasonable
> > join reorder results.
>
>
> It's ok to do it in Flink as the first step. It would be great to also
> contribute it to Calcite later if possible, it depends on you.
>
> yh z  wrote on Tue, Jan 3, 2023 at 15:27:
>
> > Hi Benchao,
> >
> > Thanks for your reply.
> >
> > Actually,  I mistakenly wrote the name "bushy join reorder" to "busy join
> > reorder". I'm sorry for the trouble bring to you. "Bushy join reorder"
> > means we can build a bushy join tree based on cost model, but now Flink
> can
> > only build a left-deep tree using Calcite LoptOptimizeJoinRule. I hope my
> > answers can help you solve the following questions:
> >
> > For question #1: The biggest advantage of this "bushy join reorder"
> > strategy over the default Flink left-deep tree strategy is that it can
> > retail all possible join reorder plans, and then select the optimal plan
> > according to the cost model. This means that the busy join reorder
> strategy
> > can be better combined with the current cost model to get more reasonable
> > join reorder results. We verified it on the TPC-DS benchmark, with the
> > spark plan as a reference, the new busy join reorder strategy can make
> more
> > TPC-DS query plans be adjusted to be consistent with the Spark plan, and
> > the execution time is signifcantly reduced.  As for optimization
> > latency, this is the problem to be solved by the parameters to be
> > introduced in this discussion. When there are many tables need to be
> > reordered, the optimization latency will increase greatly. But when the
> > table numbers less than the threshold, the latency is the same as the
> > LoptOptimizeJoinRule.
> >
> > For question #2: According to my research, many compute or database
> systems
> > have the "bushy join reorder" strategies based on dynamic programming.
> For
> > example, Spark and PostgresSql use the same strategy, and the threshold
> be
> > set to 12. Also, some papers, like [1] and [2], have also researched this
> > strategy, and [2] set the threshold to 14.
> >
> > For question #3: The implementation of Calcite MultiJoinOptimizeBushyRule
> > is very simple, and it will not store the intermediate results at all.
> So,
> > the implementation of Calcite cannot get all possible join reorder
> results
> > and it cannot combine with the current cost model to get more reasonable
> > join reorder results.
> >
> >
> > [1]
> >
> >
> https://courses.cs.duke.edu/compsci516/cps216/spring03/papers/selinger-etal-1979.pdf
> > [2] https://db.in.tum.de/~radke/papers/hugejoins.pdf
> >
> >
> >
> > Benchao Li  wrote on Tue, Jan 3, 2023 at 12:54:
> >
> > > Hi Yunhong,
> > >
> > > Thanks for driving this~
> > >
> > > I haven't gone deep into the implementation details yet. Regarding the
> > > general

Re: [DISCUSS] Adding a option for planner to decide which join reorder rule to choose

2023-01-10 Thread yh z
Hi all,

Thanks for your replies.

After receiving your comments and making targeted modifications, the
conclusion is that the option
"table.optimizer.bushy-join-reorder-threshold" can be added. The relevant
PR [1] has been submitted; reviews are sincerely welcome. Thank you.

This discussion will be closed soon. Thanks for your comments.

[1] https://github.com/apache/flink/pull/21530

Best regards,
Yunhong Zheng

godfrey he  wrote on Mon, Jan 9, 2023 at 10:26:

> Hi Yunhong,
>
> Thanks for driving this discuss!
>
> This option looks good to me,
> and looking forward to contributing this rule back to Apache Calcite.
>
> Best,
> Godfrey
>
>
>
> yh z  wrote on Thu, Jan 5, 2023 at 15:32:
> >
> > Hi Benchao,
> >
> > Thanks for your reply.
> >
> > Since our existing test results are based on multiple performance
> > optimization points on the TPC-DS benchmark[1][2], we haven't separately
> > tested the performance improvement brought by new bushy join reorder
> > rule. I will complete this test recently and update the results to this
> > email.
> >
> > I am very happy to contribute to Calcite. Later, I will push the PR of
> the
> > bushy join reorder rule to Calcite.
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-27583
> > [2] https://issues.apache.org/jira/browse/FLINK-29942
> >
> > Best regards,
> > Yunhong Zheng
> >
> > Benchao Li  wrote on Wed, Jan 4, 2023 at 19:03:
> >
> > > Hi Yunhong,
> > >
> > > Thanks for the updating. And introducing the new bushy join reorder
> > > algorithm would be great. And I also agree with the newly added config
> > > option "table.optimizer.bushy-join-reorder-threshold" and 12 as the
> default
> > > value.
> > >
> > >
> > > > As for optimization
> > > > latency, this is the problem to be solved by the parameters to be
> > > > introduced in this discussion. When there are many tables need to be
> > > > reordered, the optimization latency will increase greatly. But when
> the
> > > > table numbers less than the threshold, the latency is the same as the
> > > > LoptOptimizeJoinRule.
> > >
> > >
> > > This sounds great. If possible, could you share more numbers to us?
> E.g.,
> > > what's the latency of optimization when there are 11/12 tables for both
> > > approach?
> > >
> > >  For question #3: The implementation of Calcite
> MultiJoinOptimizeBushyRule
> > > > is very simple, and it will not store the intermediate results at
> all.
> > > So,
> > > > the implementation of Calcite cannot get all possible join reorder
> > > results
> > > > and it cannot combine with the current cost model to get more
> reasonable
> > > > join reorder results.
> > >
> > >
> > > It's ok to do it in Flink as the first step. It would be great to also
> > > contribute it to Calcite later if possible, it depends on you.
> > >
> > > yh z  wrote on Tue, Jan 3, 2023 at 15:27:
> > >
> > > > Hi Benchao,
> > > >
> > > > Thanks for your reply.
> > > >
> > > > Actually,  I mistakenly wrote the name "bushy join reorder" to "busy
> join
> > > > reorder". I'm sorry for the trouble bring to you. "Bushy join
> reorder"
> > > > means we can build a bushy join tree based on cost model, but now
> Flink
> > > can
> > > > only build a left-deep tree using Calcite LoptOptimizeJoinRule. I
> hope my
> > > > answers can help you solve the following questions:
> > > >
> > > > For question #1: The biggest advantage of this "bushy join reorder"
> > > > strategy over the default Flink left-deep tree strategy is that it
> can
> > > > retail all possible join reorder plans, and then select the optimal
> plan
> > > > according to the cost model. This means that the busy join reorder
> > > strategy
> > > > can be better combined with the current cost model to get more
> reasonable
> > > > join reorder results. We verified it on the TPC-DS benchmark, with
> the
> > > > spark plan as a reference, the new busy join reorder strategy can
> make
> > > more
> > > > TPC-DS query plans be adjusted to be consistent with the Spark plan,
> and
> > > > the execution time is signifcantly reduced.  As for optimization
> > > > latency, this is the problem to be solved by the parameters to be
> >

Re: [ANNOUNCE] New Apache Flink Committer - Lincoln Lee

2023-01-10 Thread yh z
Congratulations, Lincoln!

Best regards,
Yunhong Zheng

Biao Liu  wrote on Tue, Jan 10, 2023 at 15:02:

> Congratulations, Lincoln!
>
> Thanks,
> Biao /'bɪ.aʊ/
>
>
>
> On Tue, 10 Jan 2023 at 14:59, Hang Ruan  wrote:
>
> > Congratulations, Lincoln!
> >
> > Best,
> > Hang
> >
> > Biao Geng  wrote on Tue, Jan 10, 2023 at 14:57:
> >
> > > Congrats, Lincoln!
> > > Best,
> > > Biao Geng
> > >
> > > Get Outlook for iOS
> > > 
> > > From: Wencong Liu 
> > > Sent: Tuesday, January 10, 2023 2:39:47 PM
> > > To: dev@flink.apache.org 
> > > Subject: Re:Re: [ANNOUNCE] New Apache Flink Committer - Lincoln Lee
> > >
> > > Congratulations, Lincoln!
> > >
> > > Best regards,
> > > Wencong
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
On 2023-01-10 13:25:09, "Yanfei Lei"  wrote:
> > > >Congratulations, well deserved!
> > > >
> > > >Best,
> > > >Yanfei
> > > >
> > > >Yuan Mei  wrote on Tue, Jan 10, 2023 at 13:16:
> > > >
> > > >> Congratulations, Lincoln!
> > > >>
> > > >> Best,
> > > >> Yuan
> > > >>
> > > >> On Tue, Jan 10, 2023 at 12:23 PM Lijie Wang <
> wangdachui9...@gmail.com
> > >
> > > >> wrote:
> > > >>
> > > >> > Congratulations, Lincoln!
> > > >> >
> > > >> > Best,
> > > >> > Lijie
> > > >> >
> > > >> > > Jingsong Li  wrote on Tue, Jan 10, 2023 at 12:07:
> > > >> >
> > > >> > > Congratulations, Lincoln!
> > > >> > >
> > > >> > > Best,
> > > >> > > Jingsong
> > > >> > >
> > > >> > > On Tue, Jan 10, 2023 at 11:56 AM Leonard Xu 
> > > wrote:
> > > >> > > >
> > > >> > > > Congratulations, Lincoln!
> > > >> > > >
> > > >> > > > Impressive work in streaming semantics, well deserved!
> > > >> > > >
> > > >> > > >
> > > >> > > > Best,
> > > >> > > > Leonard
> > > >> > > >
> > > >> > > >
> > > >> > > > > On Jan 10, 2023, at 11:52 AM, Jark Wu 
> > wrote:
> > > >> > > > >
> > > >> > > > > Hi everyone,
> > > >> > > > >
> > > >> > > > > On behalf of the PMC, I'm very happy to announce Lincoln Lee
> > as
> > > a
> > > >> new
> > > >> > > Flink
> > > >> > > > > committer.
> > > >> > > > >
> > > >> > > > > Lincoln Lee has been a long-term Flink contributor since
> 2017.
> > > He
> > > >> > > mainly
> > > >> > > > > works on Flink
> > > >> > > > > SQL parts and drives several important FLIPs, e.g., FLIP-232
> > > (Retry
> > > >> > > Async
> > > >> > > > > I/O), FLIP-234 (
> > > >> > > > > Retryable Lookup Join), FLIP-260 (TableFunction Finish).
> > > Besides,
> > > >> He
> > > >> > > also
> > > >> > > > > contributed
> > > >> > > > > much to Streaming Semantics, including the non-determinism
> > > problem
> > > >> > and
> > > >> > > the
> > > >> > > > > message
> > > >> > > > > ordering problem.
> > > >> > > > >
> > > >> > > > > Please join me in congratulating Lincoln for becoming a
> Flink
> > > >> > > committer!
> > > >> > > > >
> > > >> > > > > Cheers,
> > > >> > > > > Jark Wu
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
>


Re: [ANNOUNCE] Release 1.17.0, release candidate #1

2023-03-13 Thread yh z
Hi Qingsheng and the other release managers,
Thank you for bringing this up. There is a serious bug in the planner rule
FlinkFilterJoinRule, which can generate a wrong join plan that outputs no
results:
https://issues.apache.org/jira/browse/FLINK-31273 (under review).

This bug can be triggered by a simple SQL pattern, so I marked the issue
as a blocker.

I hope this bug fix can be cherry-picked into 1.17.0 to avoid data
correctness problems that users cannot perceive.

Glad to hear your suggestions.

Best,
Yunhong Zheng

Qingsheng Ren  wrote on Mon, Mar 13, 2023 at 18:07:

> Thanks everyone for validating the RC!
>
> As there are some patches to be included and signature issues, the RC1 of
> 1.17.0 is officially canceled. Because the ASF Nexus server is now in
> maintenance, we'll build a new RC after the server gets online, which is
> hopefully on Tuesday (Mar 14).
>
> Best,
> Qingsheng
>
> On Fri, Mar 10, 2023 at 10:42 PM Yingjie Cao 
> wrote:
>
> > Hi Martijn,
> >
> > Thanks a lot.
> > Fix has be merged.
> >
> > Best regards,
> > Yingjie
> >
> > Martijn Visser  wrote on Fri, Mar 10, 2023 at 03:32:
> >
> > > Hi Yingjie,
> > >
> > > Thanks for the test and identifying the issue, this is super helpful!
> > >
> > > To all others, please continue your testing on this RC so that if there
> > are
> > > more blockers to be found, we can fix them with the next RC and have
> > > (hopefully) a successful vote on it.
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > On Thu, Mar 9, 2023 at 4:54 PM Yingjie Cao 
> > > wrote:
> > >
> > > > Hi community and release managers:
> > > >
> > > > When testing the release candidate #1 for batch scenario, I found a
> > > > potential deadlock issue of blocking shuffle. I have created a ticket
> > [1]
> > > > for it and marked it as blocker. I will fix it no later than
> tomorrow.
> > > >
> > > > [1] https://issues.apache.org/jira/browse/FLINK-31386
> > > >
> > > > Best regards,
> > > > Yingjie
> > > >
> > > > Qingsheng Ren  wrote on Thu, Mar 9, 2023 at 13:51:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > The RC1 for Apache Flink 1.17.0 has been created. This RC currently
> > is
> > > > for
> > > > > preview only to facilitate the integrated testing since the release
> > > > > announcement is still under review. The voting process will be
> > > triggered
> > > > > once the announcement is ready. It has all the artifacts that we
> > would
> > > > > typically have for a release, except for the release note and the
> > > website
> > > > > pull request for the release announcement.
> > > > >
> > > > > The following contents are available for your review:
> > > > >
> > > > > - The preview source release and binary convenience releases [1],
> > which
> > > > > are signed with the key with fingerprint A1BD477F79D036D2C30C [2].
> > > > > - all artifacts that would normally be deployed to the Maven
> > > > > Central Repository [3].
> > > > > - source code tag "release-1.17.0-rc1" [4]
> > > > >
> > > > > Your help testing the release will be greatly appreciated! And
> we'll
> > > > > create the voting thread as soon as all the efforts are finished.
> > > > >
> > > > > [1] https://dist.apache.org/repos/dist/dev/flink/flink-1.17.0-rc1
> > > > > [2] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > > > [3]
> > > >
> https://repository.apache.org/content/repositories/orgapacheflink-1591
> > > > > [4]
> https://github.com/apache/flink/releases/tag/release-1.17.0-rc1
> > > > >
> > > > > Best regards,
> > > > > Qingsheng, Leonard, Matthias and Martijn
> > > > >
> > > >
> > >
> >
>


[DISCUSS] Flink aggregate push down is not user-friendly.

2024-02-05 Thread yh z
Hi Devs,
While developing a new connector that supports aggregate push down, I
found that Flink's aggregate push down is not user-friendly. The
`AggregateExpression` passed to the connector by
`SupportsAggregatePushDown#applyAggregates` doesn't provide access to its
concrete subclasses. This makes it impossible to directly determine the
type of the aggregate operator unless I import the planner module, which
is discouraged and considered a heavyweight dependency.
Because I can't access the concrete `FunctionDefinition` subclasses behind
`AggregateExpression`, like `CountAggFunction`, and am unable to import
the planner module, I'm forced to match the aggregate operators with a
hack that compares fully qualified class names, like the following code:

FunctionDefinition functionDefinition =
        aggregateExpressions.get(0).getFunctionDefinition();
if (!(functionDefinition
                .getClass()
                .getCanonicalName()
                .equals(
                        "org.apache.flink.table.planner.functions.aggfunctions.CountAggFunction")
        || functionDefinition
                .getClass()
                .getCanonicalName()
                .equals(
                        "org.apache.flink.table.planner.functions.aggfunctions.Count1AggFunction"))) {
    return false;
}


I think the same problem may also exist for the other SupportsXxxPushDown
interfaces. Should we consider which planner classes can be exposed to
developers to facilitate their use?

Yours,
Swuferhong (Yunhong Zheng).


Re: [ANNOUNCE] New Apache Flink Committer - Jiabao Sun

2024-02-26 Thread yh z
Congratulations Jiabao!

Best,
Yunhong Zheng (SwufeHong)

yu'an huang  wrote on Mon, Feb 26, 2024 at 10:28:

> Congratulations, Jiabao!
>
> Best,
> Yuan
>
>
> On Mon, 26 Feb 2024 at 9:38 AM, Ron liu  wrote:
>
> > Congratulations, Jiabao!
> >
> > Best,
> > Ron
> >
> > Yun Tang  wrote on Fri, Feb 23, 2024 at 19:59:
> >
> > > Congratulations, Jiabao!
> > >
> > > Best
> > > Yun Tang
> > > 
> > > From: Weihua Hu 
> > > Sent: Thursday, February 22, 2024 17:29
> > > To: dev@flink.apache.org 
> > > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Jiabao Sun
> > >
> > > Congratulations, Jiabao!
> > >
> > > Best,
> > > Weihua
> > >
> > >
> > > On Thu, Feb 22, 2024 at 10:34 AM Jingsong Li 
> > > wrote:
> > >
> > > > Congratulations! Well deserved!
> > > >
> > > > On Wed, Feb 21, 2024 at 4:36 PM Yuepeng Pan 
> > > wrote:
> > > > >
> > > > > Congratulations~ :)
> > > > >
> > > > > Best,
> > > > > Yuepeng Pan
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On 2024-02-21 09:52:17, "Hongshun Wang"  wrote:
> 写道:
> > > > > >Congratulations, Jiabao :)
> > > > > >Congratulations Jiabao!
> > > > > >
> > > > > >Best,
> > > > > >Hongshun
> > > > > >Best regards,
> > > > > >
> > > > > >Weijie
> > > > > >
> > > > > >On Tue, Feb 20, 2024 at 2:19 PM Runkang He 
> > wrote:
> > > > > >
> > > > > >> Congratulations Jiabao!
> > > > > >>
> > > > > >> Best,
> > > > > >> Runkang He
> > > > > >>
> > > > > >> Jane Chan  wrote on Tue, Feb 20, 2024 at 14:18:
> > > > > >>
> > > > > >> > Congrats, Jiabao!
> > > > > >> >
> > > > > >> > Best,
> > > > > >> > Jane
> > > > > >> >
> > > > > >> > On Tue, Feb 20, 2024 at 10:32 AM Paul Lam <
> > paullin3...@gmail.com>
> > > > wrote:
> > > > > >> >
> > > > > >> > > Congrats, Jiabao!
> > > > > >> > >
> > > > > >> > > Best,
> > > > > >> > > Paul Lam
> > > > > >> > >
> > > > > >> > > > On Feb 20, 2024, at 10:29, Zakelly Lan  wrote:
> > > > > >> > > >
> > > > > >> > > >> Congrats! Jiabao!
> > > > > >> > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > >
> > >
> >
>


Re: [ANNOUNCE] New Apache Flink PMC Member - Lincoln Lee

2024-04-15 Thread yh z
Congratulations, Lincoln!

Best,
Yunhong (Swuferhong)


Swapnal Varma  wrote on Mon, Apr 15, 2024 at 18:50:

> Congratulations, Lincoln!
>
> Best,
> Swapnal
>
>
> On Mon, 15 Apr 2024, 15:16 Jacky Lau,  wrote:
>
> > Congratulations, Lincoln!
> >
> > Best,
> > Jacky Lau
> >
> > Jinzhong Li  wrote on Mon, Apr 15, 2024 at 15:45:
> >
> > > Congratulations, Lincoln!
> > >
> > > Best,
> > > Jinzhong Li
> > >
> > > On Mon, Apr 15, 2024 at 2:56 PM Hangxiang Yu 
> > wrote:
> > >
> > > > Congratulations, Lincoln!
> > > >
> > > > On Mon, Apr 15, 2024 at 10:17 AM Zakelly Lan 
> > > > wrote:
> > > >
> > > > > Congratulations, Lincoln!
> > > > >
> > > > >
> > > > > Best,
> > > > > Zakelly
> > > > >
> > > > > On Sat, Apr 13, 2024 at 12:48 AM Ferenc Csaky
> > >  > > > >
> > > > > wrote:
> > > > >
> > > > > > Congratulations, Lincoln!
> > > > > >
> > > > > > Best,
> > > > > > Ferenc
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Friday, April 12th, 2024 at 15:54,
> > lorenzo.affe...@ververica.com
> > > > > .INVALID
> > > > > >  wrote:
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Huge congrats! Well done!
> > > > > > > On Apr 12, 2024 at 13:56 +0200, Ron liu ron9@gmail.com,
> > wrote:
> > > > > > >
> > > > > > > > Congratulations, Lincoln!
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Ron
> > > > > > > >
> > > > > > > > > Junrui Lee jrlee@gmail.com wrote on Fri, Apr 12, 2024 at 18:54:
> > > > > > > >
> > > > > > > > > Congratulations, Lincoln!
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Junrui
> > > > > > > > >
> > > > > > > > > Aleksandr Pilipenko z3d...@gmail.com wrote on Fri, Apr 12,
> 2024 at 18:29:
> > > > > > > > >
> > > > > > > > > > > Congratulations, Lincoln!
> > > > > > > > > > >
> > > > > > > > > > > Best Regards
> > > > > > > > > > > Aleksandr
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best,
> > > > Hangxiang.
> > > >
> > >
> >
>


Re: [ANNOUNCE] New Apache Flink Committer - Zakelly Lan

2024-04-17 Thread yh z
Congratulations Zakelly!

Best regards,
Yunhong (swuferhong)

gongzhongqiang  wrote on Wed, Apr 17, 2024 at 21:26:

> Congratulations, Zakelly!
>
>
> Best,
> Zhongqiang Gong
>
> Yuan Mei  wrote on Mon, Apr 15, 2024 at 10:51:
>
> > Hi everyone,
> >
> > On behalf of the PMC, I'm happy to let you know that Zakelly Lan has
> become
> > a new Flink Committer!
> >
> > Zakelly has been continuously contributing to the Flink project since
> 2020,
> > with a focus area on Checkpointing, State as well as frocksdb (the
> default
> > on-disk state db).
> >
> > He leads several FLIPs to improve checkpoints and state APIs, including
> > File Merging for Checkpoints and configuration/API reorganizations. He is
> > also one of the main contributors to the recent efforts of "disaggregated
> > state management for Flink 2.0" and drives the entire discussion in the
> > mailing thread, demonstrating outstanding technical depth and breadth of
> > knowledge.
> >
> > Beyond his technical contributions, Zakelly is passionate about helping
> the
> > community in numerous ways. He spent quite some time setting up the Flink
> > Speed Center and rebuilding the benchmark pipeline after the lease on the
> > original one expired. He helps build frocksdb and tests for the upcoming
> > frocksdb release (bump rocksdb from 6.20.3->8.10).
> >
> > Please join me in congratulating Zakelly for becoming an Apache Flink
> > committer!
> >
> > Best,
> > Yuan (on behalf of the Flink PMC)
> >
>


Re: [ANNOUNCE] New Apache Flink Committer - Xuyang

2025-02-19 Thread yh z
Congratulations

Best,
Ron

Ron Liu  wrote on Thu, Feb 20, 2025 at 09:56:

> Congratulations, you deserve it.
>
> Best,
> Ron
>
>
> Lincoln Lee  wrote on Thu, Feb 20, 2025 at 09:50:
>
> > Hi everyone,
> >
> > On behalf of the PMC, I'm happy to announce that Xuyang has become a
> > new Flink Committer!
> >
> > Xuyang has been contributing to the Flink community since Sep 15, 2021,
> he
> > has
> > driven and contributed to 5 FLIPs, submitted over 100 commits and more
> than
> > 100,000 lines of code changes.
> >
> > He primarily focuses on the table-related modules. He has completed the
> > support
> > for SQL Join Hints, advanced the integration of SQL operators with the
> new
> > disaggregated state, and addressed technical debt, including
> > improving
> > TVF window functionality and performing extensive code cleanup of
> > deprecated
> > APIs in the table module for version 2.0. Additionally, he's very active
> in
> > the mailing
> > list, answering and resolving user issues, and participating in community
> > discussions.
> >
> > Please join me in congratulating Xuyang for becoming an Apache Flink
> > committer.
> >
> >
> > Cheers,
> > Lincoln Lee (On behalf of the Flink PMC)
> >
>