Hi dev

Thanks for all the feedback. It seems that there are no more comments, so I
will start a vote on FLIP-315 [1] later. Thanks again.

[1]:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-315+Support+Operator+Fusion+Codegen+for+Flink+SQL

Best,
Ron

On Mon, Jun 5, 2023 at 16:01, liu ron <ron9....@gmail.com> wrote:

> Hi, Yun, Jingsong, Benchao
>
> Thanks for your valuable input about this FLIP.
>
> First of all, let me emphasize that, from a technical implementation
> point of view, this design is feasible in both streaming and batch
> scenarios, so I consider both modes in the FLIP. In the streaming
> scenario, our business experience is that the bottleneck of stateful
> operators is basically state access, so the optimization effect of OFCG
> on streaming will not be particularly obvious, and we will not give it
> priority for now. By contrast, in the batch scenario, where the CPU is
> the bottleneck, this optimization pays off.
>
> Taking the above into account, this design can support optimization in
> both streaming and batch mode, but we will give priority to supporting
> batch operators. As Benchao said, when we find a suitable streaming
> business scenario in the future, we can consider doing this optimization
> there. Back to Yun's question: in streaming mode the design will break
> state compatibility, as [1] did, and SQL version upgrades will not
> support OFCG. As mentioned earlier, we will not support this feature in
> streaming mode in the short term.
>
> Also, following Benchao's suggestion, I will state in the FLIP that the
> current goal of this optimization is scoped to batch mode.
>
> Best,
> Ron
>
> On Mon, Jun 5, 2023 at 15:04, liu ron <ron9....@gmail.com> wrote:
>
>> Hi, Lincoln
>>
>> Thanks for your appreciation of this design. Regarding your question:
>>
>> > do we consider adding a benchmark for the operators to intuitively
>> > understand the improvement brought by each improvement?
>>
>> I think it makes sense to add a benchmark; Spark also has such a
>> benchmark framework. However, introducing a benchmark framework into
>> Flink is a separate topic, and we would need to start a new discussion
>> for that work.
>>
>> > for the implementation plan, mentioned in the FLIP that 1.18 will
>> > support Calc, HashJoin and HashAgg, then what will be the next step? and
>> > which operators do we ultimately expect to cover (all or specific ones)?
>>
>> Our ultimate goal is to support all operators in batch mode, but we
>> prioritize them by how commonly they are used. Operators such as Calc,
>> HashJoin and HashAgg are used most often, so we will support them first
>> and then support the remaining operators step by step. Given the time
>> available and the development workload, we can only support Calc,
>> HashJoin and HashAgg in 1.18; we will complete the remaining work in
>> 1.19 or 1.20. I will make this clear in the FLIP.
>>
>> Best,
>> Ron
>>
>> On Mon, Jun 5, 2023 at 14:15, Jingsong Li <jingsongl...@gmail.com> wrote:
>>
>>> > For the state compatibility section, it seems that the checkpoint
>>> > compatibility would be broken just like [1] did. Could FLIP-190 [2]
>>> > still be helpful in this case for SQL version upgrades?
>>>
>>> I guess this is only for batch processing. Streaming should be another
>>> story?
>>>
>>> Best,
>>> Jingsong
>>>
>>> On Mon, Jun 5, 2023 at 2:07 PM Yun Tang <myas...@live.com> wrote:
>>> >
>>> > Hi Ron,
>>> >
>>> > I think this FLIP would help improve performance. Looking forward to
>>> > its completion in Flink!
>>> >
>>> > For the state compatibility section, it seems that the checkpoint
>>> > compatibility would be broken just like [1] did. Could FLIP-190 [2]
>>> > still be helpful in this case for SQL version upgrades?
>>> >
>>> >
>>> > [1]
>>> > https://docs.google.com/document/d/1qKVohV12qn-bM51cBZ8Hcgp31ntwClxjoiNBUOqVHsI/edit#heading=h.fri5rtpte0si
>>> > [2]
>>> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191336489
>>> >
>>> > Best
>>> > Yun Tang
>>> >
>>> > ________________________________
>>> > From: Lincoln Lee <lincoln.8...@gmail.com>
>>> > Sent: Monday, June 5, 2023 10:56
>>> > To: dev@flink.apache.org <dev@flink.apache.org>
>>> > Subject: Re: [DISCUSS] FLIP-315: Support Operator Fusion Codegen for Flink SQL
>>> >
>>> > Hi Ron
>>> >
>>> > OFCG looks like an exciting optimization, looking forward to its
>>> > completion in Flink!
>>> > A small question, do we consider adding a benchmark for the operators to
>>> > intuitively understand the improvement brought by each improvement?
>>> > In addition, for the implementation plan, mentioned in the FLIP that
>>> > 1.18 will support Calc, HashJoin and HashAgg, then what will be the next
>>> > step? and which operators do we ultimately expect to cover (all or
>>> > specific ones)?
>>> >
>>> > Best,
>>> > Lincoln Lee
>>> >
>>> >
>>> > On Mon, Jun 5, 2023 at 09:40, liu ron <ron9....@gmail.com> wrote:
>>> >
>>> > > Hi, Jark
>>> > >
>>> > > Thanks for your feedback. According to my initial assessment, the
>>> > > workload is relatively large.
>>> > >
>>> > > Moreover, I will add the test results of all TPC-DS queries to the FLIP.
>>> > >
>>> > > Best,
>>> > > Ron
>>> > >
>>> > > On Thu, Jun 1, 2023 at 20:45, Jark Wu <imj...@gmail.com> wrote:
>>> > >
>>> > > > Hi Ron,
>>> > > >
>>> > > > Thanks a lot for the great proposal. The FLIP looks good to me in
>>> > > > general. It doesn't look like easy work, but the performance gain
>>> > > > sounds promising, so I think it's worth doing.
>>> > > >
>>> > > > Besides, a complete test chart covering all TPC-DS queries would
>>> > > > make the effect of this FLIP more intuitive.
>>> > > >
>>> > > > Best,
>>> > > > Jark
>>> > > >
>>> > > >
>>> > > >
>>> > > > On Wed, 31 May 2023 at 14:27, liu ron <ron9....@gmail.com> wrote:
>>> > > >
>>> > > > > Hi, Jingsong
>>> > > > >
>>> > > > > Thanks for your valuable suggestions.
>>> > > > >
>>> > > > > Best,
>>> > > > > Ron
>>> > > > >
>>> > > > > On Tue, May 30, 2023 at 13:22, Jingsong Li <jingsongl...@gmail.com> wrote:
>>> > > > >
>>> > > > > > Thanks, Ron, for the information.
>>> > > > > >
>>> > > > > > I suggest adding it to the Motivation section of the FLIP.
>>> > > > > >
>>> > > > > > Best,
>>> > > > > > Jingsong
>>> > > > > >
>>> > > > > > On Tue, May 30, 2023 at 9:57 AM liu ron <ron9....@gmail.com> wrote:
>>> > > > > > >
>>> > > > > > > Hi, Jingsong
>>> > > > > > >
>>> > > > > > > Thanks for your review. We have tested it with TPC-DS and got
>>> > > > > > > a 12% overall gain when supporting only the Calc, HashJoin and
>>> > > > > > > HashAgg operators. For some queries we even get a gain of more
>>> > > > > > > than 30%, so it looks like an effective approach.
>>> > > > > > >
>>> > > > > > > Best,
>>> > > > > > > Ron
>>> > > > > > >
>>> > > > > > > On Mon, May 29, 2023 at 14:33, Jingsong Li <jingsongl...@gmail.com> wrote:
>>> > > > > > >
>>> > > > > > > > Thanks, Ron, for the proposal.
>>> > > > > > > >
>>> > > > > > > > Do you have some benchmark results for the performance
>>> > > > > > > > improvement? I am more concerned about the improvement on
>>> > > > > > > > Flink than about the data in other papers.
>>> > > > > > > >
>>> > > > > > > > Best,
>>> > > > > > > > Jingsong
>>> > > > > > > >
>>> > > > > > > > On Mon, May 29, 2023 at 2:16 PM liu ron <ron9....@gmail.com> wrote:
>>> > > > > > > > >
>>> > > > > > > > > Hi, dev
>>> > > > > > > > >
>>> > > > > > > > > I'd like to start a discussion about FLIP-315: Support
>>> > > > > > > > > Operator Fusion Codegen for Flink SQL [1].
>>> > > > > > > > >
>>> > > > > > > > > As main memory grows, query performance is increasingly
>>> > > > > > > > > determined by the raw CPU cost of query processing itself.
>>> > > > > > > > > This is because query processing techniques based on
>>> > > > > > > > > interpreted execution perform poorly on modern CPUs due to
>>> > > > > > > > > a lack of locality and frequent instruction mispredictions.
>>> > > > > > > > > Therefore, the industry is also researching how to improve
>>> > > > > > > > > engine performance by increasing operator execution
>>> > > > > > > > > efficiency. In addition, while optimizing Flink's
>>> > > > > > > > > performance for TPC-DS queries, we found that a significant
>>> > > > > > > > > amount of CPU time was spent on virtual function calls,
>>> > > > > > > > > framework collector calls, and unnecessary computations,
>>> > > > > > > > > which can be optimized away to improve overall engine
>>> > > > > > > > > performance. After some investigation, we found that
>>> > > > > > > > > Operator Fusion Codegen, proposed by Thomas Neumann in the
>>> > > > > > > > > paper [2], can address these problems. I have finished a
>>> > > > > > > > > PoC [3] to verify its feasibility and effectiveness.
>>> > > > > > > > >
>>> > > > > > > > > Looking forward to your feedback.
>>> > > > > > > > >
>>> > > > > > > > > [1]:
>>> > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-315+Support+Operator+Fusion+Codegen+for+Flink+SQL
>>> > > > > > > > > [2]: http://www.vldb.org/pvldb/vol4/p539-neumann.pdf
>>> > > > > > > > > [3]: https://github.com/lsyldliu/flink/tree/OFCG
>>> > > > > > > > >
>>> > > > > > > > > Best,
>>> > > > > > > > > Ron
>>> > > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>>
>>
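For readers new to the technique: the original proposal above attributes much of the CPU cost to per-row virtual function and collector calls in interpreted execution, and proposes fusing operators into generated code following Neumann's paper [2]. The toy Python sketch below contrasts the two styles for roughly what a Calc followed by a HashAgg does, i.e. `SELECT key, SUM(price * qty) FROM t WHERE qty > 0 GROUP BY key`. It is an illustration under stated assumptions only: every class and function name (Filter, Project, SumByKey, ScanGen, FilterGen, ProjectGen, SumByKeyGen, compile_plan) is hypothetical, the generator is a simplified version of the produce/consume idea from [2], and it is neither Flink's planner API nor the PoC [3]. Flink generates Java code, so Python timings say nothing about the gains reported in this thread.

```python
from collections import defaultdict
from itertools import count

# Hypothetical toy operators (not Flink APIs); rows are (key, price, qty) tuples.
# ---- Interpreted style: operators chained by per-row collect() calls ----
class Filter:
    def __init__(self, predicate, downstream):
        self.predicate, self.downstream = predicate, downstream

    def collect(self, row):
        if self.predicate(row):
            self.downstream.collect(row)  # one dynamic call per row and operator

class Project:
    def __init__(self, fn, downstream):
        self.fn, self.downstream = fn, downstream

    def collect(self, row):
        self.downstream.collect(self.fn(row))

class SumByKey:
    """Toy hash aggregate: SUM(row[1]) GROUP BY row[0]."""
    def __init__(self):
        self.sums = defaultdict(int)

    def collect(self, row):
        self.sums[row[0]] += row[1]

def run_interpreted(rows):
    agg = SumByKey()
    head = Filter(lambda r: r[2] > 0, Project(lambda r: (r[0], r[1] * r[2]), agg))
    for row in rows:
        head.collect(row)
    return dict(agg.sums)

# ---- Fused style: produce/consume emits one loop for the whole pipeline ----
_fresh = count()  # source of unique variable names in the generated code

def indent(lines):
    return ["    " + line for line in lines]

class ScanGen:
    def produce(self, parent):
        # The scan owns the only loop; downstream logic is emitted inside it.
        return ["for r in rows:"] + indent(parent.consume("r"))

class FilterGen:
    def __init__(self, child, predicate):  # predicate uses {x} for the row variable
        self.child, self.predicate = child, predicate

    def produce(self, parent):
        self.parent = parent
        return self.child.produce(self)

    def consume(self, var):
        cond = self.predicate.format(x=var)
        return [f"if {cond}:"] + indent(self.parent.consume(var))

class ProjectGen:
    def __init__(self, child, exprs):
        self.child, self.exprs = child, exprs

    def produce(self, parent):
        self.parent = parent
        return self.child.produce(self)

    def consume(self, var):
        out = f"t{next(_fresh)}"
        fields = ", ".join(e.format(x=var) for e in self.exprs)
        return [f"{out} = ({fields},)"] + self.parent.consume(out)

class SumByKeyGen:
    """Root and pipeline breaker: fills the hash table inside the fused loop."""
    def __init__(self, child):
        self.child = child

    def produce(self):
        body = self.child.produce(self)
        return ["def fused(rows):", "    sums = {}"] + indent(body) + ["    return sums"]

    def consume(self, var):
        return [f"sums[{var}[0]] = sums.get({var}[0], 0) + {var}[1]"]

def compile_plan(root):
    namespace = {}
    exec("\n".join(root.produce()), namespace)
    return namespace["fused"]

if __name__ == "__main__":
    rows = [("a", 2, 3), ("b", 1, -1), ("a", 4, 1), ("b", 5, 2)]
    plan = SumByKeyGen(ProjectGen(FilterGen(ScanGen(), "{x}[2] > 0"),
                                  ["{x}[0]", "{x}[1] * {x}[2]"]))
    print("\n".join(plan.produce()))  # show the generated source
    assert compile_plan(plan)(rows) == run_interpreted(rows)
```

In the fused version the scan owns the single loop and each downstream operator emits its logic inline, so the per-row `collect()` calls of the interpreted chain disappear from the hot path. The aggregate acts as a pipeline breaker and therefore closes the generated pipeline, which is the same boundary the produce/consume model draws in [2].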
