Hi Yun, Jingsong, Benchao,

Thanks for your valuable input about this FLIP.

First of all, let me emphasize that, from a technical implementation point
of view, this design is feasible in both stream and batch scenarios, so the
FLIP considers both modes. In the stream scenario, our business experience
shows that the bottleneck for stateful operators is basically state access,
so the benefit of OFCG for streaming would not be particularly noticeable,
and we will not prioritize supporting it for now. In the batch scenario, by
contrast, CPU is the bottleneck, so this optimization pays off.

Taking the above into account, this design can support optimization in both
stream and batch mode, but we will give priority to batch operators. As
Benchao said, when we find a suitable streaming business scenario in the
future, we can consider doing this optimization for it. Back to Yun's
question: the design will break state compatibility in stream mode, as in
[1], and the version upgrade will not support OFCG. As mentioned earlier,
we will not support this feature in stream mode in the short term.

Also, thanks to Benchao for the suggestion. I will state in the FLIP that
the current goal of this optimization is scoped to batch mode.

Best,
Ron

liu ron <ron9....@gmail.com> wrote on Mon, Jun 5, 2023 at 15:04:

> Hi, Lincoln
>
> Thanks for your appreciation of this design. Regarding your question:
>
> > do we consider adding a benchmark for the operators to intuitively
> > understand the improvement brought by each improvement?
>
> I think it makes sense to add a benchmark; Spark also has such a benchmark
> framework. But introducing a benchmark framework in Flink is another
> story, and we would need to start a new discussion for that work.
>
> > for the implementation plan, mentioned in the FLIP that 1.18 will
> > support Calc, HashJoin and HashAgg, then what will be the next step? and
> > which operators do we ultimately expect to cover (all or specific ones)?
>
> Our ultimate goal is to support all operators in batch mode, but we
> prioritize them according to their usage. Operators like Calc, HashJoin,
> and HashAgg are more commonly used, so we will support them first and then
> support the rest of the operators step by step. Considering the time
> available and the development workload, we can only support Calc,
> HashJoin, and HashAgg in 1.18. In 1.19 or 1.20, we will complete the
> remaining work. I will make this clear in the FLIP.
>
> Best,
> Ron
>
> Jingsong Li <jingsongl...@gmail.com> wrote on Mon, Jun 5, 2023 at 14:15:
>
>> > For the state compatibility section, it seems that the checkpoint
>> > compatibility would be broken just like [1] did. Could FLIP-190 [2] still
>> > be helpful in this case for SQL version upgrades?
>>
>> I guess this is only for batch processing. Streaming should be another
>> story?
>>
>> Best,
>> Jingsong
>>
>> On Mon, Jun 5, 2023 at 2:07 PM Yun Tang <myas...@live.com> wrote:
>> >
>> > Hi Ron,
>> >
>> > I think this FLIP would help to improve the performance, looking
>> > forward to its completion in Flink!
>> >
>> > For the state compatibility section, it seems that the checkpoint
>> > compatibility would be broken just like [1] did. Could FLIP-190 [2] still
>> > be helpful in this case for SQL version upgrades?
>> >
>> >
>> > [1] https://docs.google.com/document/d/1qKVohV12qn-bM51cBZ8Hcgp31ntwClxjoiNBUOqVHsI/edit#heading=h.fri5rtpte0si
>> > [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191336489
>> >
>> > Best
>> > Yun Tang
>> >
>> > ________________________________
>> > From: Lincoln Lee <lincoln.8...@gmail.com>
>> > Sent: Monday, June 5, 2023 10:56
>> > To: dev@flink.apache.org <dev@flink.apache.org>
>> > Subject: Re: [DISCUSS] FLIP-315: Support Operator Fusion Codegen for Flink SQL
>> >
>> > Hi Ron
>> >
>> > OFCG looks like an exciting optimization, looking forward to its
>> > completion in Flink!
>> > A small question, do we consider adding a benchmark for the operators to
>> > intuitively understand the improvement brought by each improvement?
>> > In addition, for the implementation plan, mentioned in the FLIP that 1.18
>> > will support Calc, HashJoin and HashAgg, then what will be the next step?
>> > and which operators do we ultimately expect to cover (all or specific
>> > ones)?
>> >
>> > Best,
>> > Lincoln Lee
>> >
>> >
>> > liu ron <ron9....@gmail.com> wrote on Mon, Jun 5, 2023 at 09:40:
>> >
>> > > Hi, Jark
>> > >
>> > > Thanks for your feedback. According to my initial assessment, the work
>> > > effort is relatively large.
>> > >
>> > > I will also add the test results for all queries to the FLIP.
>> > >
>> > > Best,
>> > > Ron
>> > >
>> > > Jark Wu <imj...@gmail.com> wrote on Thu, Jun 1, 2023 at 20:45:
>> > >
>> > > > Hi Ron,
>> > > >
>> > > > Thanks a lot for the great proposal. The FLIP looks good to me in
>> > > > general. It does not look like easy work, but the performance sounds
>> > > > promising, so I think it's worth doing.
>> > > >
>> > > > Besides, a complete chart of test results for all TPC-DS queries
>> > > > would make the effect of this FLIP more intuitive.
>> > > >
>> > > > Best,
>> > > > Jark
>> > > >
>> > > >
>> > > >
>> > > > On Wed, 31 May 2023 at 14:27, liu ron <ron9....@gmail.com> wrote:
>> > > >
>> > > > > Hi, Jingsong
>> > > > >
>> > > > > Thanks for your valuable suggestions.
>> > > > >
>> > > > > Best,
>> > > > > Ron
>> > > > >
>> > > > > Jingsong Li <jingsongl...@gmail.com> wrote on Tue, May 30, 2023 at 13:22:
>> > > > >
>> > > > > > Thanks Ron for your information.
>> > > > > >
>> > > > > > I suggest writing it into the Motivation section of the FLIP.
>> > > > > >
>> > > > > > Best,
>> > > > > > Jingsong
>> > > > > >
>> > > > > > On Tue, May 30, 2023 at 9:57 AM liu ron <ron9....@gmail.com> wrote:
>> > > > > > >
>> > > > > > > Hi, Jingsong
>> > > > > > >
>> > > > > > > Thanks for your review. We have tested it on TPC-DS and got a 12%
>> > > > > > > overall gain with only the Calc, HashJoin, and HashAgg operators
>> > > > > > > supported. In some queries we even get more than a 30% gain, so
>> > > > > > > it looks like an effective approach.
>> > > > > > >
>> > > > > > > Best,
>> > > > > > > Ron
>> > > > > > >
>> > > > > > > Jingsong Li <jingsongl...@gmail.com> wrote on Mon, May 29, 2023 at 14:33:
>> > > > > > >
>> > > > > > > > Thanks Ron for the proposal.
>> > > > > > > >
>> > > > > > > > Do you have some benchmark results for the performance
>> > > > > > > > improvement? I am more concerned about the improvement on
>> > > > > > > > Flink than the data in other papers.
>> > > > > > > >
>> > > > > > > > Best,
>> > > > > > > > Jingsong
>> > > > > > > >
>> > > > > > > > On Mon, May 29, 2023 at 2:16 PM liu ron <ron9....@gmail.com> wrote:
>> > > > > > > > >
>> > > > > > > > > Hi, dev
>> > > > > > > > >
>> > > > > > > > > I'd like to start a discussion about FLIP-315: Support
>> > > > > > > > > Operator Fusion Codegen for Flink SQL [1].
>> > > > > > > > >
>> > > > > > > > > As main memory grows, query performance is more and more
>> > > > > > > > > determined by the raw CPU costs of query processing itself.
>> > > > > > > > > This is because query processing techniques based on
>> > > > > > > > > interpreted execution perform poorly on modern CPUs due to
>> > > > > > > > > lack of locality and frequent instruction mis-prediction.
>> > > > > > > > > Therefore, the industry is also researching how to improve
>> > > > > > > > > engine performance by increasing operator execution
>> > > > > > > > > efficiency. In addition, while optimizing Flink's
>> > > > > > > > > performance for TPC-DS queries, we found that a significant
>> > > > > > > > > amount of CPU time was spent on virtual function calls,
>> > > > > > > > > framework collector calls, and invalid calculations, which
>> > > > > > > > > can be optimized to improve the overall engine performance.
>> > > > > > > > > After some investigation, we found that Operator Fusion
>> > > > > > > > > Codegen, proposed by Thomas Neumann in the paper [2], can
>> > > > > > > > > address these problems. I have finished a PoC [3] to verify
>> > > > > > > > > its feasibility and validity.
>> > > > > > > > >
>> > > > > > > > > Looking forward to your feedback.
>> > > > > > > > >
>> > > > > > > > > [1]: https://cwiki.apache.org/confluence/display/FLINK/FLIP-315+Support+Operator+Fusion+Codegen+for+Flink+SQL
>> > > > > > > > > [2]: http://www.vldb.org/pvldb/vol4/p539-neumann.pdf
>> > > > > > > > > [3]: https://github.com/lsyldliu/flink/tree/OFCG
>> > > > > > > > >
>> > > > > > > > > Best,
>> > > > > > > > > Ron
>> > > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>>
>
