Re: [DISCUSS] FLIP-419: Optimize multi-sink query plan generation

Jim Hughes Tue, 16 Jan 2024 17:36:45 -0800

Hi Jeyhun,

Generally, I like the idea of speeding up the optimizer in the case of
multiple queries!

I am new to the optimizer, but I have a few comments / questions.

   1. StreamOptimizeContext may still be needed to pass the fact that we
   are optimizing a streaming query.  I don't think this class will go away
   completely.  (I agree it may become more simple if the kind or
   mini-batch configuration can be removed.)
   2. How are the mini-batch and changelog inference rules tightly coupled?
   I looked a little bit and I haven't seen any connection between them.  It
   seems like the changelog inference is what needs to run multiple times.
   3. I think your point about code complexity is unnecessary.
   StreamOptimizeContext extends org.apache.calcite.plan.Context, which is
   used as an interface to pass information and objects through the Calcite
   stack.
   4. Is there an alternative where the complexity of the changelog
   optimization can be moved into the `FlinkChangelogModeInferenceProgram`?
   (If there is coupling between the mini-batch and changelog rules, then
   this would not make sense.)
   5. There are some other smaller refactorings.  I tried some of them
   here: https://github.com/apache/flink/pull/24108 Mostly, it is syntax
   and using lazy vals to avoid recomputing various things.  (Feel free to
   take whatever actually works; I haven't run the tests.)
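
To make point 3 concrete, here is a minimal sketch of how a Context carries
information through the stack. It is not Flink's code: the `Context`
interface below is a local stand-in reduced to the `unwrap` method of
org.apache.calcite.plan.Context, and `SketchStreamContext`,
`MiniBatchSettings`, and `describe` are hypothetical names used only for
illustration.

```java
// Local stand-in for org.apache.calcite.plan.Context, reduced to unwrap().
interface Context {
    <C> C unwrap(Class<C> aClass);
}

// Hypothetical settings object that a context could expose.
final class MiniBatchSettings {
    final long intervalMillis;

    MiniBatchSettings(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }
}

// Hypothetical streaming context; NOT the real StreamOptimizeContext.
final class SketchStreamContext implements Context {
    private final boolean updateBeforeRequired;
    private final MiniBatchSettings miniBatch; // may be null

    SketchStreamContext(boolean updateBeforeRequired, MiniBatchSettings miniBatch) {
        this.updateBeforeRequired = updateBeforeRequired;
        this.miniBatch = miniBatch;
    }

    boolean isUpdateBeforeRequired() {
        return updateBeforeRequired;
    }

    @Override
    public <C> C unwrap(Class<C> aClass) {
        // Expose the context itself and the objects it carries, by type.
        if (aClass.isInstance(this)) {
            return aClass.cast(this);
        }
        if (aClass.isInstance(miniBatch)) {
            return aClass.cast(miniBatch);
        }
        return null; // nothing of the requested type is available
    }
}

public class ContextSketch {
    // Code deep in the stack only sees the generic Context and asks for
    // what it needs; a null result means "not available here".
    static String describe(Context ctx) {
        SketchStreamContext stream = ctx.unwrap(SketchStreamContext.class);
        if (stream == null) {
            return "not a streaming query";
        }
        MiniBatchSettings mb = ctx.unwrap(MiniBatchSettings.class);
        String interval = (mb == null) ? "none" : String.valueOf(mb.intervalMillis);
        return "streaming, updateBefore=" + stream.isUpdateBeforeRequired()
                + ", miniBatchMs=" + interval;
    }

    public static void main(String[] args) {
        Context ctx = new SketchStreamContext(true, new MiniBatchSettings(500L));
        System.out.println(describe(ctx));
    }
}
```

In this sketch, dropping the mini-batch configuration would only remove one
field and one branch in `unwrap`; the context would still be what signals
that a streaming query is being optimized.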

Separately, folks on the Calcite dev list are thinking about multi-query
optimization:
https://lists.apache.org/thread/mcdqwrtpx0os54t2nn9vtk17spkp5o5k
https://issues.apache.org/jira/browse/CALCITE-6188

Cheers,

Jim

On Tue, Jan 16, 2024 at 5:45 PM Jeyhun Karimov <je.kari...@gmail.com> wrote:

> Hi devs,
>
> I’d like to start a discussion on FLIP-419: Optimize multi-sink query plan
> generation [1].
>
>
> Currently, the optimization process for multi-sink query plans is
> suboptimal: 1) it requires going through the optimization process several
> times and 2) as a result, some low-level code complexity is
> introduced into high-level optimization classes such
> as StreamCommonSubGraphBasedOptimizer.
>
>
> To address this issue, this FLIP proposes to decouple changelog and
> mini-batch interval inference from the main optimization process.
>
> Please find more details in the FLIP wiki document [1]. Looking forward to
> your feedback.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-419%3A+Optimize+multi-sink+query+plan+generation
>
>
> Regards,
> Jeyhun Karimov
>
