Hi, Lorenzo and Feng

Thanks for joining this discussion thread.

Sorry for the late response. Regarding your questions:

> About the Operations interfaces, how can they be empty?
> Should not they provide at least a `run` or `execute` method (similar to
> the command pattern)?
> In this way, their implementation can wrap all the implementations details
> of particular schedulers, and the scheduler can simply execute the command.
> In general, I think a simple sequence diagram showcasing the interaction
> between the interfaces would be awesome to better understand the concept.

I've updated the FLIP and added an Outline Design section. It uses a timing
diagram to show how the Materialized Table interacts with the Workflow
Scheduler in Full Refresh mode, which should make the proposed design easier
to understand.

> What about the RefreshHandler, I cannot find a definition of its
> interface here.
> Is it out of scope for this FLIP?

There is some context missing here: RefreshHandler was proposed in
FLIP-435, and you can find more details in [1].

> If it is periodic, where is the period?
> For the scheduleTime and format, why not simply pass an instance of
> LocalDateTime or similar? The gateway should not have the responsibility to
> parse the time.

This might require a bit of context for clarity. In Full Refresh mode, the
Materialized Table requires the Scheduler to periodically trigger refresh
operations. This means that the Scheduler will periodically call the REST
API, passing parameters such as scheduleTime and scheduleTimeFormat. The
materialized table manager (to be introduced) relies on this information to
accurately calculate the correct time partitions. At the same time, we also
support manual refreshes of materialized tables, and in the future, we will
support manual cascading refreshes on a multi-table granularity. For cases
of manual cascading refresh, we will also register a one-time refresh
workflow with the Scheduler, which then triggers the execution via the REST
API call. However, during a manual refresh, users typically specify the
partition information themselves, so the engine does not need to deduce it
and the schedule time is not needed.

Taking the above into account, there are two types of refresh workflows:
periodic workflows and one-time workflows. The engine requires different
information for each type of workflow. When designing the REST API, we aim
for this API to support both types of workflows simultaneously. Hence we
introduce the isPeriodic parameter for differentiation. Then the engine
will know what to do accordingly.
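
To make the distinction concrete, here is a minimal sketch of how a request
could carry the two workflow types. It is only an illustration, not the
FLIP-448 API: the class RefreshRequest and its factory methods are
hypothetical, and only the field names isPeriodic, scheduleTime and
scheduleTimeFormat come from the proposal.

```java
import java.util.Objects;

// Hypothetical sketch, not the FLIP-448 API: only the three field names
// (isPeriodic, scheduleTime, scheduleTimeFormat) come from the proposal.
final class RefreshRequest {
    final boolean isPeriodic;
    final String scheduleTime;       // null for one-time workflows
    final String scheduleTimeFormat; // null for one-time workflows

    private RefreshRequest(boolean isPeriodic, String scheduleTime, String scheduleTimeFormat) {
        this.isPeriodic = isPeriodic;
        this.scheduleTime = scheduleTime;
        this.scheduleTimeFormat = scheduleTimeFormat;
    }

    // Periodic workflow: the engine deduces the partitions, so it needs the schedule time.
    static RefreshRequest periodic(String scheduleTime, String scheduleTimeFormat) {
        Objects.requireNonNull(scheduleTime, "scheduleTime");
        Objects.requireNonNull(scheduleTimeFormat, "scheduleTimeFormat");
        return new RefreshRequest(true, scheduleTime, scheduleTimeFormat);
    }

    // One-time workflow: the user specifies the partitions, so no schedule time is sent.
    static RefreshRequest oneTime() {
        return new RefreshRequest(false, null, null);
    }
}
```

A periodic call would look like RefreshRequest.periodic("2024-04-25 00:00:00",
"yyyy-MM-dd HH:mm:ss"), while a manual cascading refresh would use
RefreshRequest.oneTime().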

The scheduleTime and scheduleTimeFormat are passed from the Scheduler to the
Gateway via the REST API. Firstly, HTTP has no type equivalent to Java's
LocalDateTime. Secondly, Schedulers can be written in different programming
languages; for example, Airflow workflows are developed in Python, so we
cannot require the Scheduler to use Java's LocalDateTime type. A String is
therefore the most suitable type. Lastly, the purpose of scheduleTime is to
determine the time partitions of the partitioned table. Parsing it is the
responsibility of the materialized table manager, not of the SqlGateway,
which only passes the parameters through.
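
As a rough sketch of that split of responsibilities, the two strings could be
turned into a time on the manager side with the standard java.time API. The
class and method names below are hypothetical, and taking the partition to be
the calendar day of the schedule time is only an assumption for illustration;
the actual partition calculation is described in FLIP-435.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper on the materialized table manager side; the gateway
// would forward both strings untouched.
final class ScheduleTimeParsing {

    // Parse the scheduler-provided string using the scheduler-provided pattern.
    static LocalDateTime parse(String scheduleTime, String scheduleTimeFormat) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(scheduleTimeFormat);
        return LocalDateTime.parse(scheduleTime, formatter);
    }

    // Assumption for illustration only: a daily partition named after the
    // calendar day of the schedule time.
    static String dailyPartition(String scheduleTime, String scheduleTimeFormat) {
        return parse(scheduleTime, scheduleTimeFormat).toLocalDate().toString();
    }
}
```

Because the pattern travels with the value, a scheduler written in any
language only has to produce two strings.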

You may refer to the Outline Design section of this FLIP, and to the
Partitioned Table Full Refresh part of FLIP-435, to better understand the
overall design principles.

> For the REST API:
> wouldn't it be better (more REST) to move the `mt_identifier` to the URL?
> E.g.: v3/materialized_tables/<mt_identifier>/refresh

I think this is a good idea. I have one other consideration, though: should
this API support passing multiple materialized tables at the same time? If
it does, the identifiers will have to be put in the request body. I will
discuss the design of this API offline with ShengKai Fang, who is the owner
of the Gateway module. In any case, your proposal is a good choice.
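
For the single-table case, the path-based variant could be built as in the
sketch below. This is only an illustration of the suggestion, not a settled
design: the helper name is hypothetical, the path shape is the one quoted
above, and a multi-table variant would carry the identifiers in the request
body instead.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the path-based variant for one table.
final class RefreshPathSketch {

    // URLEncoder performs form encoding, which is close enough for a sketch;
    // a real client may prefer a dedicated path-segment encoder.
    static String refreshPath(String mtIdentifier) {
        String encoded = URLEncoder.encode(mtIdentifier, StandardCharsets.UTF_8);
        return "v3/materialized_tables/" + encoded + "/refresh";
    }
}
```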


> From my current understanding, the workflow handle should not be bound
> to the Dynamic Table. Therefore, if the workflow is modified, does it mean
> that the scheduling information corresponding to the Dynamic Table will be
> lost?

Please see the Outline Design section of the FLIP for the overall design.
The refresh handler is just a pointer that locates the workflow info in the
scheduler, so the scheduling info is persisted in the Scheduler and will
not be lost.
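
The pointer idea can be sketched as follows. Everything here is a
hypothetical stand-in (the class names, the workflowId field, the in-memory
scheduler and the cron strings); only the asSummaryString method name comes
from FLIP-435. The point is that the table side keeps just an identifier,
while the scheduler owns and updates the scheduling info.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the handler stores only an identifier; the scheduling
// info itself lives in the scheduler.
final class PointerSketch {

    // Stand-in for a scheduler-specific RefreshHandler implementation.
    static final class WorkflowRefreshHandler {
        final String workflowId;

        WorkflowRefreshHandler(String workflowId) {
            this.workflowId = workflowId;
        }

        String asSummaryString() {
            return "workflowId=" + workflowId;
        }
    }

    // Stand-in for an external scheduler that persists the schedule per workflow.
    static final class InMemoryScheduler {
        private final Map<String, String> cronByWorkflowId = new HashMap<>();

        WorkflowRefreshHandler register(String workflowId, String cron) {
            cronByWorkflowId.put(workflowId, cron);
            return new WorkflowRefreshHandler(workflowId);
        }

        // Modifying the workflow changes scheduler state only; the handler stays valid.
        void modify(WorkflowRefreshHandler handler, String newCron) {
            cronByWorkflowId.put(handler.workflowId, newCron);
        }

        String lookupCron(WorkflowRefreshHandler handler) {
            return cronByWorkflowId.get(handler.workflowId);
        }
    }
}
```

After a modify call, the same handler still resolves to the workflow and
returns the updated schedule.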

> Regarding the status information of the workflow, I am wondering if it
> is necessary to provide an interface to display the backend scheduling
> information? This would make it more convenient to view the execution
> status of backend jobs.

RefreshHandler#asSummaryString returns summary information about the
background refresh job, and you can get it via DESC TABLE xxx. If you want
detailed information about background jobs, you should go to the Scheduler,
which provides the most detailed information. Even if we provided such an
interface, it could not return the complete information, and it is unclear
how it would present the background job information. So I don't think it is
necessary.


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-435%3A+Introduce+a+New+Materialized+Table+for+Simplifying+Data+Pipelines

Best,
Ron



Feng Jin <jinfeng1...@gmail.com> wrote on Thu, Apr 25, 2024 at 00:46:

> Hi Ron
>
> Thank you for initiating this FLIP.
>
> My current questions are as follows:
>
> 1. From my current understanding, the workflow handle should not be bound
> to the Dynamic Table. Therefore, if the workflow is modified, does it mean
> that the scheduling information corresponding to the Dynamic Table will be
> lost?
>
> 2. Regarding the status information of the workflow, I am wondering if it
> is necessary to provide an interface to display the backend scheduling
> information? This would make it more convenient to view the execution
> status of backend jobs.
>
>
> Best,
> Feng
>
>
> On Wed, Apr 24, 2024 at 3:24 PM <lorenzo.affe...@ververica.com.invalid>
> wrote:
>
> > Hello Ron Liu! Thank you for your FLIP!
> >
> > Here are my considerations:
> >
> > 1.
> > About the Operations interfaces, how can they be empty?
> > Should not they provide at least a `run` or `execute` method (similar to
> > the command pattern)?
> > In this way, their implementation can wrap all the implementations
> details
> > of particular schedulers, and the scheduler can simply execute the
> command.
> > In general, I think a simple sequence diagram showcasing the interaction
> > between the interfaces would be awesome to better understand the concept.
> >
> > 2.
> > What about the RefreshHandler, I cannot find a definition of its
> interface
> > here.
> > Is it out of scope for this FLIP?
> >
> > 3.
> > For the SqlGatewayService arguments:
> >
> >             boolean isPeriodic,
> >             @Nullable String scheduleTime,
> >             @Nullable String scheduleTimeFormat,
> >
> > If it is periodic, where is the period?
> > For the scheduleTime and format, why not simply pass an instance of
> > LocalDateTime or similar? The gateway should not have the responsibility
> to
> > parse the time.
> >
> > 4.
> > For the REST API:
> > wouldn't it be better (more REST) to move the `mt_identifier` to the URL?
> > E.g.: v3/materialized_tables/<mt_identifier>/refresh
> >
> > Thank you!
> > On Apr 22, 2024 at 08:42 +0200, Ron Liu <ron9....@gmail.com>, wrote:
> > > Hi, Dev
> > >
> > > I would like to start a discussion about FLIP-448: Introduce Pluggable
> > > Workflow Scheduler Interface for Materialized Table.
> > >
> > > In FLIP-435[1], we proposed Materialized Table, which has two types of
> > data
> > > refresh modes: Full Refresh & Continuous Refresh Mode. In Full Refresh
> > > mode, the Materialized Table relies on a workflow scheduler to perform
> > > periodic refresh operation to achieve the desired data freshness.
> > >
> > > There are numerous open-source workflow schedulers available, with
> > popular
> > > ones including Airflow and DolphinScheduler. To enable Materialized
> Table
> > > to work with different workflow schedulers, we propose a pluggable
> > workflow
> > > scheduler interface for Materialized Table in this FLIP.
> > >
> > > For more details, see FLIP-448 [2]. Looking forward to your feedback.
> > >
> > > [1] https://lists.apache.org/thread/c1gnn3bvbfs8v1trlf975t327s4rsffs
> > > [2]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-448%3A+Introduce+Pluggable+Workflow+Scheduler+Interface+for+Materialized+Table
> > >
> > > Best,
> > > Ron
> >
>
