Hi Lorenzo and Feng,

Thanks for joining this discussion thread.
Sorry for the late response. Regarding your questions:

> About the Operations interfaces, how can they be empty?
> Should not they provide at least a `run` or `execute` method (similar to
> the command pattern)?
> In this way, their implementation can wrap all the implementations details
> of particular schedulers, and the scheduler can simply execute the command.
> In general, I think a simple sequence diagram showcasing the interaction
> between the interfaces would be awesome to better understand the concept.

I've updated the FLIP and added an Outline Design section. It uses a timing diagram to show how the Materialized Table interacts with the Workflow Scheduler in Full Refresh mode, which should help in understanding the proposed design.

> What about the RefreshHandler, I cannot find a definition of its interface
> here.
> Is it out of scope for this FLIP?

There seems to be some missing context here: RefreshHandler was proposed in FLIP-435, and you can find more details in [1].

> If it is periodic, where is the period?
> For the scheduleTime and format, why not simply pass an instance of
> LocalDateTime or similar? The gateway should not have the responsibility to
> parse the time.

This requires a bit of context. In Full Refresh mode, the Materialized Table requires the Scheduler to trigger refresh operations periodically. This means that the Scheduler periodically calls the REST API, passing parameters such as scheduleTime and scheduleTimeFormat. The materialized table manager (to be introduced) relies on this information to calculate the correct time partitions.

We also support manual refreshes of materialized tables, and in the future we will support manual cascading refreshes across multiple tables. For a manual cascading refresh, we will also register a one-time refresh workflow with the Scheduler, which then triggers the execution via a REST API call.
However, during a manual refresh, users typically specify the partition information themselves, so the engine does not need to deduce it and the schedule time is not needed.

Taking the above into account, there are two types of refresh workflows: periodic workflows and one-time workflows. The engine requires different information for each type. We want a single REST API to support both types, so we introduce the isPeriodic parameter to distinguish them; the engine then knows how to handle each request.

The scheduleTime and scheduleTimeFormat are passed from the Scheduler to the Gateway via the REST API. Firstly, HTTP has no type equivalent to Java's LocalDateTime. Secondly, Schedulers can be written in different programming languages; for example, Airflow workflows are written in Python. So we cannot require the Scheduler to use Java's LocalDateTime type, and a String is the most suitable choice. Lastly, the purpose of scheduleTime is to determine the time partitions of the partitioned table. The parsing responsibility falls on the materialized table manager, not the SqlGateway, which only passes the parameters through.

You may refer to the Outline Design section of this FLIP, and specifically the Partitioned Table Full Refresh part of FLIP-435, to better understand the overall design.

> For the REST API:
> wouldn't it be better (more REST) to move the `mt_identifier` to the URL?
> E.g.: v3/materialized_tables/<mt_identifier>/refresh

I think this is a good idea. I have one further consideration, though: should this API support passing multiple materialized tables at the same time? If so, the identifiers would have to go in the request body. I will discuss the design of this API offline with ShengKai Fang, who is the owner of the Gateway module. Either way, your proposal is a good choice.
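
To make the scheduleTime handling above a bit more concrete, here is a minimal Java sketch of how a periodic request could be turned into a partition value while a one-time request skips the calculation. The class name RefreshTimeSketch, the method resolvePartition, and the partitionPattern parameter are hypothetical and are not part of FLIP-448 or FLIP-435; the sketch also assumes the partition is derived directly from the schedule time, which is a simplification of whatever the materialized table manager will actually do.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Optional;

/** Hypothetical illustration only; not an interface defined by FLIP-448. */
class RefreshTimeSketch {

    /**
     * Periodic workflow: parse the String scheduleTime with the format sent
     * by the Scheduler and render it with the table's partition pattern.
     * One-time workflow: the user supplies the partition, so nothing is derived.
     */
    static Optional<String> resolvePartition(
            boolean isPeriodic,
            String scheduleTime,        // nullable for one-time workflows
            String scheduleTimeFormat,  // nullable for one-time workflows
            String partitionPattern) {  // assumed parameter, e.g. "yyyyMMdd"
        if (!isPeriodic) {
            return Optional.empty();
        }
        if (scheduleTime == null || scheduleTimeFormat == null) {
            throw new IllegalArgumentException(
                    "Periodic refresh requires scheduleTime and scheduleTimeFormat");
        }
        // The parsing happens here, on the manager side, not in the gateway.
        LocalDateTime time = LocalDateTime.parse(
                scheduleTime, DateTimeFormatter.ofPattern(scheduleTimeFormat));
        return Optional.of(time.format(DateTimeFormatter.ofPattern(partitionPattern)));
    }

    public static void main(String[] args) {
        // Periodic trigger coming from the Scheduler.
        System.out.println(resolvePartition(
                true, "2024-04-25 00:00:00", "yyyy-MM-dd HH:mm:ss", "yyyyMMdd"));
        // Manual one-time refresh: no schedule time is passed.
        System.out.println(resolvePartition(false, null, null, "yyyyMMdd"));
    }
}
```

Because the time travels as a String plus a format, a Scheduler written in Python or any other language can produce it without knowing about Java types.
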

> From my current understanding, the workflow handle should not be bound
> to the Dynamic Table. Therefore, if the workflow is modified, does it mean
> that the scheduling information corresponding to the Dynamic Table will be
> lost?

You can refer to the Outline Design section of the FLIP to understand the overall design further. The refresh handler is just a pointer that locates the workflow information in the scheduler, so the scheduling information is persisted in the Scheduler and will not be lost.

> Regarding the status information of the workflow, I am wondering if it
> is necessary to provide an interface to display the backend scheduling
> information? This would make it more convenient to view the execution
> status of backend jobs.

RefreshHandler#asSummaryString returns summary information about the background refresh job, which you can get via DESC TABLE xxx. If you want detailed information about background jobs, you should go to the Scheduler, which provides the most detailed information. Even if such an interface were provided, it could not return the complete information, and it is unclear how it would present the details of the background jobs. So I don't think it is necessary.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-435%3A+Introduce+a+New+Materialized+Table+for+Simplifying+Data+Pipelines

Best,
Ron

Feng Jin <jinfeng1...@gmail.com> wrote on Thu, Apr 25, 2024 at 00:46:

> Hi Ron
>
> Thank you for initiating this FLIP.
>
> My current questions are as follows:
>
> 1. From my current understanding, the workflow handle should not be bound
> to the Dynamic Table. Therefore, if the workflow is modified, does it mean
> that the scheduling information corresponding to the Dynamic Table will be
> lost?
>
> 2. Regarding the status information of the workflow, I am wondering if it
> is necessary to provide an interface to display the backend scheduling
> information? This would make it more convenient to view the execution
> status of backend jobs.
>
>
> Best,
> Feng
>
>
> On Wed, Apr 24, 2024 at 3:24 PM <lorenzo.affe...@ververica.com.invalid>
> wrote:
>
> > Hello Ron Liu! Thank you for your FLIP!
> >
> > Here are my considerations:
> >
> > 1.
> > About the Operations interfaces, how can they be empty?
> > Should not they provide at least a `run` or `execute` method (similar to
> > the command pattern)?
> > In this way, their implementation can wrap all the implementations
> > details of particular schedulers, and the scheduler can simply execute
> > the command.
> > In general, I think a simple sequence diagram showcasing the interaction
> > between the interfaces would be awesome to better understand the concept.
> >
> > 2.
> > What about the RefreshHandler, I cannot find a definition of its
> > interface here.
> > Is it out of scope for this FLIP?
> >
> > 3.
> > For the SqlGatewayService arguments:
> >
> > boolean isPeriodic,
> > @Nullable String scheduleTime,
> > @Nullable String scheduleTimeFormat,
> >
> > If it is periodic, where is the period?
> > For the scheduleTime and format, why not simply pass an instance of
> > LocalDateTime or similar? The gateway should not have the responsibility
> > to parse the time.
> >
> > 4.
> > For the REST API:
> > wouldn't it be better (more REST) to move the `mt_identifier` to the URL?
> > E.g.: v3/materialized_tables/<mt_identifier>/refresh
> >
> > Thank you!
> > On Apr 22, 2024 at 08:42 +0200, Ron Liu <ron9....@gmail.com>, wrote:
> > > Hi, Dev
> > >
> > > I would like to start a discussion about FLIP-448: Introduce Pluggable
> > > Workflow Scheduler Interface for Materialized Table.
> > >
> > > In FLIP-435[1], we proposed Materialized Table, which has two types of
> > > data refresh modes: Full Refresh & Continuous Refresh Mode. In Full
> > > Refresh mode, the Materialized Table relies on a workflow scheduler to
> > > perform periodic refresh operation to achieve the desired data freshness.
> > >
> > > There are numerous open-source workflow schedulers available, with
> > > popular ones including Airflow and DolphinScheduler. To enable
> > > Materialized Table to work with different workflow schedulers, we
> > > propose a pluggable workflow scheduler interface for Materialized Table
> > > in this FLIP.
> > >
> > > For more details, see FLIP-448 [2]. Looking forward to your feedback.
> > >
> > > [1] https://lists.apache.org/thread/c1gnn3bvbfs8v1trlf975t327s4rsffs
> > > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-448%3A+Introduce+Pluggable+Workflow+Scheduler+Interface+for+Materialized+Table
> > >
> > > Best,
> > > Ron
> > >