Hello Ron!

> I've updated the FLIP and added the Outline Design section, which uses a
> timing diagram to show how Materialized Table interacts with the Workflow
> Scheduler in Full Refresh mode. It should help in understanding the proposed
> design.

Thank you for the additions, the sequence diagram says a lot.
I have a question there: how can the gateway update the refreshHandler in the
Catalog before getting it from the scheduler?

Just a nit, in the FLIP:

> WorkflowOperation implementation class is provided by the engine to the
> WorkflowScheudler. Currently, its implementation class would include
> CreatePeriodicWorkflowOperation, SuspendWorkflowOperation,
> ResumeWorkflowOperation, and ModifyWorkflowCronOperation.

You have a typo here: WorkflowScheudler -> WorkflowScheduler :)

For the operations part, I still think that the FLIP would benefit from
providing a specific pattern for operations. You could propose either a
command pattern [1] or a visitor pattern (where the scheduler visits the
operation to get relevant info) [2] for those operations, at your choice.

> This means that the Scheduler will periodically call the REST API, passing
> parameters such as scheduleTime and scheduleTimeFormat

About "isPeriodic" and the date type, I got your point, thank you for the
context!
About the REST API, I will wait for your offline discussion :)

[1] https://en.wikipedia.org/wiki/Command_pattern
[2] https://en.wikipedia.org/wiki/Visitor_pattern

On Apr 25, 2024 at 13:22 +0200, Ron Liu <ron9....@gmail.com>, wrote:
> Hi, Lorenzo and Feng
>
> Thanks for joining this discussion thread.
>
> Sorry for the late response. Regarding your questions:
>
> > About the Operations interfaces, how can they be empty?
> > Should not they provide at least a `run` or `execute` method (similar to
> > the command pattern)?
> > In this way, their implementation can wrap all the implementation details
> > of particular schedulers, and the scheduler can simply execute the command.
> > In general, I think a simple sequence diagram showcasing the interaction
> > between the interfaces would be awesome to better understand the concept.
>
> I've updated the FLIP and added the Outline Design section, which uses a
> timing diagram to show how Materialized Table interacts with the Workflow
> Scheduler in Full Refresh mode. It should help in understanding the proposed
> design.
>
> > What about the RefreshHandler, I cannot find a definition of its
> > interface here.
> > Is it out of scope for this FLIP?
>
> There is some context that is not aligned here: RefreshHandler was proposed
> in FLIP-435, and you can find more details in [1].
>
> > If it is periodic, where is the period?
> > For the scheduleTime and format, why not simply pass an instance of
> > LocalDateTime or similar? The gateway should not have the responsibility to
> > parse the time.
>
> This might require a bit of context for clarity. In Full Refresh mode, the
> Materialized Table requires the Scheduler to periodically trigger refresh
> operations. This means that the Scheduler will periodically call the REST
> API, passing parameters such as scheduleTime and scheduleTimeFormat. The
> materialized table manager (to be introduced) relies on this information to
> accurately calculate the correct time partitions. At the same time, we also
> support manual refreshes of materialized tables, and in the future, we will
> support manual cascading refreshes on a multi-table granularity. For cases
> of manual cascading refresh, we will also register a one-time refresh
> workflow with the Scheduler, which then triggers the execution via the REST
> API call. However, during a manual refresh, users typically specify
> partition information, and there's no need for the engine to deduce it,
> so the schedule time is not needed.
>
> Taking the above into account, there are two types of refresh workflows:
> periodic workflows and one-time workflows. The engine requires different
> information for each type of workflow. When designing the REST API, we aim
> for this API to support both types of workflows simultaneously. Hence we
> introduce the isPeriodic parameter for differentiation. Then the engine
> will know what to do accordingly.
>
> The scheduleTime and scheduleTimeFormat are passed from the Scheduler to the
> Gateway via the REST API. Firstly, in the HTTP protocol, there is no type
> equivalent to Java's LocalDateTime. Secondly, Schedulers can potentially be
> written in different programming languages; for example, Airflow uses
> Python to develop its workflows. Hence, we cannot limit the Scheduler to
> Java's LocalDateTime type. Therefore, a String type is the most suitable.
> Lastly, the purpose of the scheduleTime is to determine the time
> partitioning details of the partitioned table. This parsing responsibility
> falls upon the materialized table manager and not the SqlGateway, which is
> solely responsible for passing the parameters through.
>
> You may refer to the Outline Design section of this FLIP, and to the
> Partitioned Table Full Refresh part of FLIP-435, to further comprehend the
> overall design principles.
>
> > For the REST API:
> > wouldn't it be better (more REST) to move the `mt_identifier` to the URL?
> > E.g.: v3/materialized_tables/<mt_identifier>/refresh
>
> I think this is a good idea. I have another consideration though: does this
> API support passing multiple materialized tables at the same time? If it
> does, the identifiers will have to be put in the request body. I will
> discuss the design of this API with ShengKai Fang offline, he is the owner
> of the Gateway module. Anyway, your proposal is a good choice.
>
> > From my current understanding, the workflow handle should not be bound
> > to the Dynamic Table. Therefore, if the workflow is modified, does it mean
> > that the scheduling information corresponding to the Dynamic Table will be
> > lost?
>
> You can see the FLIP Outline Design section to understand the overall
> design further. The refresh handler is just a pointer that can locate the
> workflow info in the scheduler, so the scheduling info is persisted in the
> Scheduler and will not be lost.
>
> > Regarding the status information of the workflow, I am wondering if it
> > is necessary to provide an interface to display the backend scheduling
> > information? This would make it more convenient to view the execution
> > status of backend jobs.
>
> The RefreshHandler#asSummaryString will return the summary information of
> the background refresh job, and you can get it via DESC TABLE xxx. If you
> want detailed information about the background jobs, you should go to the
> Scheduler, which provides the most detailed information. Even if such an
> interface were provided, it would not give the complete information, and it
> is unclear how it would present the background job information. So I don't
> think it is necessary.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-435%3A+Introduce+a+New+Materialized+Table+for+Simplifying+Data+Pipelines
>
> Best,
> Ron
>
> Feng Jin <jinfeng1...@gmail.com> wrote on Thu, Apr 25, 2024 at 00:46:
>
> > Hi Ron
> >
> > Thank you for initiating this FLIP.
> >
> > My current questions are as follows:
> >
> > 1. From my current understanding, the workflow handle should not be bound
> > to the Dynamic Table. Therefore, if the workflow is modified, does it mean
> > that the scheduling information corresponding to the Dynamic Table will be
> > lost?
> >
> > 2. Regarding the status information of the workflow, I am wondering if it
> > is necessary to provide an interface to display the backend scheduling
> > information? This would make it more convenient to view the execution
> > status of backend jobs.
> >
> > Best,
> > Feng
> >
> > On Wed, Apr 24, 2024 at 3:24 PM <lorenzo.affe...@ververica.com.invalid>
> > wrote:
> >
> > > > Hello Ron Liu! Thank you for your FLIP!
> > > >
> > > > Here are my considerations:
> > > >
> > > > 1.
> > > > About the Operations interfaces, how can they be empty?
> > > > Should not they provide at least a `run` or `execute` method (similar to
> > > > the command pattern)?
> > > > In this way, their implementation can wrap all the implementation details
> > > > of particular schedulers, and the scheduler can simply execute the command.
> > > > In general, I think a simple sequence diagram showcasing the interaction
> > > > between the interfaces would be awesome to better understand the concept.
> > > >
> > > > 2.
> > > > What about the RefreshHandler, I cannot find a definition of its interface
> > > > here.
> > > > Is it out of scope for this FLIP?
> > > >
> > > > 3.
> > > > For the SqlGatewayService arguments:
> > > >
> > > > boolean isPeriodic,
> > > > @Nullable String scheduleTime,
> > > > @Nullable String scheduleTimeFormat,
> > > >
> > > > If it is periodic, where is the period?
> > > > For the scheduleTime and format, why not simply pass an instance of
> > > > LocalDateTime or similar? The gateway should not have the responsibility to
> > > > parse the time.
> > > >
> > > > 4.
> > > > For the REST API:
> > > > wouldn't it be better (more REST) to move the `mt_identifier` to the URL?
> > > > E.g.: v3/materialized_tables/<mt_identifier>/refresh
> > > >
> > > > Thank you!
> > > > On Apr 22, 2024 at 08:42 +0200, Ron Liu <ron9....@gmail.com>, wrote:
> > > > > > Hi, Dev
> > > > > >
> > > > > > I would like to start a discussion about FLIP-448: Introduce Pluggable
> > > > > > Workflow Scheduler Interface for Materialized Table.
> > > > > >
> > > > > > In FLIP-435 [1], we proposed Materialized Table, which has two types of
> > > > > > data refresh modes: Full Refresh & Continuous Refresh Mode. In Full Refresh
> > > > > > mode, the Materialized Table relies on a workflow scheduler to perform
> > > > > > periodic refresh operations to achieve the desired data freshness.
> > > > > >
> > > > > > There are numerous open-source workflow schedulers available, with popular
> > > > > > ones including Airflow and DolphinScheduler. To enable Materialized Table
> > > > > > to work with different workflow schedulers, we propose a pluggable workflow
> > > > > > scheduler interface for Materialized Table in this FLIP.
> > > > > >
> > > > > > For more details, see FLIP-448 [2]. Looking forward to your feedback.
> > > > > >
> > > > > > [1] https://lists.apache.org/thread/c1gnn3bvbfs8v1trlf975t327s4rsffs
> > > > > > [2]
> > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-448%3A+Introduce+Pluggable+Workflow+Scheduler+Interface+for+Materialized+Table
> > > > > >
> > > > > > Best,
> > > > > > Ron
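
On the operations question raised at the top of this thread (empty operation interfaces versus a command pattern), the following is a minimal Java sketch of what the command-pattern variant might look like. It is not taken from the FLIP. Only the operation class names (CreatePeriodicWorkflowOperation, SuspendWorkflowOperation, ResumeWorkflowOperation) come from the FLIP text quoted above; the `execute` method, the constructor parameters, `SchedulerClient`, `RecordingSchedulerClient`, and `DemoWorkflowScheduler` are hypothetical names assumed purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Demo entry point; declared first so `java WorkflowCommandSketch.java` can launch it.
class WorkflowCommandSketch {
    public static void main(String[] args) {
        RecordingSchedulerClient client = new RecordingSchedulerClient();
        DemoWorkflowScheduler scheduler = new DemoWorkflowScheduler(client);
        scheduler.submit(new CreatePeriodicWorkflowOperation("mt_orders", "0 0 * * *"));
        scheduler.submit(new SuspendWorkflowOperation("mt_orders"));
        scheduler.submit(new ResumeWorkflowOperation("mt_orders"));
        client.calls.forEach(System.out::println);
    }
}

// Hypothetical receiver: wraps the API of one concrete scheduler. Not part of the FLIP.
interface SchedulerClient {
    void create(String workflowId, String cron);
    void suspend(String workflowId);
    void resume(String workflowId);
}

// Command: each operation carries its own parameters and knows how to run itself.
// The execute method is an assumption; the FLIP interface is described as empty.
interface WorkflowOperation {
    void execute(SchedulerClient client);
}

final class CreatePeriodicWorkflowOperation implements WorkflowOperation {
    private final String workflowId;
    private final String cron;

    CreatePeriodicWorkflowOperation(String workflowId, String cron) {
        this.workflowId = workflowId;
        this.cron = cron;
    }

    @Override
    public void execute(SchedulerClient client) {
        client.create(workflowId, cron);
    }
}

final class SuspendWorkflowOperation implements WorkflowOperation {
    private final String workflowId;

    SuspendWorkflowOperation(String workflowId) {
        this.workflowId = workflowId;
    }

    @Override
    public void execute(SchedulerClient client) {
        client.suspend(workflowId);
    }
}

final class ResumeWorkflowOperation implements WorkflowOperation {
    private final String workflowId;

    ResumeWorkflowOperation(String workflowId) {
        this.workflowId = workflowId;
    }

    @Override
    public void execute(SchedulerClient client) {
        client.resume(workflowId);
    }
}

// Stand-in client that records the calls it receives instead of contacting a real scheduler.
final class RecordingSchedulerClient implements SchedulerClient {
    final List<String> calls = new ArrayList<>();

    @Override
    public void create(String workflowId, String cron) {
        calls.add("create " + workflowId + " " + cron);
    }

    @Override
    public void suspend(String workflowId) {
        calls.add("suspend " + workflowId);
    }

    @Override
    public void resume(String workflowId) {
        calls.add("resume " + workflowId);
    }
}

// Invoker: runs any operation without inspecting its concrete type.
final class DemoWorkflowScheduler {
    private final SchedulerClient client;

    DemoWorkflowScheduler(SchedulerClient client) {
        this.client = client;
    }

    void submit(WorkflowOperation operation) {
        operation.execute(client);
    }
}
```

In this arrangement the scheduler side needs no `instanceof` checks: supporting a new operation means adding one class. The visitor alternative mentioned in the thread would invert this, keeping the operations as plain data and putting the per-operation logic in the scheduler.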