Thanks Yuepeng for driving this proposal! It's definitely a great addition for Adaptive Scheduler. I see FLIP-487 has 2 docs, one is official wiki[1], one is google doc[2]. It's better to only use one doc to ensure the consistency, I prefer to only use FLIP-487 official wiki due to you have the wiki permission.
> I'm also wondering whether it would make sense to focus on the REST > endpoints in this FLIP and put the UI work in a separate FLIP. WDYT? > Decreasing the scope would probably help handling the required changes Thanks Matthias for the good suggestions, split it to 2 FLIPs makes sense to me. I have a minor question for this: could we discuss 2 FLIPs together? I'm afraid the rest endpoints doesn't work as expected when we discuss WebUI change in the future. Also, I found the current FLIP wiki[1] and doc[2] have deleted all information related to WebUI change. If so, the FLIP title should be changed as well. Currently, the title is "FLIP-487: Show history of rescales in Web UI for AdaptiveScheduler" but it seems FLIP-487 doesn't include any Web UI change. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-487%3A+Show+history+of+rescales+in+Web+UI+for+AdaptiveScheduler [2] https://docs.google.com/document/d/1WrLBkSkYe2tBQ3j66gKHFr2OB0d1HuHKDrRVr6B8nkM/edit?usp=sharing Best, Rui On Wed, Dec 4, 2024 at 9:56 PM Yuepeng Pan <panyuep...@apache.org> wrote: > > Thanks Matthias a lot for the comments and guidance. > > >Can we reorganize the draft? Right now, we have some (for RescaleEvent, > >Required/AcquiredParallelism) schema defined in the "Proposed Changes" > section > >and some other schema under "Public Interfaces". It would be nice to have > this > >more organized. Just as a suggestion: In the end the proposed changes > should list > >the different REST endpoints you want to introduce (including the > corresponding > >schemas for request and response). > > Thanks for the comments. I’ve made some structural adjustments and content > modifications to help clarify certain issues. Since some questions are > still under discussion, I plan to update the document once we reach a > consensus. > > > I'm also wondering whether it would make sense to focus on the REST > > endpoints in this FLIP and put the UI work in a separate FLIP. WDYT? > Decreasing > > the scope would probably help handling the required changes. > > Nice proposal ! That will help us better focus our attention and > concentrate on the main target. > > > Have you considered adding the onChange event timestamp for a rescale > event > > as well? We introduced a separation of the job requirements change event > > and the actual rescale execution in FLIP-461 [1]. It might be worth > > documenting the time when a change was monitored for the first time that > > triggered the rescale. WDYT? > > Thank you very much for your comments.As described in the document, the > original design had many abstract concepts that were relatively simple. The > information recorded for each rescale event mainly reflected the current > state, rather than the entire historical state of the adaptive scheduler > throughout the rescale process. > IIUC, when we add the onChange event timestamp for rescale events, each > rescale event will capture the state changes of the adaptive scheduler at > various stages (these states would form a collection similar to the set of > all possible states of the adaptive scheduler) along with the corresponding > timestamps. These states would essentially form an ordered collection. > Compared to the original design, this would undoubtedly make internal > information more transparent, which I think is very meaningful. > If I misunderstood, please correct me. > > >You're mentioning "comments" as a field of the RescaleEvent in your > proposal. > >What's the use-case here? Where are these comments from? (update) A brief > talk > >with Yuepeng on that topic revealed that the field is supposed to be used > for > >errors that occurred during the rescale operation. My take on that one: > >- We might want to reconsider the field name in that case (maybe > >errors_during_rescale?). "comments" seems to be quite generic. > > - Additionally, shouldn't we make this a list of errors rather than a > String field? > > Sorry for not fully explaining the purpose of this field earlier. The > intention behind adding this field is to highlight special circumstances > during a rescale, such as exceptions caused by insufficient resources or > situations where a new request overrides the current one, forcing us to > abandon the ongoing rescale operation and reallocate resources based on the > new request. In such cases, it would be valuable to record what exactly > happened internally. That said, I completely agree with the point that > "comments" might seem overly broad. > Maybe we need to address a few key questions here: > > 1. Is it necessary to retain a "comments"-like field to capture such > information? > 2. Should error-related information also be retained? > 3. If retained, where should the "comments" (if applicable) and error > information fields be located? > - Should they be tied to a specific adaptive-scheduler-state window? > - Or should they be directly included in the rescale event message? > 4. Should these fields be implemented as lists? > > For the first and second items, I prefer to retain both types of > information because they can provide transparent details to the user. > For the third and fourth items, I prefer to associate these two fields as > lists with specific adaptive-scheduler state windows. This approach could > help organize the information better and link it to relevant events in the > rescale process. > > I would greatly appreciate hearing more suggestions and ideas! > > > - How certain are we that we can associate errors to the actual rescale > > operation and rather than the error being caused by something else? > > In my limited reading, it seems that only exceptions related to resource > shortage during parallelism allocation can be classified into this type of > exception. Other exceptions can be categorized outside this scope. > > > But there is no "attempt number" mentioned in the RescaleEvent schema. > > Please let me try to add to this question: > - Each rescale requirement corresponds to a resourceRequirementsEpochID. > - For the same rescale requirement (resourceRequirementsEpochID), multiple > rescale attempts may occur to here. We treat each rescale under the current > resource requirements as a rescale attempt. That is, for each rescale > attempt on the same rescale requirement (resourceRequirementsEpochID), the > rescale attempt id will increment by 1. When the > resourceRequirementsEpochID is changed, the rescale attempt will be reset. > - Based on these two points, whenever a rescale attempt happens, the > rescaleID will increment globally. > > I hope this may clarify the logic and approach. Please let me know what’s > your opinion. > > > Additionally, what is the ID based on? Do we start from 0 and just > increment? > > Or do we want to have a mechanism that ensures that the IDs are also > > unique/monotonically increasing after JobManager failovers? > > To be honest, I expect initially it to increment from 0 every time a job > is started since the rescale ID is not strongly tied to the state. If we > can ensure that the ID remains unique and increase monotonically even after > a JobManager failover, this would make the rescale ID more consistent, > right? If so, that makes sense to me. Please let me know your thoughts. > > > For the parallelism schema: I might be misreading the draft here but > you're proposing to use the subtask > > name as the ID to refer to the JobVertex? That the name might become > quite long. What about using the > > JobVertexID here. That would be also more aligned to how the parallelism > is represented by the > > /jobs/<job-id>/resource-requirements endpoint. If we want to add the > task name for readability purposes, we > > can still add this one as a taskName field to the > Required/AcquiredParallelism schema. > > Good proposal! I did consider using JobVertexID, but for ease of use, I > ended up changing it to taskName. I'm wondering if we could use both > JobVertexID and taskName. Could we provide a parameter to control the > maximum length of taskName? In this way, we could ensure the uniqueness of > JobVertexID while maintaining a certain level of readability for taskName. > WDYTA? > > > Status field: > > - What is the meaning of "TRYING"? I guess, we're more or less using the > > AdaptiveScheduler states here, aren't we? Can't we align/stick to the > > naming that's defined in the AdaptiveScheduler state? > > As mentioned above, this type of state in the design is used to describe > the current status of a rescale event. It is not equivalent to the state of > the adaptive scheduler; rather, it represents a window just after receiving > an update request but before reaching the next defined state. > > > Can't we align/stick to the naming that's defined in the > AdaptiveScheduler state? > > I previously conducted some research on the states of the > AdaptiveScheduler [1]. Plase let me first try to define the necessary > conditions for a Rescale Event: > > - The process starting from WaitingForResources / CreatingExecutionGraph > and ending at Executing / FINISHED is considered one rescale. > - If the process starts from WaitingForResources / CreatingExecutionGraph > and ends at Executing, this can be considered a successful rescale. > - If the process starts from WaitingForResources / CreatingExecutionGraph > and transitions directly to FINISHED for other reasons, it can be > considered a failed rescale. > Based on these definitions, the adaptive-scheduler states that can be > included in the rescale process seem to be Created, CreatingExecutionGraph, > WaitingForResources, Executing, StopWithSavepoint, and Restarting. > > If everything is aligned and we treat the AdaptiveScheduler states as > equivalent to the RescaleEvent states, some of the states might seem a bit > odd. For example, after a successful rescale, the final state of the > rescale event would be Executing(as same as adaptive scheduler). Would we > consider this situation reasonable? If so, Aligning the states would make > sense to me. > Additionally, if the above definition holds, each rescale would need a > "final state" or “current state” field to indicate its status. > > Maybe it’s important to define and reach consensus on the rescale event > and its start and end boundaries. > Please correct me if I’m wrong. Thank you very much! > > > > Do we really need a new REST endpoint for the configuration? Can't we > get > > the provided information already from the existing configuration > endpoint? > > Thank you for your comments. Sorry for not finding a suitable existing > REST interface to reuse. > But there’s a part of ‘/jobs/:jobid/config‘ that could be reused at the > backend side. > Another reason for introducing a new interface is to directly[2] retrieve > the latest values from the Adaptive Scheduler, as we cannot guarantee that > the configuration values used in the Adaptive Scheduler are always the ones > specified in the configuration file. For example, some parameter values may > be overwritten, which could change them. WDYTA? > > > For the summary endpoint: I see similarities to the checkpoint summary > > here. Not sure whether you already considered that but would it make > sense > > to align the field names in some way to have a consistent look-and-feel? > > I'm also wondering whether it makes sense to align the schema to have > > something like the latest rescale, failed rescale, ... > > Thank you for the reminder. Yes, I have considered it. For example, I > tried to align the request information in the REST interface with the > checkpoint summary, as seen in this document[2]. However, I haven't > strictly followed the same interface style and fields as the checkpoint > REST interface, and instead used a simpler list (sorry, I should have > shared these considerations in the draft earlier to give developers more > context). If needed, I think it’s a great choice. > > By the way, perhaps we should also consider the archive feature for the > rescale history, to support viewing this information on the history server. > > Thank you very much. > > > [1] > https://drive.google.com/file/d/1FE17cAfbUXypYBp0soaVLu5oqn39BoKI/view?usp=sharing > [2] > https://docs.google.com/document/d/1WrLBkSkYe2tBQ3j66gKHFr2OB0d1HuHKDrRVr6B8nkM/edit?disco=AAABSsVvI6Q > > > > > On 2024/12/02 08:59:16 Matthias Pohl wrote: > > Hi Yuepeng, > > thanks for the proposal. Having a way to see the history of rescales is a > > nice feature, I guess. I went over the draft and have a few questions: > > > > Can we reorganize the draft? Right now, we have some (for RescaleEvent, > > Required/AcquiredParallelism) schema defined in the "Proposed Changes" > > section and some other schema under "Public Interfaces". It would be nice > > to have this more organized. > > Just as a suggestion: In the end the proposed changes should list the > > different REST endpoints you want to introduce (including the > corresponding > > schemas for request and response). > > --- > > I'm also wondering whether it would make sense to focus on the REST > > endpoints in this FLIP and put the UI work in a separate FLIP. WDYT? > > Decreasing the scope would probably help handling the required changes. > > --- > > Have you considered adding the onChange event timestamp for a rescale > event > > as well? We introduced a separation of the job requirements change event > > and the actual rescale execution in FLIP-461 [1]. It might be worth > > documenting the time when a change was monitored for the first time that > > triggered the rescale. WDYT? > > --- > > You're mentioning "comments" as a field of the RescaleEvent in your > > proposal. What's the use-case here? Where are these comments from? > > > > (update) > > A brief talk with Yuepeng on that topic revealed that the field is > supposed > > to be used for errors that occurred during the rescale operation. My take > > on that one: > > - We might want to reconsider the field name in that case (maybe > > errors_during_rescale?). "comments" seems to be quite generic. > > - Additionally, shouldn't we make this a list of errors rather than a > > String field? > > - How certain are we that we can associate errors to the actual rescale > > operation and rather than the error being caused by something else? > > --- > > In the schema of the RescaleEvent you describe the three different > > ID/numbers in the following way: > > > > > The ‘id’ is automatically incremental, The rescaleAttemptId is > generated > > > based on one specified resource-requirement and the attempt number is > > > generated based on rescaleAttemptId. > > > > But there is no "attempt number" mentioned in the RescaleEvent schema. > > Additionally, what is the ID based on? Do we start from 0 and just > > increment? Or do we want to have a mechanism that ensures that the IDs > are > > also unique/monotonically increasing after JobManager failovers? > > --- > > For the parallelism schema: I might be misreading the draft here but > you're > > proposing to use the subtask name as the ID to refer to the JobVertex? > That > > the name might become quite long. What about using the JobVertexID here. > > That would be also more aligned to how the parallelism is represented by > > the /jobs/<job-id>/resource-requirements endpoint. If we want to add the > > task name for readability purposes, we can still add this one as a > taskName > > field to the Required/AcquiredParallelism schema. > > --- > > Status field: > > - What is the meaning of "TRYING"? I guess, we're more or less using the > > AdaptiveScheduler states here, aren't we? Can't we align/stick to the > > naming that's defined in the AdaptiveScheduler state? > > --- > > Do we really need a new REST endpoint for the configuration? Can't we get > > the provided information already from the existing configuration > endpoint? > > That said, I still find it useful to have a config tab in the UI at the > end. > > --- > > For the summary endpoint: I see similarities to the checkpoint summary > > here. Not sure whether you already considered that but would it make > sense > > to align the field names in some way to have a consistent look-and-feel? > > I'm also wondering whether it makes sense to align the schema to have > > something like latest rescale, failed rescale, ... > > > > Best, > > Matthias > > > > [1] > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-461%3A+Synchronize+rescaling+with+checkpoint+creation+to+minimize+reprocessing+for+the+AdaptiveScheduler > > > > On Mon, Nov 25, 2024 at 11:24 AM yuanfeng hu <yuanf...@apache.org> > wrote: > > > > > +1, I think this feature is very useful for adaptive scheduler. > > > > > > Yuepeng Pan <panyuep...@apache.org> 于2024年11月22日周五 18:38写道: > > > > > > > Hi community, > > > > > > > > > > > > > > > > > > > > Currently, the Adaptive Scheduler already supports the REST API > > > > > > > > to manually adjust[1] the parallelism of jobs, which enhances the > > > > > > > > functionality of the Adaptive Scheduler. > > > > > > > > However, Adaptive Scheduler doesn't support displaying or tracing the > > > > rescale history yet[2]. > > > > > > > > This makes it inconvenient for users/devs to quickly obtain some > internal > > > > > > > > information about the rescale history of the Adaptive Scheduler. > > > > > > > > And showing the history of rescale events of AdaptiveScheduler in > the web > > > > > > > > UI is very useful for users to make the next step for jobs. > > > > > > > > > > > > > > > > > > > > Therefore, I created the FLIP-487[3] doc to support > > > > > > > > 'Show history of rescales in Web UI for AdaptiveScheduler'. > > > > > > > > Please refer to the google document[3] for more details > > > > > > > > about the proposed design and implementation. > > > > > > > > > > > > > > > > > > > > Looking forward to any feedback and opinions on this proposal. > > > > > > > > > > > > > > > > > > > > [1] > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-291%3A+Externalized+Declarative+Resource+Management > > > > > > > > [2] https://issues.apache.org/jira/browse/FLINK-22258 > > > > > > > > [3] > > > > > > > > https://docs.google.com/document/d/1WrLBkSkYe2tBQ3j66gKHFr2OB0d1HuHKDrRVr6B8nkM/edit?tab=t.0 > > > > > > > > > > > > > > > > > > > > Thank you very much. > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > Regards. > > > > > > > > Yuepeng Pan > > > > > > > > > > > > -- > > > Best, > > > Yuanfeng > > > > > >