Re: [DISCUSS] FLIP-487: Show history of rescales in Web UI for AdaptiveScheduler

Yuepeng Pan Sat, 07 Dec 2024 07:08:40 -0800

Thanks Eric for the comments !

Sorry for some confusion in your reading. As you see, there are some things 
that are still in discussion, so I haven't updated some of the current 
documentation or wiki.



> How/Where would we store the history of these scaling events?

Please let me try to clarify the questions first in the email thread.

- For how: 
We plan to introduce a parameter 'web.adaptive-scheduler.rescale-history-max' 
to control the number of recent rescale events to be stored in memory, 
and when a job fails to submit or reaches a terminal state, 
to store the most recent events in an archive json to a location 
where the history server can retrieve them. 

- For where:
Tend to to put it in the AdaptiveScheduler.


And I'd happy to hear more ideas about its. 


Thank you very much for your comments, 
I'll add the two items after confirming how to split the FLIP.


Best,
Yuepeng.



On 2024/12/06 16:32:07 Eric Xiao wrote:
> Hi Yuepeng Pan,
> 
> This would be a very useful feature to have in OSS. One question I have
> that wasn't outlined in the FLIP or I may have missed it was -
> 
> How/Where would we store the history of these scaling events? Is there an
> existing pattern for storing such information in other parts of Flink that
> we will be following (i.e. exceptions, savepoint/checkpoint history). I can
> see a couple of options (in-memory, a caching service, DFS, etc.) for
> storing the history, each having their own set of tradeoffs.
> 
> Best,
> 
> Eric
> 
> On Fri, Dec 6, 2024 at 5:23 AM Yuepeng Pan <[email protected]> wrote:
> 
> > Thanks Rui for the comments!
> >
> >
> >
> > > I have a minor question for this: could we discuss 2 FLIPs together?
> > > I'm afraid the rest endpoints doesn't work as expected when we discuss
> > > WebUI change in the future.
> > > Clarification: Organize related designs in different FLIP wikis and
> > > discuss them in different discussions. Together just means that
> > > two FLIPs can be carried out at the same time.
> > > After the design and discussion of both FLIPs are completed,
> > > we could start voting on each one.
> >
> >
> > It makes sense to me as there are some impacts between the two FLIPs
> > and it could help avoid missing some details.
> >
> > Let’s assume we proceed in this way, What do you think about using
> > FLIP-487 and its corresponding email thread to focus on discussing
> > the UI design to reduce ambiguity? Meanwhile, we can create a new
> > FLIP and email to discuss the features of REST and AdaptiveScheduler,
> > for example, with a title like: "Support AdaptiveScheduler record and
> > query the rescale history,"
> > or another more appropriate title.
> >
> > BWT, whatever the final conclusion is, I will adjust the content on the
> > wiki
> > instead of google doc after a reasonable conclusion,
> > and may subsequent discussions and updates be aligned with the FLIP on the
> > wiki.
> >
> > Looking forward to more ideas about it.
> >
> > Best,
> > Yuepeng Pan
> >
> >
> > On 2024/12/06 03:44:24 Rui Fan wrote:
> > > Thanks Yuepeng for driving this proposal!
> > >
> > > It's definitely a great addition for Adaptive Scheduler.
> > > I see FLIP-487 has 2 docs, one is official wiki[1], one is google doc[2].
> > > It's better to only use one doc to ensure the consistency, I prefer
> > > to only use FLIP-487 official wiki due to you have the wiki permission.
> > >
> > > > I'm also wondering whether it would make sense to focus on the REST
> > > > endpoints in this FLIP and put the UI work in a separate FLIP. WDYT?
> > > > Decreasing the scope would probably help handling the required changes
> > >
> > > Thanks Matthias for the good suggestions, split it to 2 FLIPs makes sense
> > > to me.
> > > I have a minor question for this: could we discuss 2 FLIPs together?
> > > I'm afraid the rest endpoints doesn't work as expected when we discuss
> > > WebUI change in the future.
> > >
> > > Also, I found the current FLIP wiki[1] and doc[2] have deleted all
> > > information
> > > related to WebUI change. If so, the FLIP title should be changed as well.
> > > Currently, the title is "FLIP-487: Show history of rescales in Web UI for
> > > AdaptiveScheduler"
> > > but it seems FLIP-487 doesn't include any Web UI change.
> > >
> > > [1]
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-487%3A+Show+history+of+rescales+in+Web+UI+for+AdaptiveScheduler
> > > [2]
> > >
> > https://docs.google.com/document/d/1WrLBkSkYe2tBQ3j66gKHFr2OB0d1HuHKDrRVr6B8nkM/edit?usp=sharing
> > >
> > > Best,
> > > Rui
> > >
> > > On Wed, Dec 4, 2024 at 9:56 PM Yuepeng Pan <[email protected]>
> > wrote:
> > >
> > > >
> > > > Thanks Matthias  a lot  for the comments and guidance.
> > > >
> > > > >Can we reorganize the draft? Right now, we have some (for
> > RescaleEvent,
> > > > >Required/AcquiredParallelism) schema defined in the "Proposed Changes"
> > > > section
> > > > >and some other schema under "Public Interfaces". It would be nice to
> > have
> > > > this
> > > > >more organized. Just as a suggestion: In the end the proposed changes
> > > > should list
> > > > >the different REST endpoints you want to introduce (including the
> > > > corresponding
> > > > >schemas for request and response).
> > > >
> > > > Thanks for the comments. I’ve made some structural adjustments and
> > content
> > > > modifications to help clarify certain issues. Since some questions are
> > > > still under discussion, I plan to update the document once we reach a
> > > > consensus.
> > > >
> > > > > I'm also wondering whether it would make sense to focus on the REST
> > > > > endpoints in this FLIP and put the UI work in a separate FLIP. WDYT?
> > > > Decreasing
> > > > > the scope would probably help handling the required changes.
> > > >
> > > > Nice proposal ! That will help us better focus our attention and
> > > > concentrate on the main target.
> > > >
> > > > > Have you considered adding the onChange event timestamp for a rescale
> > > > event
> > > > > as well? We introduced a separation of the job requirements change
> > event
> > > > > and the actual rescale execution in FLIP-461 [1]. It might be worth
> > > > > documenting the time when a change was monitored for the first time
> > that
> > > > > triggered the rescale. WDYT?
> > > >
> > > > Thank you very much for your comments.As described in the document, the
> > > > original design had many abstract concepts that were relatively
> > simple. The
> > > > information recorded for each rescale event mainly reflected the
> > current
> > > > state, rather than the entire historical state of the adaptive
> > scheduler
> > > > throughout the rescale process.
> > > > IIUC, when we add the onChange event timestamp for rescale events, each
> > > > rescale event will capture the state changes of the adaptive scheduler
> > at
> > > > various stages (these states would form a collection similar to the
> > set of
> > > > all possible states of the adaptive scheduler) along with the
> > corresponding
> > > > timestamps. These states would essentially form an ordered collection.
> > > > Compared to the original design, this would undoubtedly make internal
> > > > information more transparent, which I think is very meaningful.
> > > > If I misunderstood, please correct me.
> > > >
> > > > >You're mentioning "comments" as a field of the RescaleEvent in your
> > > > proposal.
> > > > >What's the use-case here? Where are these comments from? (update) A
> > brief
> > > > talk
> > > > >with Yuepeng on that topic revealed that the field is supposed to be
> > used
> > > > for
> > > > >errors that occurred during the rescale operation. My take on that
> > one:
> > > > >- We might want to reconsider the field name in that case (maybe
> > > > >errors_during_rescale?). "comments" seems to be quite generic.
> > > > > - Additionally, shouldn't we make this a list of errors rather than a
> > > > String field?
> > > >
> > > > Sorry for not fully explaining the purpose of this field earlier. The
> > > > intention behind adding this field is to highlight special
> > circumstances
> > > > during a rescale, such as exceptions caused by insufficient resources
> > or
> > > > situations where a new request overrides the current one, forcing us to
> > > > abandon the ongoing rescale operation and reallocate resources based
> > on the
> > > > new request. In such cases, it would be valuable to record what exactly
> > > > happened internally. That said, I completely agree with the point that
> > > > "comments" might seem overly broad.
> > > > Maybe we need to address a few key questions here:
> > > >
> > > > 1. Is it necessary to retain a "comments"-like field to capture such
> > > > information?
> > > > 2. Should error-related information also be retained?
> > > > 3. If retained, where should the "comments" (if applicable) and error
> > > > information fields be located?
> > > >     - Should they be tied to a specific adaptive-scheduler-state
> > window?
> > > >     - Or should they be directly included in the rescale event message?
> > > > 4. Should these fields be implemented as lists?
> > > >
> > > > For the first and second items, I prefer to retain both types of
> > > > information because they can provide transparent details to the user.
> > > > For the third and fourth items, I prefer to associate these two fields
> > as
> > > > lists with specific adaptive-scheduler state windows. This approach
> > could
> > > > help organize the information better and link it to relevant events in
> > the
> > > > rescale process.
> > > >
> > > > I would greatly appreciate hearing more suggestions and ideas!
> > > >
> > > > > - How certain are we that we can associate errors to the actual
> > rescale
> > > > > operation and rather than the error being caused by something else?
> > > >
> > > > In my limited reading, it seems that only exceptions related to
> > resource
> > > > shortage during parallelism allocation can be classified into this
> > type of
> > > > exception. Other exceptions can be categorized outside this scope.
> > > >
> > > > > But there is no "attempt number" mentioned in the RescaleEvent
> > schema.
> > > >
> > > > Please let me try to add to this question:
> > > > - Each rescale requirement corresponds to a
> > resourceRequirementsEpochID.
> > > > - For the same rescale requirement (resourceRequirementsEpochID),
> > multiple
> > > > rescale attempts may occur to here. We treat each rescale under the
> > current
> > > > resource requirements as a rescale attempt. That is, for each rescale
> > > > attempt on the same rescale requirement (resourceRequirementsEpochID),
> > the
> > > > rescale attempt id will increment by 1. When the
> > > > resourceRequirementsEpochID is changed, the rescale attempt will be
> > reset.
> > > > - Based on these two points, whenever a rescale attempt happens, the
> > > > rescaleID will increment globally.
> > > >
> > > > I hope this may clarify the logic and approach. Please let me know
> > what’s
> > > > your opinion.
> > > >
> > > > > Additionally, what is the ID based on? Do we start from 0 and just
> > > > increment?
> > > > > Or do we want to have a mechanism that ensures that the IDs are also
> > > > > unique/monotonically increasing after JobManager failovers?
> > > >
> > > > To be honest, I expect initially it to increment from 0 every time a
> > job
> > > > is started since the rescale ID is not strongly tied to the state. If
> > we
> > > > can ensure that the ID remains unique and increase monotonically even
> > after
> > > > a JobManager failover, this would make the rescale ID more consistent,
> > > > right? If so, that makes sense to me. Please let me know your thoughts.
> > > >
> > > > > For the parallelism schema: I might be misreading the draft here but
> > > > you're proposing to use the subtask
> > > > > name as the ID to refer to the JobVertex? That the name might become
> > > > quite long. What about using the
> > > > > JobVertexID here. That would be also more aligned to how the
> > parallelism
> > > > is represented by the
> > > > > /jobs/<job-id>/resource-requirements endpoint. If we want to add the
> > > > task name for readability purposes, we
> > > > > can still add this one as a taskName field to the
> > > > Required/AcquiredParallelism schema.
> > > >
> > > > Good proposal! I did consider using JobVertexID, but for ease of use, I
> > > > ended up changing it to taskName. I'm wondering if we could use both
> > > > JobVertexID and taskName. Could we provide a parameter to control the
> > > > maximum length of taskName? In this way, we could ensure the
> > uniqueness of
> > > > JobVertexID while maintaining a certain level of readability for
> > taskName.
> > > > WDYTA?
> > > >
> > > > > Status field:
> > > > > - What is the meaning of "TRYING"? I guess, we're more or less using
> > the
> > > > > AdaptiveScheduler states here, aren't we? Can't we align/stick to the
> > > > > naming that's defined in the AdaptiveScheduler state?
> > > >
> > > > As mentioned above, this type of state in the design is used to
> > describe
> > > > the current status of a rescale event. It is not equivalent to the
> > state of
> > > > the adaptive scheduler; rather, it represents a window just after
> > receiving
> > > > an update request but before reaching the next defined state.
> > > >
> > > > > Can't we align/stick to the naming that's defined in the
> > > > AdaptiveScheduler state?
> > > >
> > > > I previously conducted some research on the states of the
> > > > AdaptiveScheduler [1]. Plase let me first try to define the necessary
> > > > conditions for a Rescale Event:
> > > >
> > > > - The process starting from WaitingForResources /
> > CreatingExecutionGraph
> > > > and ending at Executing / FINISHED is considered one rescale.
> > > > - If the process starts from WaitingForResources /
> > CreatingExecutionGraph
> > > > and ends at Executing, this can be considered a successful rescale.
> > > > - If the process starts from WaitingForResources /
> > CreatingExecutionGraph
> > > > and transitions directly to FINISHED for other reasons, it can be
> > > > considered a failed rescale.
> > > > Based on these definitions, the adaptive-scheduler states that can be
> > > > included in the rescale process seem to be Created,
> > CreatingExecutionGraph,
> > > > WaitingForResources, Executing, StopWithSavepoint, and Restarting.
> > > >
> > > > If everything is aligned and we treat the AdaptiveScheduler states as
> > > > equivalent to the RescaleEvent states, some of the states might seem a
> > bit
> > > > odd. For example, after a successful rescale, the final state of the
> > > > rescale event would be Executing(as same as adaptive scheduler). Would
> > we
> > > > consider this situation reasonable? If so, Aligning the states would
> > make
> > > > sense to me.
> > > > Additionally, if the above definition holds, each rescale would need a
> > > > "final state" or “current state” field to indicate its status.
> > > >
> > > > Maybe it’s important to define and reach consensus on the rescale event
> > > > and its start and end boundaries.
> > > > Please correct me if I’m wrong. Thank you very much!
> > > >
> > > >
> > > > > Do we really need a new REST endpoint for the configuration? Can't we
> > > > get
> > > > > the provided information already from the existing configuration
> > > > endpoint?
> > > >
> > > > Thank you for your comments. Sorry for not finding a suitable existing
> > > > REST interface to reuse.
> > > > But there’s a part of ‘/jobs/:jobid/config‘ that could be reused at the
> > > > backend side.
> > > > Another reason for introducing a new interface is to directly[2]
> > retrieve
> > > > the latest values from the Adaptive Scheduler, as we cannot guarantee
> > that
> > > > the configuration values used in the Adaptive Scheduler are always the
> > ones
> > > > specified in the configuration file. For example, some parameter
> > values may
> > > > be overwritten, which could change them. WDYTA?
> > > >
> > > > > For the summary endpoint: I see similarities to the checkpoint
> > summary
> > > > > here. Not sure whether you already considered that but would it make
> > > > sense
> > > > > to align the field names in some way to have a consistent
> > look-and-feel?
> > > > > I'm also wondering whether it makes sense to align the schema to have
> > > > > something like the latest rescale, failed rescale, ...
> > > >
> > > > Thank you for the reminder. Yes, I have considered it. For example, I
> > > > tried to align the request information in the REST interface with the
> > > > checkpoint summary, as seen in this document[2]. However, I haven't
> > > > strictly followed the same interface style and fields as the checkpoint
> > > > REST interface, and instead used a simpler list (sorry, I should have
> > > > shared these considerations in the draft earlier to give developers
> > more
> > > > context). If needed, I think it’s a great choice.
> > > >
> > > > By the way, perhaps we should also consider the archive feature for the
> > > > rescale history, to support viewing this information on the history
> > server.
> > > >
> > > > Thank you  very much.
> > > >
> > > >
> > > > [1]
> > > >
> > https://drive.google.com/file/d/1FE17cAfbUXypYBp0soaVLu5oqn39BoKI/view?usp=sharing
> > > > [2]
> > > >
> > https://docs.google.com/document/d/1WrLBkSkYe2tBQ3j66gKHFr2OB0d1HuHKDrRVr6B8nkM/edit?disco=AAABSsVvI6Q
> > > >
> > > >
> > > >
> > > >
> > > > On 2024/12/02 08:59:16 Matthias Pohl wrote:
> > > > > Hi Yuepeng,
> > > > > thanks for the proposal. Having a way to see the history of rescales
> > is a
> > > > > nice feature, I guess. I went over the draft and have a few
> > questions:
> > > > >
> > > > > Can we reorganize the draft? Right now, we have some (for
> > RescaleEvent,
> > > > > Required/AcquiredParallelism) schema defined in the "Proposed
> > Changes"
> > > > > section and some other schema under "Public Interfaces". It would be
> > nice
> > > > > to have this more organized.
> > > > > Just as a suggestion: In the end the proposed changes should list the
> > > > > different REST endpoints you want to introduce (including the
> > > > corresponding
> > > > > schemas for request and response).
> > > > > ---
> > > > > I'm also wondering whether it would make sense to focus on the REST
> > > > > endpoints in this FLIP and put the UI work in a separate FLIP. WDYT?
> > > > > Decreasing the scope would probably help handling the required
> > changes.
> > > > > ---
> > > > > Have you considered adding the onChange event timestamp for a rescale
> > > > event
> > > > > as well? We introduced a separation of the job requirements change
> > event
> > > > > and the actual rescale execution in FLIP-461 [1]. It might be worth
> > > > > documenting the time when a change was monitored for the first time
> > that
> > > > > triggered the rescale. WDYT?
> > > > > ---
> > > > > You're mentioning "comments" as a field of the RescaleEvent in your
> > > > > proposal. What's the use-case here? Where are these comments from?
> > > > >
> > > > > (update)
> > > > > A brief talk with Yuepeng on that topic revealed that the field is
> > > > supposed
> > > > > to be used for errors that occurred during the rescale operation. My
> > take
> > > > > on that one:
> > > > > - We might want to reconsider the field name in that case (maybe
> > > > > errors_during_rescale?). "comments" seems to be quite generic.
> > > > > - Additionally, shouldn't we make this a list of errors rather than a
> > > > > String field?
> > > > > - How certain are we that we can associate errors to the actual
> > rescale
> > > > > operation and rather than the error being caused by something else?
> > > > > ---
> > > > > In the schema of the RescaleEvent you describe the three different
> > > > > ID/numbers in the following way:
> > > > >
> > > > > > The ‘id’ is automatically incremental, The rescaleAttemptId is
> > > > generated
> > > > > > based on one specified resource-requirement and the attempt number
> > is
> > > > > > generated based on rescaleAttemptId.
> > > > >
> > > > >  But there is no "attempt number" mentioned in the RescaleEvent
> > schema.
> > > > > Additionally, what is the ID based on? Do we start from 0 and just
> > > > > increment? Or do we want to have a mechanism that ensures that the
> > IDs
> > > > are
> > > > > also unique/monotonically increasing after JobManager failovers?
> > > > > ---
> > > > > For the parallelism schema: I might be misreading the draft here but
> > > > you're
> > > > > proposing to use the subtask name as the ID to refer to the
> > JobVertex?
> > > > That
> > > > > the name might become quite long. What about using the JobVertexID
> > here.
> > > > > That would be also more aligned to how the parallelism is
> > represented by
> > > > > the /jobs/<job-id>/resource-requirements endpoint. If we want to add
> > the
> > > > > task name for readability purposes, we can still add this one as a
> > > > taskName
> > > > > field to the Required/AcquiredParallelism schema.
> > > > > ---
> > > > > Status field:
> > > > > - What is the meaning of "TRYING"? I guess, we're more or less using
> > the
> > > > > AdaptiveScheduler states here, aren't we? Can't we align/stick to the
> > > > > naming that's defined in the AdaptiveScheduler state?
> > > > > ---
> > > > > Do we really need a new REST endpoint for the configuration? Can't
> > we get
> > > > > the provided information already from the existing configuration
> > > > endpoint?
> > > > > That said, I still find it useful to have a config tab in the UI at
> > the
> > > > end.
> > > > > ---
> > > > > For the summary endpoint: I see similarities to the checkpoint
> > summary
> > > > > here. Not sure whether you already considered that but would it make
> > > > sense
> > > > > to align the field names in some way to have a consistent
> > look-and-feel?
> > > > > I'm also wondering whether it makes sense to align the schema to have
> > > > > something like latest rescale, failed rescale, ...
> > > > >
> > > > > Best,
> > > > > Matthias
> > > > >
> > > > > [1]
> > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-461%3A+Synchronize+rescaling+with+checkpoint+creation+to+minimize+reprocessing+for+the+AdaptiveScheduler
> > > > >
> > > > > On Mon, Nov 25, 2024 at 11:24 AM yuanfeng hu <[email protected]>
> > > > wrote:
> > > > >
> > > > > > +1, I think this feature is very useful for adaptive scheduler.
> > > > > >
> > > > > > Yuepeng Pan <[email protected]> 于2024年11月22日周五 18:38写道：
> > > > > >
> > > > > > > Hi community,
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Currently, the Adaptive Scheduler already supports the REST API
> > > > > > >
> > > > > > > to manually adjust[1] the parallelism of jobs, which enhances the
> > > > > > >
> > > > > > > functionality of the Adaptive Scheduler.
> > > > > > >
> > > > > > > However, Adaptive Scheduler doesn't support displaying or
> > tracing the
> > > > > > > rescale history yet[2].
> > > > > > >
> > > > > > > This makes it inconvenient for users/devs to quickly obtain some
> > > > internal
> > > > > > >
> > > > > > > information about the rescale history of the Adaptive Scheduler.
> > > > > > >
> > > > > > > And showing the history of rescale events of AdaptiveScheduler in
> > > > the web
> > > > > > >
> > > > > > > UI is very useful for users to make the next step for jobs.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Therefore, I created the FLIP-487[3] doc to support
> > > > > > >
> > > > > > > 'Show history of rescales in Web UI for AdaptiveScheduler'.
> > > > > > >
> > > > > > > Please refer to the google document[3] for more details
> > > > > > >
> > > > > > > about the proposed design and implementation.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Looking forward to any feedback and opinions on this proposal.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-291%3A+Externalized+Declarative+Resource+Management
> > > > > > >
> > > > > > > [2] https://issues.apache.org/jira/browse/FLINK-22258
> > > > > > >
> > > > > > > [3]
> > > > > > >
> > > > > >
> > > >
> > https://docs.google.com/document/d/1WrLBkSkYe2tBQ3j66gKHFr2OB0d1HuHKDrRVr6B8nkM/edit?tab=t.0
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Thank you very much.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > Regards.
> > > > > > >
> > > > > > > Yuepeng Pan
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best,
> > > > > > Yuanfeng
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-487: Show history of rescales in Web UI for AdaptiveScheduler

Reply via email to