Thanks Yuepeng for driving this proposal!

It's definitely a great addition for Adaptive Scheduler.
I see FLIP-487 has 2 docs, one is official wiki[1], one is google doc[2].
It's better to only use one doc to ensure the consistency, I prefer
to only use FLIP-487 official wiki due to you have the wiki permission.

> I'm also wondering whether it would make sense to focus on the REST
> endpoints in this FLIP and put the UI work in a separate FLIP. WDYT?
> Decreasing the scope would probably help handling the required changes

Thanks Matthias for the good suggestions, split it to 2 FLIPs makes sense
to me.
I have a minor question for this: could we discuss 2 FLIPs together?
I'm afraid the rest endpoints doesn't work as expected when we discuss
WebUI change in the future.

Also, I found the current FLIP wiki[1] and doc[2] have deleted all
information
related to WebUI change. If so, the FLIP title should be changed as well.
Currently, the title is "FLIP-487: Show history of rescales in Web UI for
AdaptiveScheduler"
but it seems FLIP-487 doesn't include any Web UI change.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-487%3A+Show+history+of+rescales+in+Web+UI+for+AdaptiveScheduler
[2]
https://docs.google.com/document/d/1WrLBkSkYe2tBQ3j66gKHFr2OB0d1HuHKDrRVr6B8nkM/edit?usp=sharing

Best,
Rui

On Wed, Dec 4, 2024 at 9:56 PM Yuepeng Pan <panyuep...@apache.org> wrote:

>
> Thanks Matthias  a lot  for the comments and guidance.
>
> >Can we reorganize the draft? Right now, we have some (for RescaleEvent,
> >Required/AcquiredParallelism) schema defined in the "Proposed Changes"
> section
> >and some other schema under "Public Interfaces". It would be nice to have
> this
> >more organized. Just as a suggestion: In the end the proposed changes
> should list
> >the different REST endpoints you want to introduce (including the
> corresponding
> >schemas for request and response).
>
> Thanks for the comments. I’ve made some structural adjustments and content
> modifications to help clarify certain issues. Since some questions are
> still under discussion, I plan to update the document once we reach a
> consensus.
>
> > I'm also wondering whether it would make sense to focus on the REST
> > endpoints in this FLIP and put the UI work in a separate FLIP. WDYT?
> Decreasing
> > the scope would probably help handling the required changes.
>
> Nice proposal ! That will help us better focus our attention and
> concentrate on the main target.
>
> > Have you considered adding the onChange event timestamp for a rescale
> event
> > as well? We introduced a separation of the job requirements change event
> > and the actual rescale execution in FLIP-461 [1]. It might be worth
> > documenting the time when a change was monitored for the first time that
> > triggered the rescale. WDYT?
>
> Thank you very much for your comments.As described in the document, the
> original design had many abstract concepts that were relatively simple. The
> information recorded for each rescale event mainly reflected the current
> state, rather than the entire historical state of the adaptive scheduler
> throughout the rescale process.
> IIUC, when we add the onChange event timestamp for rescale events, each
> rescale event will capture the state changes of the adaptive scheduler at
> various stages (these states would form a collection similar to the set of
> all possible states of the adaptive scheduler) along with the corresponding
> timestamps. These states would essentially form an ordered collection.
> Compared to the original design, this would undoubtedly make internal
> information more transparent, which I think is very meaningful.
> If I misunderstood, please correct me.
>
> >You're mentioning "comments" as a field of the RescaleEvent in your
> proposal.
> >What's the use-case here? Where are these comments from? (update) A brief
> talk
> >with Yuepeng on that topic revealed that the field is supposed to be used
> for
> >errors that occurred during the rescale operation. My take on that one:
> >- We might want to reconsider the field name in that case (maybe
> >errors_during_rescale?). "comments" seems to be quite generic.
> > - Additionally, shouldn't we make this a list of errors rather than a
> String field?
>
> Sorry for not fully explaining the purpose of this field earlier. The
> intention behind adding this field is to highlight special circumstances
> during a rescale, such as exceptions caused by insufficient resources or
> situations where a new request overrides the current one, forcing us to
> abandon the ongoing rescale operation and reallocate resources based on the
> new request. In such cases, it would be valuable to record what exactly
> happened internally. That said, I completely agree with the point that
> "comments" might seem overly broad.
> Maybe we need to address a few key questions here:
>
> 1. Is it necessary to retain a "comments"-like field to capture such
> information?
> 2. Should error-related information also be retained?
> 3. If retained, where should the "comments" (if applicable) and error
> information fields be located?
>     - Should they be tied to a specific adaptive-scheduler-state window?
>     - Or should they be directly included in the rescale event message?
> 4. Should these fields be implemented as lists?
>
> For the first and second items, I prefer to retain both types of
> information because they can provide transparent details to the user.
> For the third and fourth items, I prefer to associate these two fields as
> lists with specific adaptive-scheduler state windows. This approach could
> help organize the information better and link it to relevant events in the
> rescale process.
>
> I would greatly appreciate hearing more suggestions and ideas!
>
> > - How certain are we that we can associate errors to the actual rescale
> > operation and rather than the error being caused by something else?
>
> In my limited reading, it seems that only exceptions related to resource
> shortage during parallelism allocation can be classified into this type of
> exception. Other exceptions can be categorized outside this scope.
>
> > But there is no "attempt number" mentioned in the RescaleEvent schema.
>
> Please let me try to add to this question:
> - Each rescale requirement corresponds to a resourceRequirementsEpochID.
> - For the same rescale requirement (resourceRequirementsEpochID), multiple
> rescale attempts may occur to here. We treat each rescale under the current
> resource requirements as a rescale attempt. That is, for each rescale
> attempt on the same rescale requirement (resourceRequirementsEpochID), the
> rescale attempt id will increment by 1. When the
> resourceRequirementsEpochID is changed, the rescale attempt will be reset.
> - Based on these two points, whenever a rescale attempt happens, the
> rescaleID will increment globally.
>
> I hope this may clarify the logic and approach. Please let me know what’s
> your opinion.
>
> > Additionally, what is the ID based on? Do we start from 0 and just
> increment?
> > Or do we want to have a mechanism that ensures that the IDs are also
> > unique/monotonically increasing after JobManager failovers?
>
> To be honest, I expect initially it to increment from 0 every time a job
> is started since the rescale ID is not strongly tied to the state. If we
> can ensure that the ID remains unique and increase monotonically even after
> a JobManager failover, this would make the rescale ID more consistent,
> right? If so, that makes sense to me. Please let me know your thoughts.
>
> > For the parallelism schema: I might be misreading the draft here but
> you're proposing to use the subtask
> > name as the ID to refer to the JobVertex? That the name might become
> quite long. What about using the
> > JobVertexID here. That would be also more aligned to how the parallelism
> is represented by the
> > /jobs/<job-id>/resource-requirements endpoint. If we want to add the
> task name for readability purposes, we
> > can still add this one as a taskName field to the
> Required/AcquiredParallelism schema.
>
> Good proposal! I did consider using JobVertexID, but for ease of use, I
> ended up changing it to taskName. I'm wondering if we could use both
> JobVertexID and taskName. Could we provide a parameter to control the
> maximum length of taskName? In this way, we could ensure the uniqueness of
> JobVertexID while maintaining a certain level of readability for taskName.
> WDYTA?
>
> > Status field:
> > - What is the meaning of "TRYING"? I guess, we're more or less using the
> > AdaptiveScheduler states here, aren't we? Can't we align/stick to the
> > naming that's defined in the AdaptiveScheduler state?
>
> As mentioned above, this type of state in the design is used to describe
> the current status of a rescale event. It is not equivalent to the state of
> the adaptive scheduler; rather, it represents a window just after receiving
> an update request but before reaching the next defined state.
>
> > Can't we align/stick to the naming that's defined in the
> AdaptiveScheduler state?
>
> I previously conducted some research on the states of the
> AdaptiveScheduler [1]. Plase let me first try to define the necessary
> conditions for a Rescale Event:
>
> - The process starting from WaitingForResources / CreatingExecutionGraph
> and ending at Executing / FINISHED is considered one rescale.
> - If the process starts from WaitingForResources / CreatingExecutionGraph
> and ends at Executing, this can be considered a successful rescale.
> - If the process starts from WaitingForResources / CreatingExecutionGraph
> and transitions directly to FINISHED for other reasons, it can be
> considered a failed rescale.
> Based on these definitions, the adaptive-scheduler states that can be
> included in the rescale process seem to be Created, CreatingExecutionGraph,
> WaitingForResources, Executing, StopWithSavepoint, and Restarting.
>
> If everything is aligned and we treat the AdaptiveScheduler states as
> equivalent to the RescaleEvent states, some of the states might seem a bit
> odd. For example, after a successful rescale, the final state of the
> rescale event would be Executing(as same as adaptive scheduler). Would we
> consider this situation reasonable? If so, Aligning the states would make
> sense to me.
> Additionally, if the above definition holds, each rescale would need a
> "final state" or “current state” field to indicate its status.
>
> Maybe it’s important to define and reach consensus on the rescale event
> and its start and end boundaries.
> Please correct me if I’m wrong. Thank you very much!
>
>
> > Do we really need a new REST endpoint for the configuration? Can't we
> get
> > the provided information already from the existing configuration
> endpoint?
>
> Thank you for your comments. Sorry for not finding a suitable existing
> REST interface to reuse.
> But there’s a part of ‘/jobs/:jobid/config‘ that could be reused at the
> backend side.
> Another reason for introducing a new interface is to directly[2] retrieve
> the latest values from the Adaptive Scheduler, as we cannot guarantee that
> the configuration values used in the Adaptive Scheduler are always the ones
> specified in the configuration file. For example, some parameter values may
> be overwritten, which could change them. WDYTA?
>
> > For the summary endpoint: I see similarities to the checkpoint summary
> > here. Not sure whether you already considered that but would it make
> sense
> > to align the field names in some way to have a consistent look-and-feel?
> > I'm also wondering whether it makes sense to align the schema to have
> > something like the latest rescale, failed rescale, ...
>
> Thank you for the reminder. Yes, I have considered it. For example, I
> tried to align the request information in the REST interface with the
> checkpoint summary, as seen in this document[2]. However, I haven't
> strictly followed the same interface style and fields as the checkpoint
> REST interface, and instead used a simpler list (sorry, I should have
> shared these considerations in the draft earlier to give developers more
> context). If needed, I think it’s a great choice.
>
> By the way, perhaps we should also consider the archive feature for the
> rescale history, to support viewing this information on the history server.
>
> Thank you  very much.
>
>
> [1]
> https://drive.google.com/file/d/1FE17cAfbUXypYBp0soaVLu5oqn39BoKI/view?usp=sharing
> [2]
> https://docs.google.com/document/d/1WrLBkSkYe2tBQ3j66gKHFr2OB0d1HuHKDrRVr6B8nkM/edit?disco=AAABSsVvI6Q
>
>
>
>
> On 2024/12/02 08:59:16 Matthias Pohl wrote:
> > Hi Yuepeng,
> > thanks for the proposal. Having a way to see the history of rescales is a
> > nice feature, I guess. I went over the draft and have a few questions:
> >
> > Can we reorganize the draft? Right now, we have some (for RescaleEvent,
> > Required/AcquiredParallelism) schema defined in the "Proposed Changes"
> > section and some other schema under "Public Interfaces". It would be nice
> > to have this more organized.
> > Just as a suggestion: In the end the proposed changes should list the
> > different REST endpoints you want to introduce (including the
> corresponding
> > schemas for request and response).
> > ---
> > I'm also wondering whether it would make sense to focus on the REST
> > endpoints in this FLIP and put the UI work in a separate FLIP. WDYT?
> > Decreasing the scope would probably help handling the required changes.
> > ---
> > Have you considered adding the onChange event timestamp for a rescale
> event
> > as well? We introduced a separation of the job requirements change event
> > and the actual rescale execution in FLIP-461 [1]. It might be worth
> > documenting the time when a change was monitored for the first time that
> > triggered the rescale. WDYT?
> > ---
> > You're mentioning "comments" as a field of the RescaleEvent in your
> > proposal. What's the use-case here? Where are these comments from?
> >
> > (update)
> > A brief talk with Yuepeng on that topic revealed that the field is
> supposed
> > to be used for errors that occurred during the rescale operation. My take
> > on that one:
> > - We might want to reconsider the field name in that case (maybe
> > errors_during_rescale?). "comments" seems to be quite generic.
> > - Additionally, shouldn't we make this a list of errors rather than a
> > String field?
> > - How certain are we that we can associate errors to the actual rescale
> > operation and rather than the error being caused by something else?
> > ---
> > In the schema of the RescaleEvent you describe the three different
> > ID/numbers in the following way:
> >
> > > The ‘id’ is automatically incremental, The rescaleAttemptId is
> generated
> > > based on one specified resource-requirement and the attempt number is
> > > generated based on rescaleAttemptId.
> >
> >  But there is no "attempt number" mentioned in the RescaleEvent schema.
> > Additionally, what is the ID based on? Do we start from 0 and just
> > increment? Or do we want to have a mechanism that ensures that the IDs
> are
> > also unique/monotonically increasing after JobManager failovers?
> > ---
> > For the parallelism schema: I might be misreading the draft here but
> you're
> > proposing to use the subtask name as the ID to refer to the JobVertex?
> That
> > the name might become quite long. What about using the JobVertexID here.
> > That would be also more aligned to how the parallelism is represented by
> > the /jobs/<job-id>/resource-requirements endpoint. If we want to add the
> > task name for readability purposes, we can still add this one as a
> taskName
> > field to the Required/AcquiredParallelism schema.
> > ---
> > Status field:
> > - What is the meaning of "TRYING"? I guess, we're more or less using the
> > AdaptiveScheduler states here, aren't we? Can't we align/stick to the
> > naming that's defined in the AdaptiveScheduler state?
> > ---
> > Do we really need a new REST endpoint for the configuration? Can't we get
> > the provided information already from the existing configuration
> endpoint?
> > That said, I still find it useful to have a config tab in the UI at the
> end.
> > ---
> > For the summary endpoint: I see similarities to the checkpoint summary
> > here. Not sure whether you already considered that but would it make
> sense
> > to align the field names in some way to have a consistent look-and-feel?
> > I'm also wondering whether it makes sense to align the schema to have
> > something like latest rescale, failed rescale, ...
> >
> > Best,
> > Matthias
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-461%3A+Synchronize+rescaling+with+checkpoint+creation+to+minimize+reprocessing+for+the+AdaptiveScheduler
> >
> > On Mon, Nov 25, 2024 at 11:24 AM yuanfeng hu <yuanf...@apache.org>
> wrote:
> >
> > > +1, I think this feature is very useful for adaptive scheduler.
> > >
> > > Yuepeng Pan <panyuep...@apache.org> 于2024年11月22日周五 18:38写道:
> > >
> > > > Hi community,
> > > >
> > > >
> > > >
> > > >
> > > > Currently, the Adaptive Scheduler already supports the REST API
> > > >
> > > > to manually adjust[1] the parallelism of jobs, which enhances the
> > > >
> > > > functionality of the Adaptive Scheduler.
> > > >
> > > > However, Adaptive Scheduler doesn't support displaying or tracing the
> > > > rescale history yet[2].
> > > >
> > > > This makes it inconvenient for users/devs to quickly obtain some
> internal
> > > >
> > > > information about the rescale history of the Adaptive Scheduler.
> > > >
> > > > And showing the history of rescale events of AdaptiveScheduler in
> the web
> > > >
> > > > UI is very useful for users to make the next step for jobs.
> > > >
> > > >
> > > >
> > > >
> > > > Therefore, I created the FLIP-487[3] doc to support
> > > >
> > > > 'Show history of rescales in Web UI for AdaptiveScheduler'.
> > > >
> > > > Please refer to the google document[3] for more details
> > > >
> > > > about the proposed design and implementation.
> > > >
> > > >
> > > >
> > > >
> > > > Looking forward to any feedback and opinions on this proposal.
> > > >
> > > >
> > > >
> > > >
> > > > [1]
> > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-291%3A+Externalized+Declarative+Resource+Management
> > > >
> > > > [2] https://issues.apache.org/jira/browse/FLINK-22258
> > > >
> > > > [3]
> > > >
> > >
> https://docs.google.com/document/d/1WrLBkSkYe2tBQ3j66gKHFr2OB0d1HuHKDrRVr6B8nkM/edit?tab=t.0
> > > >
> > > >
> > > >
> > > >
> > > > Thank you very much.
> > > >
> > > >
> > > >
> > > >
> > > > Best,
> > > >
> > > > Regards.
> > > >
> > > > Yuepeng Pan
> > >
> > >
> > >
> > > --
> > > Best,
> > > Yuanfeng
> > >
> >
>

Reply via email to