Hi David,

Thank you for your comments. Your suggestion makes sense.
What I originally intended here is something like: “Provide rescale history and setting information that can be used to analyze and optimize Adaptive Scheduler parameters.”

Regarding the addition of examples, my original plan was to address this in the documentation ticket [1]. The documentation will include such scenarios. For example, if, during one or more successful scale-up attempts, a job still does not reach the maximum expected resources, users can look at the WaitingForResources duration in the rescale history. Based on this information, they can decide whether to increase the resource waiting time or related timeout parameters, and adjust the configuration accordingly.

Hope it helps.

[1] https://issues.apache.org/jira/browse/FLINK-38902

Best,
Yuepeng Pan

On Fri, Mar 6, 2026 at 19:41, David Radley <[email protected]> wrote:

> Hi Yuepeng,
> I was looking at the motivation section of the FLIP. It says:
>
> * Facilitate users to trace the history of rescale and make rescale
>   information more transparent
> * Provide users with information on optimizing Adaptive Scheduler
>   parameters
>
> I wonder if the second point could read:
> "Provide users with information that can be used to optimize Adaptive
> Scheduler parameters." Or is it recommending optimizations?
>
> I would find it easier to read with some motivating examples of what
> the history might show that could lead the user to decide to optimise. The
> value of the FLIP would then be more explicit.
>
> Kind regards, David.
>
> From: Yuepeng Pan <[email protected]>
> Date: Wednesday, 25 February 2026 at 02:58
> To: [email protected] <[email protected]>
> Subject: [EXTERNAL] Re: [DISCUSS] FLIP-495: Support AdaptiveScheduler
> record and query the rescale history
>
> Hi, community.
>
> FYI,
> To ensure that the rescale history stored and recorded in FLIP-495 can be
> accessed by external systems/users, we plan to release the FLIP-495
> functionality together with at least two sub-tasks[1][2] of FLIP-487[3].
>
> These two sub-tasks will respectively support:
> - retrieving all current rescale history records
> - retrieving the detailed record of a specific rescale by its rescale UUID
>
> [1] https://issues.apache.org/jira/browse/FLINK-38894
> [2] https://issues.apache.org/jira/browse/FLINK-38895
> [3] https://issues.apache.org/jira/browse/FLINK-22258
>
> Best,
> Yuepeng Pan
>
> On 2025/09/18 04:03:22 Yuepeng Pan wrote:
> > Hi, community.
> >
> > FYI:
> > Since the design work of the query interface of rescale history was
> > separated into FLIP-487[1] during the discussion, we have changed
> > the title of the FLIP to:
> >
> > FLIP-495: Support AdaptiveScheduler record and store the rescale history.
> >
> > [1] https://cwiki.apache.org/confluence/x/vZCMEw
> >
> > Best regards,
> > Yuepeng Pan
> >
> > On 2025/08/19 09:13:22 Yuepeng Pan wrote:
> > > Bumping this thread kindly. Thanks!
> > >
> > > Best,
> > > Yuepeng Pan
> > >
> > > At 2025-08-13 14:52:26, "Yuepeng Pan" <[email protected]> wrote:
> > >
> > > Hi, Matthias,
> > > Thank you very much for your comments!
> > > I have carefully read your reply and made some changes in the hope of
> > > making improvements.
> > > Please help take a look.
> > >
> > > For your comments:
> > >
> > > > 1. You mention a few options for when it comes to storing the data,
> > > > which is good. The FLIP doesn't point out, though, what option you're
> > > > going to go for as part of this FLIP (as far as I can see). It would
> > > > be good to only outline the option to go for in the FLIP and list the
> > > > other options as rejected alternatives (with the pros and cons). I
> > > > think it makes sense to go for option 3 (i.e. following what's done
> > > > for the ExecutionGraphInfoStore for now). The other options can be
> > > > considered as a follow-up.
> > >
> > > This is very meaningful. Based on this comment, I have kept option 3
> > > in its original place and moved the other candidate options to [1].
> > >
> > > > 2. About the terminal states of a rescaling (i.e. IGNORED, FAILED,
> > > > COMPLETED): Can we clarify in the FLIP under what conditions the
> > > > rescaling transitions into each of the three terminal states?
> > >
> > > Yes, this is a reasonable request for understanding and explaining the
> > > logic of transitions to terminated states.
> > > A new subsection [2] has been added to address this.
> > >
> > > > 3. The section "The information to record in a rescale event" could be
> > > > restructured in four sections (to remove redundancy):
> > > >   a) The IDs (Rescale ID, resourceRequirementsEpochID,
> > > > subRescaleIdOfResourceRequirementsEpochID): What about making these
> > > > names easier to read: GlobalRescaleID, RescaleUUID, RescaleAttemptId)
> > > >   b) Per-vertex data which includes: JobVertexID, JobVertexName,
> > > > SlotSharingGroupId, the different parallelisms (pre-rescale,
> > > > sufficient, desired, post-rescale)
> > > >   c) The SlotSharingGroup information: SlotSharingGroupId, name,
> > > > ResourceProfile
> > > >   d) Other information: Timestamps of state transitions, etc. as laid
> > > > out in the FLIP already
> > >
> > > That makes sense to me. Please check [3] for the latest updates in
> > > this part.
> > >
> > > > 4. The FLIP doesn't explain how the data is passed through the
> > > > AdaptiveScheduler states. We should be handling some kind of
> > > > RescaleSnapshot that is passed through the different states and
> > > > updated and its final state is stored somewhere within
> > > > AdaptiveScheduler in the end, I guess. Can we clarify that in the FLIP?
> > >
> > > Indeed, this was missing in the original FLIP. To address this, I
> > > have added [4], which focuses on describing how a Rescale is represented,
> > > and how we can quickly pass and maintain the Rescale history.
> > >
> > > > 5. You mention the config parameters for the cache in the public
> > > > interface section. But there's no mention of any caching and how
> > > > that is used within the FLIP.
> > >
> > > Sorry for the rough description in the previous version.
> > > Since this part belongs to the REST API acceleration mechanism for
> > > rescaling, and Option 6 seems reasonable to me,
> > > I plan to add it to FLIP-487 once the design of FLIP-495 has reached
> > > consensus.
> > > Of course, if needed, I'd be happy to clarify the usage and purpose of
> > > this parameter in the current email thread.
> > >
> > > > 6. The REST endpoint is probably better suited in FLIP-487. FLIP-495
> > > > should be about the actual implementation details and how the data is
> > > > stored internally whereas FLIP-487 is about exposing the information
> > > > to the outside through the REST API and the Flink UI. That would be a
> > > > way to decrease the scope of FLIP-495. WDYT?
> > >
> > > That sounds nice to me. Therefore, I have moved all REST API-related
> > > changes to FLIP-487.
> > > BTW, to avoid repetitive changes in FLIP-487, I'll start organizing
> > > FLIP-487 after FLIP-495 has been finalized.
> > >
> > > Looking forward to your next review!
> > >
> > > [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=334760525#FLIP495:SupportAdaptiveSchedulerrecordandquerytherescalehistory-Aboutrescaleeventsstorage.1
> > > [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=334760525#FLIP495:SupportAdaptiveSchedulerrecordandquerytherescalehistory-ThemainscenarioswhereRescalestatusswitchestoterminated
> > > [3] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=334760525#FLIP495:SupportAdaptiveSchedulerrecordandquerytherescalehistory-Theinformationtorecordinarescaleevent
> > > [4] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=334760525#FLIP495:SupportAdaptiveSchedulerrecordandquerytherescalehistory-InternalInterfaces
> > >
> > > Best regards,
> > > Yuepeng Pan
> > >
> > > At 2025-08-10 23:54:37, "Matthias Pohl" <[email protected]> wrote:
> > > >Hi Yuepeng,
> > > >thanks for reminding me of this FLIP. I went over it and have a few
> > > >items which we might need to address before we can actually finalize
> > > >the vote:
> > > >
> > > >1. You mention a few options for when it comes to storing the data,
> > > >which is good. The FLIP doesn't point out, though, what option you're
> > > >going to go for as part of this FLIP (as far as I can see). It would
> > > >be good to only outline the option to go for in the FLIP and list the
> > > >other options as rejected alternatives (with the pros and cons). I
> > > >think it makes sense to go for option 3 (i.e. following what's done
> > > >for the ExecutionGraphInfoStore for now). The other options can be
> > > >considered as a follow-up.
> > > >2. About the terminal states of a rescaling (i.e. IGNORED, FAILED,
> > > >COMPLETED): Can we clarify in the FLIP under what conditions the
> > > >rescaling transitions into each of the three terminal states?
> > > >3. The section "The information to record in a rescale event" could be
> > > >restructured in four sections (to remove redundancy):
> > > >  a) The IDs (Rescale ID, resourceRequirementsEpochID,
> > > >subRescaleIdOfResourceRequirementsEpochID): What about making these
> > > >names easier to read: GlobalRescaleID, RescaleUUID, RescaleAttemptId)
> > > >  b) Per-vertex data which includes: JobVertexID, JobVertexName,
> > > >SlotSharingGroupId, the different parallelisms (pre-rescale,
> > > >sufficient, desired, post-rescale)
> > > >  c) The SlotSharingGroup information: SlotSharingGroupId, name,
> > > >ResourceProfile
> > > >  d) Other information: Timestamps of state transitions, etc. as laid
> > > >out in the FLIP already
> > > >4. The FLIP doesn't explain how the data is passed through the
> > > >AdaptiveScheduler states. We should be handling some kind of
> > > >RescaleSnapshot that is passed through the different states and
> > > >updated and its final state is stored somewhere within
> > > >AdaptiveScheduler in the end, I guess. Can we clarify that in the FLIP?
> > > >5. You mention the config parameters for the cache in the public
> > > >interface section. But there's no mention of any caching and how
> > > >that is used within the FLIP.
> > > >6. The REST endpoint is probably better suited in FLIP-487. FLIP-495
> > > >should be about the actual implementation details and how the data is
> > > >stored internally whereas FLIP-487 is about exposing the information
> > > >to the outside through the REST API and the Flink UI. That would be a
> > > >way to decrease the scope of FLIP-495. WDYT?
> > > >
> > > >Best,
> > > >Matthias
> > > >
> > > >On Mon, Mar 24, 2025 at 11:37 AM Yuepeng Pan <[email protected]> wrote:
> > > >
> > > >> Hi, Community,
> > > >>
> > > >> There haven’t been any further responses to this email over the past
> > > >> few days.
> > > >> I'd like to initiate a vote on the current proposal[1] in the next
> > > >> few days.
> > > >> Please rest assured that I’m proceeding cautiously and not rushing
> > > >> the process.
> > > >> If there are any concerns about this FLIP-495[1],
> > > >> I will gladly pause and make the adjustments.
> > > >>
> > > >> Best regards,
> > > >> Yuepeng Pan
> > > >>
> > > >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-495%3A+Support+AdaptiveScheduler+record+and+query+the+rescale+history
> > > >>
> > > >> On 2024/12/17 15:18:45 Yuepeng Pan wrote:
> > > >> > Hi community,
> > > >> >
> > > >> > We discussed several aspects of FLIP-487[1] 'Show history of
> > > >> > rescales in Web UI for AdaptiveScheduler'
> > > >> > and received a lot of valuable feedback. Based on the suggestions
> > > >> > from the email thread[2],
> > > >> > we plan to split the original proposal for FLIP-487[1].
> > > >> >
> > > >> > The current email thread and the FLIP-495[3] wiki will be used to
> > > >> > discuss 'Support AdaptiveScheduler in recording and querying the
> > > >> > rescale history',
> > > >> > while FLIP-487[1] will primarily focus on displaying-related design
> > > >> > content
> > > >> >
> > > >> > Looking forward to any feedback and opinions on FLIP-495[3].
> > > >> >
> > > >> > [1] https://cwiki.apache.org/confluence/display/FLINK/%5BWIP%5D+FLIP-487%3A+Show+history+of+rescales+in+Web+UI+for+AdaptiveScheduler
> > > >> > [2] https://lists.apache.org/thread/f4md4btkf006mxcxf66bng1kfz0rsn8c
> > > >> > [3] https://cwiki.apache.org/confluence/display/FLINK/%5BWIP%5D+FLIP-495%3A+Support+AdaptiveScheduler+record+and+query+the+rescale+history
> > > >> >
> > > >> > Thank you very much.
> > > >> >
> > > >> > Best,
> > > >> >
> > > >> > Regards.
> > > >> >
> > > >> > Yuepeng Pan
