[ 
https://issues.apache.org/jira/browse/YUNIKORN-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YUNIKORN-3120:
-----------------------------------
    Target Version: 1.9.0  (was: 1.8.0)

> Enhance Scheduling Latency Metrics with Allocation State Labels
> ---------------------------------------------------------------
>
>                 Key: YUNIKORN-3120
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-3120
>             Project: Apache YuniKorn
>          Issue Type: Improvement
>          Components: core - scheduler
>            Reporter: Mit Desai
>            Assignee: Mit Desai
>            Priority: Major
>
> h3. Summary
> Enhance the existing scheduling latency metrics by adding a state label that 
> distinguishes scheduling cycles that result in a successful pod allocation 
> from cycles that find no suitable allocation. With the label, operators can 
> tell which kind of cycle is driving latency when debugging scheduling 
> performance issues.
> h3. Background
> Currently, YuniKorn's {{yunikorn_scheduler_scheduling_latency_milliseconds}} 
> metric aggregates all scheduling cycles together, making it difficult to 
> distinguish between:
>  # {*}Allocation cycles{*}: Cycles where the scheduler successfully finds and 
> allocates resources for pending applications
>  # {*}Non-allocation cycles{*}: Cycles where the scheduler runs but cannot 
> find suitable allocations due to resource constraints, policy restrictions, 
> or other factors
> This lack of distinction makes it challenging to debug scheduling latency 
> issues, as operators cannot easily identify whether high latency is due to 
> complex allocation decisions or repeated failed allocation attempts.
> h3. Implementation Details
>  # {*}Metric Enhancement{*}: Add a state label to the existing histogram 
> metric
>  # {*}Cycle Tracking{*}: Track allocation success or failure in the 
> scheduling loop
>  # {*}Threshold Logging{*}: Add a configurable threshold for detailed logging 
> of non-allocation cycles
>  # {*}Documentation{*}: Update the monitoring guides and dashboard examples
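
The first three items above can be sketched roughly as follows. This is a minimal, stdlib-only illustration and not YuniKorn's actual code: the type LatencyTracker, the label values "allocated" and "no_allocation", and the threshold parameter are all hypothetical names chosen for the example, since the issue does not specify them. A real implementation would presumably use a labeled Prometheus histogram instead of the hand-rolled buckets shown here.

```go
package main

import "fmt"

// Hypothetical label values; the issue does not name the real ones.
const (
	StateAllocated    = "allocated"
	StateNoAllocation = "no_allocation"
)

// LatencyTracker is a stand-in for a histogram with a "state" label.
type LatencyTracker struct {
	boundsMs    []float64           // bucket upper bounds in ms, ascending
	buckets     map[string][]uint64 // per-state counts; last slot is overflow
	thresholdMs float64             // assumed configurable logging threshold
	Slow        []string            // messages for slow non-allocation cycles
}

func NewLatencyTracker(boundsMs []float64, thresholdMs float64) *LatencyTracker {
	t := &LatencyTracker{
		boundsMs:    boundsMs,
		buckets:     map[string][]uint64{},
		thresholdMs: thresholdMs,
	}
	for _, s := range []string{StateAllocated, StateNoAllocation} {
		t.buckets[s] = make([]uint64, len(boundsMs)+1)
	}
	return t
}

// Observe records one scheduling cycle under the state matching its outcome
// and notes non-allocation cycles that exceed the threshold.
func (t *LatencyTracker) Observe(allocated bool, ms float64) string {
	state := StateNoAllocation
	if allocated {
		state = StateAllocated
	}
	idx := len(t.boundsMs) // overflow bucket unless a bound fits
	for i, b := range t.boundsMs {
		if ms <= b {
			idx = i
			break
		}
	}
	t.buckets[state][idx]++
	if !allocated && ms > t.thresholdMs {
		t.Slow = append(t.Slow,
			fmt.Sprintf("non-allocation cycle took %.1f ms (threshold %.1f ms)", ms, t.thresholdMs))
	}
	return state
}

// Count returns the number of cycles observed for one state.
func (t *LatencyTracker) Count(state string) uint64 {
	var n uint64
	for _, c := range t.buckets[state] {
		n += c
	}
	return n
}

func main() {
	t := NewLatencyTracker([]float64{1, 10, 100}, 50)
	t.Observe(true, 3)   // cycle that allocated
	t.Observe(false, 2)  // fast cycle with no allocation
	t.Observe(false, 80) // slow cycle with no allocation, gets logged
	fmt.Println(t.Count(StateAllocated), t.Count(StateNoAllocation), len(t.Slow))
	// prints: 1 2 1
}
```

Keeping the outcome as a label on one metric, rather than introducing a second metric, matches the "additive" intent stated in the issue.
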
> h3. Backward Compatibility
>  * Existing metric queries continue to work unchanged
>  * Additive enhancement that doesn't break existing monitoring setups
>  * Optional detailed logging that can be configured based on operational needs
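
One way to read the compatibility claim: a query that aggregates over all labels sees the same totals as before, because the per-state series sum back to the original histogram. The sketch below illustrates that arithmetic only. The function mergeStates and the label values are hypothetical, and a query that selects the series without aggregating would presumably return one series per state rather than a single one.

```go
package main

import "fmt"

// mergeStates sums per-state bucket counts into a single slice, mimicking a
// query-side aggregation over the state label. It assumes every state uses
// the same bucket layout (slices of equal length).
func mergeStates(perState map[string][]uint64) []uint64 {
	var merged []uint64
	for _, counts := range perState {
		if merged == nil {
			merged = make([]uint64, len(counts))
		}
		for i, c := range counts {
			merged[i] += c
		}
	}
	return merged
}

func main() {
	perState := map[string][]uint64{
		"allocated":     {4, 2, 0},
		"no_allocation": {1, 0, 3},
	}
	fmt.Println(mergeStates(perState)) // prints: [5 2 3]
}
```
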



