[ https://issues.apache.org/jira/browse/YUNIKORN-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mit Desai updated YUNIKORN-3119:
--------------------------------
    Summary: Add Metrics for Monitoring Applications and Nodes Attempted in 
Each Scheduling Cycle  (was: Add Metrics for Monitoring Applications Attempted 
in Each Scheduling Cycle)

> Add Metrics for Monitoring Applications and Nodes Attempted in Each 
> Scheduling Cycle
> ------------------------------------------------------------------------------------
>
>                 Key: YUNIKORN-3119
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-3119
>             Project: Apache YuniKorn
>          Issue Type: Improvement
>          Components: core - scheduler
>            Reporter: Mit Desai
>            Assignee: Mit Desai
>            Priority: Major
>
> h2. Summary
> Add new observability metrics to track the number of applications attempted 
> during each scheduling cycle. This enhancement will improve debugging 
> capabilities for scheduling latency issues by providing visibility into 
> scheduling cycle efficiency and application processing patterns.
> h2. Background
> When debugging YuniKorn scheduling performance issues, it's important to 
> understand not just how long scheduling takes, but also how many applications 
> are being processed in each cycle. Currently, YuniKorn logs timing 
> information but lacks visibility into the number of applications attempted 
> per scheduling cycle, making it difficult to correlate scheduling latency 
> with workload characteristics.
> h2. Proposed Solution
> Add a new metric {{applicationsTried}} that tracks and reports the number of 
> applications attempted during each scheduling cycle. This metric will be 
> integrated into the existing logging and monitoring infrastructure.
> h3. Key Features:
>  # {*}Applications Attempted Counter{*}: Track the number of applications 
> processed in each scheduling cycle
>  # {*}Integration with Existing Metrics{*}: Report the count alongside the 
> current timing and allocation metrics
>  # {*}Debugging Support{*}: Provide correlation data between application 
> count and scheduling latency
>  # {*}Minimal Performance Impact{*}: A lightweight counter that should not 
> measurably affect scheduling performance



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
