[
https://issues.apache.org/jira/browse/YUNIKORN-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mit Desai updated YUNIKORN-3119:
--------------------------------
Summary: Add Metrics for Monitoring Applications and Nodes Attempted in
Each Scheduling Cycle (was: Add Metrics for Monitoring Applications Attempted
in Each Scheduling Cycle)
> Add Metrics for Monitoring Applications and Nodes Attempted in Each
> Scheduling Cycle
> ------------------------------------------------------------------------------------
>
> Key: YUNIKORN-3119
> URL: https://issues.apache.org/jira/browse/YUNIKORN-3119
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: core - scheduler
> Reporter: Mit Desai
> Assignee: Mit Desai
> Priority: Major
>
> h2. Summary
> Add new observability metrics to track the number of applications attempted
> during each scheduling cycle. This enhancement will improve debugging
> capabilities for scheduling latency issues by providing visibility into
> scheduling cycle efficiency and application processing patterns.
> h2. Background
> When debugging YuniKorn scheduling performance issues, it's important to
> understand not just how long scheduling takes, but also how many applications
> are being processed in each cycle. Currently, YuniKorn logs timing
> information but lacks visibility into the number of applications attempted
> per scheduling cycle, making it difficult to correlate scheduling latency
> with workload characteristics.
> h2. Proposed Solution
> Add a new metric {{applicationsTried}} that tracks and reports the number of
> applications attempted during each scheduling cycle. This metric will be
> integrated into the existing logging and monitoring infrastructure.
> h3. Key Features:
> # {*}Applications Attempted Counter{*}: Track the number of applications
> processed in each scheduling cycle
> # {*}Integration with Existing Metrics{*}: Seamlessly integrate with current
> timing and allocation metrics
> # {*}Debugging Support{*}: Provide correlation data between application
> count and scheduling latency
> # {*}Minimal Performance Impact{*}: A lightweight counter that adds
> negligible overhead to the scheduling path
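
The per-cycle counting described above could look roughly like the following Go sketch. It is illustrative only and is not taken from the YuniKorn code base: apart from the name applicationsTried, which comes from the issue text, every identifier here (CycleCounters, nodesTried, CycleReport, Finish) is hypothetical, and the loop in main is a stand-in for a real scheduling cycle.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// CycleCounters holds the counters for the cycle in progress.
// Only the name applicationsTried appears in the issue; nodesTried
// and the rest of this type are assumptions made for illustration.
type CycleCounters struct {
	applicationsTried int64
	nodesTried        int64
}

// AppTried records that one application was attempted in this cycle.
func (c *CycleCounters) AppTried() { atomic.AddInt64(&c.applicationsTried, 1) }

// NodeTried records that one node was attempted in this cycle.
func (c *CycleCounters) NodeTried() { atomic.AddInt64(&c.nodesTried, 1) }

// CycleReport is a snapshot of one finished cycle, so the counts can be
// logged or exported next to the cycle's duration.
type CycleReport struct {
	ApplicationsTried int64
	NodesTried        int64
	Duration          time.Duration
}

// Finish snapshots the counters and resets them to zero, so the next
// cycle starts with a clean count.
func (c *CycleCounters) Finish(elapsed time.Duration) CycleReport {
	return CycleReport{
		ApplicationsTried: atomic.SwapInt64(&c.applicationsTried, 0),
		NodesTried:        atomic.SwapInt64(&c.nodesTried, 0),
		Duration:          elapsed,
	}
}

func main() {
	var counters CycleCounters
	start := time.Now()

	// Stand-in for one scheduling cycle: three applications,
	// each tried against two nodes.
	for app := 0; app < 3; app++ {
		counters.AppTried()
		for node := 0; node < 2; node++ {
			counters.NodeTried()
		}
	}

	report := counters.Finish(time.Since(start))
	fmt.Printf("applicationsTried=%d nodesTried=%d duration=%s\n",
		report.ApplicationsTried, report.NodesTried, report.Duration)
}
```

Reporting the counts together with the cycle duration is what would allow latency to be correlated with the amount of work attempted; the atomic add keeps each increment cheap, in line with the minimal-impact goal listed above.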