[ https://issues.apache.org/jira/browse/YUNIKORN-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mit Desai updated YUNIKORN-3119:
--------------------------------
Description:
h2. Summary
Add new observability metrics to track the number of applications and nodes
attempted during each scheduling cycle. This enhancement will improve debugging
capabilities for scheduling latency issues by providing visibility into
scheduling cycle efficiency and application processing patterns.
h2. Background
When debugging YuniKorn scheduling performance issues, it's important to
understand not just how long scheduling takes, but also how many applications
are being processed in each cycle and how many node evaluations it takes to
reach a scheduling decision. Currently, YuniKorn logs timing information but lacks
visibility into the number of applications and nodes attempted per scheduling
cycle, making it difficult to correlate scheduling latency with workload
characteristics.
h2. Proposed Solution
Add two new metrics, {{applicationsTried}} and {{nodesTried}}, that track and
report the number of applications and nodes attempted during each scheduling
cycle. These metrics will be integrated into the existing logging and monitoring
infrastructure.
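As a rough illustration of the idea, the following Go sketch keeps two atomic counters per scheduling cycle and reads them out together with the elapsed time. The type {{CycleStats}}, its methods, and {{runCycle}} are hypothetical names chosen for this example and are not existing YuniKorn APIs; the sketch also assumes Go 1.19 or later for {{atomic.Int64}}.

```go
package main

import (
	"sync/atomic"
	"time"
)

// CycleStats is a hypothetical holder for the two proposed counters.
// It is an illustration only, not a type from the YuniKorn code base.
type CycleStats struct {
	applicationsTried atomic.Int64
	nodesTried        atomic.Int64
}

// IncApplicationsTried records that one more application was attempted.
func (s *CycleStats) IncApplicationsTried() {
	s.applicationsTried.Add(1)
}

// AddNodesTried records n node evaluations for the current cycle.
func (s *CycleStats) AddNodesTried(n int) {
	s.nodesTried.Add(int64(n))
}

// SnapshotAndReset returns the counts for the finished cycle and
// resets both counters to zero so the next cycle starts clean.
func (s *CycleStats) SnapshotAndReset() (apps, nodes int64) {
	return s.applicationsTried.Swap(0), s.nodesTried.Swap(0)
}

// runCycle simulates one scheduling cycle. nodesPerApp[i] is the number
// of nodes evaluated for application i (an assumption made for the demo).
// It returns the counts together with the elapsed time, so a caller
// could log or export all three values side by side.
func runCycle(stats *CycleStats, nodesPerApp []int) (apps, nodes int64, elapsed time.Duration) {
	start := time.Now()
	for _, n := range nodesPerApp {
		stats.IncApplicationsTried()
		stats.AddNodesTried(n)
	}
	apps, nodes = stats.SnapshotAndReset()
	return apps, nodes, time.Since(start)
}
```

Calling {{runCycle(&CycleStats{}, []int{3, 5, 2})}} would report three applications and ten node evaluations for that cycle. Atomic increments are used here because they are cheap; whether the real scheduler loop needs atomics at all depends on its threading model.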
h3. Key Features:
# {*}Applications Attempted Counter{*}: Track the number of applications
processed in each scheduling cycle
# {*}Integration with Existing Metrics{*}: Seamlessly integrate with current
timing and allocation metrics
# {*}Debugging Support{*}: Provide correlation data between application count
and scheduling latency
# {*}Minimal Performance Impact{*}: Lightweight counter that doesn't affect
scheduling performance
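To make the debugging use case more concrete, here is a small Go sketch of how per-cycle samples could be correlated offline once the counts are available. {{CycleSample}} and {{pearson}} are hypothetical names for this example, and the Pearson coefficient is only one possible way to relate application count to latency; nothing here is part of YuniKorn.

```go
package main

import "math"

// CycleSample is a hypothetical record of one scheduling cycle,
// e.g. parsed from log lines that carry both values.
type CycleSample struct {
	ApplicationsTried int
	LatencyMillis     float64
}

// pearson returns the Pearson correlation coefficient between the number
// of applications tried and the cycle latency. It returns 0 when there are
// fewer than two samples or when either series has zero variance.
func pearson(samples []CycleSample) float64 {
	n := float64(len(samples))
	if n < 2 {
		return 0
	}
	var sumX, sumY float64
	for _, s := range samples {
		sumX += float64(s.ApplicationsTried)
		sumY += s.LatencyMillis
	}
	meanX, meanY := sumX/n, sumY/n

	var cov, varX, varY float64
	for _, s := range samples {
		dx := float64(s.ApplicationsTried) - meanX
		dy := s.LatencyMillis - meanY
		cov += dx * dy
		varX += dx * dx
		varY += dy * dy
	}
	if varX == 0 || varY == 0 {
		return 0
	}
	return cov / math.Sqrt(varX*varY)
}
```

A coefficient close to 1 would suggest that slow cycles are the ones that attempt many applications, while a value near 0 would point toward other causes of latency.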
was:
h2. Summary
Add new observability metrics to track the number of applications and nodes
attempted during each scheduling cycle. This enhancement will improve debugging
capabilities for scheduling latency issues by providing visibility into
scheduling cycle efficiency and application processing patterns.
h2. Background
When debugging YuniKorn scheduling performance issues, it's important to
understand not just how long scheduling takes, but also how many applications
are being processed in each cycle and how many node evaluation did it take to
reach the conclusion. Currently, YuniKorn logs timing information but lacks
visibility into the number of applications and nodes attempted per scheduling
cycle, making it difficult to correlate scheduling latency with workload
characteristics.
h2. Proposed Solution
Add a new metric {{applicationsTried}} and {{nodesTried }}that tracks and
reports the number of applications and nodes attempted during each scheduling
cycle. This metric will be integrated into existing logging and monitoring
infrastructure.
h3. Key Features:
# {*}Applications Attempted Counter{*}: Track the number of applications
processed in each scheduling cycle
# {*}Integration with Existing Metrics{*}: Seamlessly integrate with current
timing and allocation metrics
# {*}Debugging Support{*}: Provide correlation data between application count
and scheduling latency
# {*}Minimal Performance Impact{*}: Lightweight counter that doesn't affect
scheduling performance
> Add Metrics for Monitoring Applications and Nodes Attempted in Each
> Scheduling Cycle
> ------------------------------------------------------------------------------------
>
> Key: YUNIKORN-3119
> URL: https://issues.apache.org/jira/browse/YUNIKORN-3119
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: core - scheduler
> Reporter: Mit Desai
> Assignee: Mit Desai
> Priority: Major