FrankChen021 commented on code in PR #19369:
URL: https://github.com/apache/druid/pull/19369#discussion_r3168199707
##########
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/autoscaler/CostBasedAutoScaler.java:
##########
@@ -166,61 +166,47 @@ public int computeTaskCountForRollover()
}
}
+ /**
+ * Returns the task count the scaler wants to reach — the "optimal" count
computed from current
+ * metrics, unclamped by the autoscaler's min/max bounds. The supervisor is
responsible for
+ * clamping the return value to {@code [taskCountMin, taskCountMax]} and for
deciding whether
+ * to actually scale.
+ * <p>
+ * Contract:
+ * <ul>
+ * <li>Returns {@code -1} only for pathological cases (metrics unavailable
/ optimal cannot be
+ * computed). This signals to the supervisor that there is no useful
hint for operators.</li>
+ * <li>Returns the current task count when scale-down is configured to
happen on rollover only
+ * and the optimal would otherwise be a scale-down — the scaler's
"preferred" count in that
+ * mode is to stay put.</li>
+ * <li>Otherwise returns the optimal task count unchanged.</li>
+ * </ul>
+ */
public int computeTaskCountForScaleAction()
{
lastKnownMetrics = collectMetrics();
final int optimalTaskCount = computeOptimalTaskCount(lastKnownMetrics);
- int currentTaskCount = supervisor.getIoConfig().getTaskCount();
-
- // Take the current task count but clamp it to the configured boundaries
if it is outside the boundaries.
- // There might be a configuration instance with a handwritten taskCount
that is outside the boundaries.
- final boolean isTaskCountOutOfBounds = currentTaskCount <
config.getTaskCountMin()
- || currentTaskCount >
config.getTaskCountMax();
- if (isTaskCountOutOfBounds) {
- currentTaskCount = Math.min(config.getTaskCountMax(),
Math.max(config.getTaskCountMin(), currentTaskCount));
+ if (optimalTaskCount <= 0) {
Review Comment:
[P2] Cost scaler can leave out-of-bound counts untouched
This now treats every non-positive optimal count as a pathological scaler
failure, but `computeOptimalTaskCount` still returns `-1` for the normal case
where the current count remains cheapest. Because the old out-of-bounds clamp
was removed from this method, a cost-based supervisor whose current `taskCount`
is outside `[taskCountMin, taskCountMax]` can return `-1` before the supervisor
gets any boundary target to clamp, leaving the configured min/max violated and
emitting a misleading failure metric. Restore an explicit out-of-bounds
boundary return before cost optimization, or separate the normal no-op signal
from invalid-metrics failures.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]