jtuglu1 commented on code in PR #19286: URL: https://github.com/apache/druid/pull/19286#discussion_r3119847602
########## indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/autoscaler/CostBasedAutoScaler.java: ########## Review Comment: Looking more closely at this code – why are we bounding the output results of this result by the min/max task count? Shouldn't we bounding the results by the theoretical maximum (e.g. the partition count)? Then, the calling code can validate whether min/max are breached (and emit/log a specific error indicating the scaler wanted to breach the min/max thresholds, alerting the operator)? My issue with this is that we are removing observability into the case where the min/max task count is indeed misconfigured relative to what the autoscaler thinks is the optimal task count. Would it not make more sense to compute the valid task counts relative to the stream? Then use the min/max bounds to flag these cases? IMO, this function should not care about the configured min/max task counts of the supervisor, and instead simply look at the stream data (lag, partitions, etc.) to derive the optimal task count. Whether that task count falls in the bounds of the min/max config can be enforced later, but we lose signal into what the original desired task count value was if we cap this value in `computeValidTaskCounts`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
