jtuglu1 commented on code in PR #19286:
URL: https://github.com/apache/druid/pull/19286#discussion_r3119847602


##########
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/autoscaler/CostBasedAutoScaler.java:
##########


Review Comment:
   Looking more closely at this code – why are we bounding the output results 
of this result by the min/max task count? Shouldn't we bounding the results by 
the theoretical maximum (e.g. the partition count)?
   
   Then, the calling code can validate whether min/max are breached (and 
emit/log a specific error indicating the scaler wanted to breach the min/max 
thresholds, alerting the operator)?
   
   My issue with this is that we are removing observability into the case where 
the min/max task count is indeed misconfigured relative to what the autoscaler 
thinks is the optimal task count. Would it not make more sense to compute the 
valid task counts relative to the stream? Then use the min/max config bounds to 
flag these cases **outside** of this method?
   
   IMO, this function should not care about the configured min/max task counts 
of the supervisor, and instead simply look at the stream data (lag, partitions, 
etc.) to derive the optimal task count. Whether that task count falls in the 
bounds of the min/max config can be enforced later, but we lose signal into 
what the original desired task count value was if we eagerly cap this value in 
`computeValidTaskCounts`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to