jtuglu1 commented on code in PR #19235:
URL: https://github.com/apache/druid/pull/19235#discussion_r3172405264
##########
processing/src/main/java/org/apache/druid/query/ChainedExecutionQueryRunner.java:
##########
@@ -153,8 +167,32 @@ public Iterable<T> call()
)
);
+ final int totalSegments = futures.size();
ListenableFuture<List<Iterable<T>>> future =
Futures.allAsList(futures);
queryWatcher.registerQueryFuture(query, future);
+
+ if (completedSegments != null && totalSegments >= samplingWindow
&& context.hasTimeout()) {
+ for (ListenableFuture<?> f : futures) {
+ f.addListener(
+ () -> {
+ if (extrapolationCancelled.get()) {
+ return;
+ }
+ int completed = completedSegments.get();
+ if (completed >= samplingWindow) {
+ long elapsedMs =
TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - queryStartNanos);
+ long extrapolatedMs = elapsedMs * totalSegments /
completed;
Review Comment:
> I think the formula does account for parallelism. The parallelism is
already baked into elapsedMs / completed since elapsedMs is average wall-clock
time per completion not avg per segment time.
Example: 100 threads, 1000 segments, each segment taking 300ms:
I disagree. I think this is an example of a "good" case. Look at something
like samplingWindow=5, pool size of 100. You need to take parallelism into
account (at least, your sample size should be ≥ parallelism).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]