Re: [PR] feat: Early query cancellation based on per-segment sampling (druid)

via GitHub Fri, 01 May 2026 00:03:00 -0700


jtuglu1 commented on code in PR #19235:
URL: https://github.com/apache/druid/pull/19235#discussion_r3172405264



##########
processing/src/main/java/org/apache/druid/query/ChainedExecutionQueryRunner.java:
##########
@@ -153,8 +167,32 @@ public Iterable<T> call()
                     )
                 );
 
+            final int totalSegments = futures.size();
             ListenableFuture<List<Iterable<T>>> future = 
Futures.allAsList(futures);
             queryWatcher.registerQueryFuture(query, future);
+            
+            if (completedSegments != null && totalSegments >= samplingWindow 
&& context.hasTimeout()) {
+              for (ListenableFuture<?> f : futures) {
+                f.addListener(
+                    () -> {
+                      if (extrapolationCancelled.get()) {
+                        return;
+                      }
+                      int completed = completedSegments.get();
+                      if (completed >= samplingWindow) {
+                        long elapsedMs = 
TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - queryStartNanos);
+                        long extrapolatedMs = elapsedMs * totalSegments / 
completed;

Review Comment:
   > I think the formula does account for parallelism. The parallelism is 
already baked into elapsedMs / completed since elapsedMs is average wall-clock 
time per completion not avg per segment time.
   Example: 100 threads, 1000 segments, each segment taking 300ms:
   
   I disagree. I think this is an example of a "good" case. Look at something 
like samplingWindow=5, pool size of 100. You need to take parallelism into 
account (at least, your sample size should be ≥ parallelism).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] feat: Early query cancellation based on per-segment sampling (druid)

Reply via email to