Re: [PR] Multi stage explain [pinot]

via GitHub Tue, 10 Sep 2024 06:56:17 -0700


gortiz commented on code in PR #13733:
URL: https://github.com/apache/pinot/pull/13733#discussion_r1752031714



##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/QueryRunner.java:
##########
@@ -256,4 +262,66 @@ private Map<String, String> consolidateMetadata(Map<String, String> customProper
   public void cancel(long requestId) {
     _opChainScheduler.cancel(requestId);
   }
+
+  public StagePlan explainQuery(
+      WorkerMetadata workerMetadata, StagePlan stagePlan, Map<String, String> requestMetadata) {
+
+    if (!workerMetadata.isLeafStageWorker()) {
+      LOGGER.debug("Explain query on intermediate stages is a NOOP");
+      return stagePlan;
+    }
+    long requestId = Long.parseLong(requestMetadata.get(CommonConstants.Query.Request.MetadataKeys.REQUEST_ID));
+    long timeoutMs = Long.parseLong(requestMetadata.get(CommonConstants.Broker.Request.QueryOptionKey.TIMEOUT_MS));
+    long deadlineMs = System.currentTimeMillis() + timeoutMs;
+
+    StageMetadata stageMetadata = stagePlan.getStageMetadata();
+    Map<String, String> opChainMetadata = consolidateMetadata(stageMetadata.getCustomProperties(), requestMetadata);
+
+    if (PipelineBreakerExecutor.hasPipelineBreakers(stagePlan)) {
+      // TODO: Support pipeline breakers before merging this feature.
+      LOGGER.error("Pipeline breaker is not supported in explain query");
+      return stagePlan;
+    }

Review Comment:
   The main problem is that, to get the exact physical plan in this case, we
   would need to actually execute the pipeline breaker. For example, take a
   query like:
   
   ```sql
   select whatever from table1 
   where col1 in (select something from table2 where col2 = cte)
   ```
   
   Here the actual physical plan depends on the result of `(select something
   from table2 where col2 = cte)`. Assuming that subquery evaluates to [100,
   200, 300], the query would become:
   
   ```sql
   select whatever from table1 
   where col1 in (100, 200, 300)
   ```
   
   This could, for example, use an inverted index. I considered generating a
   random set of values of the `col1` type, but that could create incorrect
   plans. For example, imagine we randomly generate the values `1, 2`. The
   query we would explain would then be:
   
   ```sql
   select whatever from table1 
   where col1 in (1, 2)
   ```
   
   Now imagine that the values of `col1` range from 100 to 200. That query
   would be explained as `ALL_SEGMENTS_PRUNED_ON_SERVER`, while the actual
   plan would probably be different.
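
   To make the risk concrete, here is a minimal sketch of min/max-based
   pruning for an `IN` predicate. It is not Pinot code: `ExplainPruningSketch`,
   `SegmentStats`, `explain`, and the `FILTER_ON_SURVIVING_SEGMENTS` label are
   hypothetical names used only for illustration, and the pruning rule is an
   assumed simplification. Only `ALL_SEGMENTS_PRUNED_ON_SERVER` comes from the
   discussion above.

```java
import java.util.List;

// Hypothetical sketch: none of these types are Pinot classes.
final class ExplainPruningSketch {

  /** Assumed per-segment min/max statistics for col1. */
  static final class SegmentStats {
    final long min;
    final long max;

    SegmentStats(long min, long max) {
      this.min = min;
      this.max = max;
    }

    boolean mayContain(long value) {
      return value >= min && value <= max;
    }
  }

  /** Returns the label a naive min/max pruner would report for `col1 IN (inValues)`. */
  static String explain(List<SegmentStats> segments, List<Long> inValues) {
    for (SegmentStats segment : segments) {
      for (long value : inValues) {
        if (segment.mayContain(value)) {
          // At least one segment survives pruning, so a real filter plan would be shown.
          // The label below is made up for this sketch.
          return "FILTER_ON_SURVIVING_SEGMENTS";
        }
      }
    }
    // No segment can contain any of the values.
    return "ALL_SEGMENTS_PRUNED_ON_SERVER";
  }

  public static void main(String[] args) {
    // col1 ranges from 100 to 200 across two segments.
    List<SegmentStats> segments =
        List.of(new SegmentStats(100, 150), new SegmentStats(151, 200));

    // Randomly generated placeholder values: every segment is pruned.
    System.out.println(explain(segments, List.of(1L, 2L)));

    // Values the subquery would really return: segments survive.
    System.out.println(explain(segments, List.of(100L, 200L, 300L)));
  }
}
```

   With placeholder values the sketch reports every segment as pruned, while
   the values the subquery would actually produce keep segments alive, so the
   two explains disagree.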


