gortiz commented on code in PR #13733:
URL: https://github.com/apache/pinot/pull/13733#discussion_r1752031714
##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/QueryRunner.java:
##########
@@ -256,4 +262,66 @@ private Map<String, String> consolidateMetadata(Map<String, String> customProper
public void cancel(long requestId) {
_opChainScheduler.cancel(requestId);
}
+
+ public StagePlan explainQuery(
+ WorkerMetadata workerMetadata, StagePlan stagePlan, Map<String, String> requestMetadata) {
+
+ if (!workerMetadata.isLeafStageWorker()) {
+ LOGGER.debug("Explain query on intermediate stages is a NOOP");
+ return stagePlan;
+ }
+ long requestId = Long.parseLong(requestMetadata.get(CommonConstants.Query.Request.MetadataKeys.REQUEST_ID));
+ long timeoutMs = Long.parseLong(requestMetadata.get(CommonConstants.Broker.Request.QueryOptionKey.TIMEOUT_MS));
+ long deadlineMs = System.currentTimeMillis() + timeoutMs;
+
+ StageMetadata stageMetadata = stagePlan.getStageMetadata();
+ Map<String, String> opChainMetadata = consolidateMetadata(stageMetadata.getCustomProperties(), requestMetadata);
+
+ if (PipelineBreakerExecutor.hasPipelineBreakers(stagePlan)) {
+ // TODO: Support pipeline breakers before merging this feature.
+ LOGGER.error("Pipeline breaker is not supported in explain query");
+ return stagePlan;
+ }
Review Comment:
The main problem is that, to get the exact physical plan in this case, we
would need to actually execute the pipeline breaker part. For example, in a
query like:
```sql
select whatever from table1
where col1 in (select something from table2 where col2 = cte)
```
the actual physical plan will depend on the result of
`(select something from table2 where col2 = cte)`. Assuming that subquery
evaluates to `[100, 200, 300]`, the query would effectively be:
```sql
select whatever from table1
where col1 in (100, 200, 300)
```
This could, for example, use an inverted index. I considered generating a
random set of values of the `col1` type, but that could produce incorrect
plans. For example, imagine we randomly generate the values `1, 2`. The query
we would then explain would be:
```sql
select whatever from table1
where col1 in (1, 2)
```
Now imagine that the actual values of `col1` range from 100 to 200. That
query would be explained as `ALL_SEGMENTS_PRUNED_ON_SERVER`, when the actual
plan would probably be different.
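
To make the pruning concern concrete, here is a minimal, self-contained sketch.
It is not Pinot code: `PruningSketch`, `SegmentRange` and `allSegmentsPruned`
are hypothetical names, and the min/max check is only an assumed, simplified
stand-in for whatever the server-side pruner actually does.

```java
import java.util.Arrays;
import java.util.List;

public class PruningSketch {
  // Hypothetical stand-in for the per-segment min/max metadata of col1.
  static final class SegmentRange {
    final long min;
    final long max;

    SegmentRange(long min, long max) {
      this.min = min;
      this.max = max;
    }

    boolean mayContain(long value) {
      return value >= min && value <= max;
    }
  }

  // Returns true when no segment can match any value of the IN list,
  // i.e. the situation that would be reported as all segments pruned.
  static boolean allSegmentsPruned(List<SegmentRange> segments, List<Long> inValues) {
    for (SegmentRange segment : segments) {
      for (long value : inValues) {
        if (segment.mayContain(value)) {
          return false;
        }
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Assumed data layout: col1 ranges from 100 to 200 across two segments.
    List<SegmentRange> segments =
        Arrays.asList(new SegmentRange(100, 150), new SegmentRange(151, 200));

    // Randomly generated placeholder values: everything is pruned.
    System.out.println(allSegmentsPruned(segments, Arrays.asList(1L, 2L)));
    // Values the subquery would really return: segments survive pruning.
    System.out.println(allSegmentsPruned(segments, Arrays.asList(100L, 200L, 300L)));
  }
}
```

With the placeholder values the sketch prints `true`, and with the real
subquery result it prints `false`, so the two explain outputs would disagree.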