beenhead opened a new pull request, #19547:
URL: https://github.com/apache/druid/pull/19547
Have you ever stood up a streaming supervisor, only to later wish you could replay that same data as a batch (SQL-based) ingestion? Well, be prepared to have your experience taken to the next level!

This PR adds a `Convert supervisor to SQL` flow to the web console query view. It complements the existing [Convert ingestion spec to SQL](https://github.com/apache/druid/pull/12919) tool. Where that tool migrates native batch and Hadoop specs, this one takes a streaming supervisor (Kafka or Kinesis) and generates an equivalent [Multi-Stage Query](https://druid.apache.org/docs/latest/multi-stage-query/) INSERT statement that reads from files instead of a stream. This is handy for backfills, reprocessing, and one-off migrations.

### Convert supervisor to SQL dialog

A new `Convert supervisor to SQL` item is available from the `...` menu in the query view.

<img width="983" height="507" alt="image" src="https://github.com/user-attachments/assets/b9107858-82e2-4bdb-8368-cf21f8d9e75b" />

Clicking it opens a dialog that walks you through the conversion.

<img width="295" alt="image" src="https://github.com/user-attachments/assets/df5c6ba5-a968-4830-90ef-cdf9e963e178" />

### Pick your supervisor

You can either select an existing supervisor (the dialog fetches the list from `/druid/indexer/v1/supervisor` and loads the spec on selection) or paste a supervisor JSON spec directly. Pasted JSON is validated as you type, so malformed JSON surfaces an inline error rather than failing silently.

### Point it at your data

Because a streaming supervisor has no batch input source, the dialog asks where the equivalent files live. It pre-populates the file location from the supervisor's `ioConfig.inputSource` when one is present, and you can pick the file type (JSON, CSV, Parquet, or ORC).
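The two dialog steps above come down to a little parsing logic. The TypeScript below is a minimal sketch under stated assumptions, not the PR's code: `parseSupervisorSpec`, `guessLocation`, and `toInputFormat` are hypothetical names. The sketch also assumes which `inputSource` fields to read (`uris`, `prefixes`, `files`, `baseDir`, all standard Druid input source properties) and that CSV files carry a header row.

```typescript
// Hypothetical helpers for the "pick your supervisor" and "point it at your data" steps.
type Json = Record<string, unknown>;
type FileType = 'json' | 'csv' | 'parquet' | 'orc';

type ParseResult = { spec: Json; error?: undefined } | { spec?: undefined; error: string };

// Validate pasted text (e.g. on every keystroke) and return an error message for inline display.
function parseSupervisorSpec(text: string): ParseResult {
  let value: unknown;
  try {
    value = JSON.parse(text);
  } catch (e) {
    return { error: e instanceof Error ? e.message : String(e) };
  }
  if (typeof value !== 'object' || value === null || Array.isArray(value)) {
    return { error: 'A supervisor spec must be a JSON object' };
  }
  return { spec: value as Json };
}

// Pre-fill the file location from ioConfig.inputSource when present.
// Assumption: ioConfig may sit under "spec" or at the top level, so both are checked.
function guessLocation(spec: Json): string | undefined {
  const inner = (spec.spec as Json | undefined) ?? spec;
  const ioConfig = inner.ioConfig as Json | undefined;
  const src = ioConfig?.inputSource as Json | undefined;
  if (!src) return undefined;
  for (const key of ['uris', 'prefixes', 'files']) {
    const list = src[key];
    if (Array.isArray(list) && typeof list[0] === 'string') return list[0];
  }
  return typeof src.baseDir === 'string' ? src.baseDir : undefined;
}

// Map the chosen file type to a Druid inputFormat (assumed: CSV has a header row).
function toInputFormat(fileType: FileType): Json {
  return fileType === 'csv' ? { type: 'csv', findColumnsFromHeader: true } : { type: fileType };
}
```

Returning the error as data instead of throwing lets a dialog re-render the message inline on each change without wrapping every call in `try`/`catch`.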
The location scheme determines which input source is built:

- `s3://…` → an `s3` input source (with an `objectGlob` added automatically for directory locations)
- `gs://…` → a `google` input source
- `http://…` / `https://…` → an `http` input source
- anything else (including `file://…`) → a `local` input source

### Generate the SQL

Clicking `Generate SQL` converts the spec and drops the resulting query into a new tab, where you can review and edit it before running. The conversion:

- Builds a `SELECT … FROM TABLE(EXTERN(…))` over the chosen files
- Maps the supervisor's `metricsSpec` aggregators to their SQL equivalents (`longSum` → `SUM`, `thetaSketch` → `APPROX_COUNT_DISTINCT_DS_THETA`, `HLLSketchBuild` → `APPROX_COUNT_DISTINCT_DS_HLL`, and so on), adding a `GROUP BY` when rollup aggregations are present
- Parses the timestamp with `TIME_PARSE` using the supervisor's `timestampSpec`
- Emits `PARTITIONED BY DAY` and `CLUSTERED BY` the leading dimensions

<img width="983" height="440" alt="image" src="https://github.com/user-attachments/assets/95c4ec74-523d-469f-aa36-1e483e268b8c" />

### Tests

Added `supervisor-conversion.spec.ts`, covering the conversion helper: rollup vs. non-rollup queries, the full set of supported metric aggregations and their non-default arguments, dropping of unsupported metrics, input-source detection for each scheme, timestamp handling, partitioning/clustering, and the error paths. The existing `supervisor-to-sql-dialog.spec.tsx` covers basic rendering and the close action.
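The location rules and the conversion steps described under Generate the SQL can be sketched in TypeScript as follows. This is an illustrative approximation, not the PR's `supervisor-conversion` code. Every function name is hypothetical, and several choices are assumptions: treating a trailing `/` as a directory, using `**` as the `objectGlob`, typing summed input columns as `long`/`double`, approximating `auto` timestamps as ISO 8601, and clustering on the first two dimensions. It uses Druid's three-argument `EXTERN(inputSource, inputFormat, rowSignature)` form.

```typescript
// Hypothetical sketch of the conversion rules; names, shapes, and defaults are assumptions.
type Json = Record<string, unknown>;

interface AggregatorSpec {
  type: string;
  name: string;
  fieldName?: string;
}

interface TimestampSpec {
  column: string;
  format?: string;
}

const quoteIdent = (name: string): string => `"${name.replace(/"/g, '""')}"`;
const sqlString = (s: string): string => `'${s.replace(/'/g, "''")}'`;

// Choose an input source from the location's scheme. Assumption: a trailing "/" marks a
// directory, and "**" is a placeholder for whatever objectGlob the PR actually generates.
function toInputSource(location: string): Json {
  const isDir = location.endsWith('/');
  if (location.startsWith('s3://')) {
    return isDir ? { type: 's3', prefixes: [location], objectGlob: '**' } : { type: 's3', uris: [location] };
  }
  if (location.startsWith('gs://')) {
    return isDir ? { type: 'google', prefixes: [location] } : { type: 'google', uris: [location] };
  }
  if (/^https?:\/\//.test(location)) return { type: 'http', uris: [location] };
  // Anything else, including file://, becomes a local input source.
  const path = location.replace(/^file:\/\//, '');
  return isDir ? { type: 'local', baseDir: path, filter: '*' } : { type: 'local', files: [path] };
}

// A subset of the aggregator mapping; "and so on" in the description covers more types.
const SQL_FUNCTION: Record<string, string> = {
  longSum: 'SUM',
  doubleSum: 'SUM',
  thetaSketch: 'APPROX_COUNT_DISTINCT_DS_THETA',
  HLLSketchBuild: 'APPROX_COUNT_DISTINCT_DS_HLL',
};
// EXTERN column types for summed inputs; sketch inputs and everything else stay strings.
const COLUMN_TYPE: Record<string, string> = { longSum: 'long', doubleSum: 'double' };

// Returns a SELECT-list expression, or undefined so unsupported metrics are dropped.
function metricToSql(agg: AggregatorSpec): string | undefined {
  if (agg.type === 'count') return `COUNT(*) AS ${quoteIdent(agg.name)}`;
  const fn = SQL_FUNCTION[agg.type];
  if (!fn || !agg.fieldName) return undefined;
  return `${fn}(${quoteIdent(agg.fieldName)}) AS ${quoteIdent(agg.name)}`;
}

// TIME_PARSE without a pattern expects ISO 8601; other formats are passed as a Joda pattern.
// "auto" is approximated as ISO, and "millis"/"posix" (which need other functions) are rejected.
function timestampToSql({ column, format = 'auto' }: TimestampSpec): string {
  if (format === 'auto' || format === 'iso') return `TIME_PARSE(${quoteIdent(column)})`;
  if (format === 'millis' || format === 'posix') throw new Error(`Unsupported timestamp format: ${format}`);
  return `TIME_PARSE(${quoteIdent(column)}, ${sqlString(format)})`;
}

function buildInsert(
  table: string,
  location: string,
  inputFormat: Json,
  ts: TimestampSpec,
  dimensions: string[],
  metrics: AggregatorSpec[],
): string {
  // Row signature for EXTERN: the timestamp and dimensions are read as strings.
  const types = new Map<string, string>();
  for (const col of [ts.column, ...dimensions]) types.set(col, 'string');
  for (const m of metrics) {
    if (m.fieldName && SQL_FUNCTION[m.type]) types.set(m.fieldName, COLUMN_TYPE[m.type] ?? 'string');
  }
  const signature = Array.from(types, ([name, type]) => ({ name, type }));
  const externArgs = [toInputSource(location), inputFormat, signature].map(x => sqlString(JSON.stringify(x)));

  const dims = dimensions.map(quoteIdent);
  const aggs = metrics.map(metricToSql).filter((m): m is string => m !== undefined);
  const lines = [
    `INSERT INTO ${quoteIdent(table)}`,
    `SELECT ${[`${timestampToSql(ts)} AS "__time"`, ...dims, ...aggs].join(', ')}`,
    `FROM TABLE(EXTERN(${externArgs.join(', ')}))`,
  ];
  // Rollup: group by __time plus every dimension (select ordinals 1..n+1) when aggregations exist.
  if (aggs.length > 0) {
    lines.push(`GROUP BY ${Array.from({ length: dims.length + 1 }, (_, i) => i + 1).join(', ')}`);
  }
  lines.push('PARTITIONED BY DAY');
  // Assumption: "leading dimensions" is read here as the first two.
  if (dims.length > 0) lines.push(`CLUSTERED BY ${dims.slice(0, 2).join(', ')}`);
  return lines.join('\n');
}
```

Grouping by select-list ordinals keeps the `GROUP BY` aligned with the `SELECT` list without repeating the `TIME_PARSE` expression, and dropping unsupported aggregators (returning `undefined`) mirrors the "dropping of unsupported metrics" behavior the tests describe.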
