u35253 commented on issue #26646: URL: https://github.com/apache/superset/issues/26646#issuecomment-3118487517
### Example use case

I'll give an example that would be useful here, if multi-statement "virtual (custom SQL) Datasets" were to be supported in Superset.

- **Problem statement:** On my database provider, with the way my Superset is set up, CTEs that query the same immediately upstream CTE each re-run the whole logic of that source CTE. The same data is recomputed every time after the first. Database caching exists only at the whole-query level (if the exact query was already run), so it does not help here. E.g., adding 2 CTEs that both process the "main" CTE (which, at times, can be very convenient) will TRIPLE the runtime, whether it was a "first" or "cached" run overall. What was a 5 second query becomes 15 seconds; a 10 second query becomes 30 seconds; a 30 second query becomes 1.5 minutes; a 1 minute query becomes a 3 minute query. What was once "tolerable" either for development or for actual use becomes "less so".
- **Proposed solution:** Allow multi-query virtual datasets. That way, the First Statement could run a CACHE TABLE statement one time in my database's dialect. This writes the "main CTE" result to disk, making a disk cache. Then, when the next two CTEs consume from that object (in the Second Statement), they could read the tiny, pre-computed dataset. That would allow the "full query" to run about as long as it takes to run just the "main" CTE (i.e., no multiplier is applied to the runtime).

**Commentary:** Overall, in this example, without multi-statement Dataset support, the current approach tends toward "make the whole query run fast no matter what, so that tripling it is not perceptible", among other possibilities. I accept that. But there could be an opportunity for certain use cases where multi-statement datasets could be very, very convenient. This example does not get into the implementation needs in the Superset code, much less the support of 50+ datasource types, or overall requirements.
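To make the two-statement pattern concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: it uses the standard-library `sqlite3` module with an in-memory database, a `CREATE TEMP TABLE ... AS` statement stands in for a dialect-specific CACHE TABLE, and the `sales` table, the `main_cached` name, and the `run_with_materialized_main` helper are all hypothetical. It does not show how Superset would execute this.

```python
import sqlite3


def run_with_materialized_main(conn):
    """Run the hypothetical two-statement dataset and return (max, min) totals."""
    cur = conn.cursor()
    # Statement 1: compute the "main" result once and store it.
    # (Assumption: stands in for a CACHE TABLE statement in another dialect.)
    cur.execute(
        "CREATE TEMP TABLE main_cached AS "
        "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
    )
    # Statement 2: both downstream CTEs read the small pre-computed table
    # instead of each re-running the aggregation over `sales`.
    cur.execute(
        """
        WITH max_total AS (SELECT MAX(total) AS v FROM main_cached),
             min_total AS (SELECT MIN(total) AS v FROM main_cached)
        SELECT max_total.v, min_total.v FROM max_total, min_total
        """
    )
    return cur.fetchone()


# Hypothetical sample data for the illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 5), ("west", 7)],
)
result = run_with_materialized_main(conn)
print(result)
```

Both statements must run on the same connection (session) for the second to see the stored result, which is part of what multi-statement dataset support would need to guarantee.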
Commenting here so that this pattern can be considered when this Issue gets reviewed again.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
