alamb opened a new pull request, #13690: URL: https://github.com/apache/datafusion/pull/13690
## Which issue does this PR close?

- Closes https://github.com/apache/datafusion/issues/12393

## Rationale for this change

See the rationale on:

- https://github.com/apache/datafusion/issues/12393
- https://github.com/apache/datafusion/pull/13424

@tustvold has (legitimate) concerns with the approach taken in https://github.com/apache/datafusion/pull/13424: https://github.com/apache/datafusion/pull/13424#issuecomment-2494641789

> At the risk of repeating myself from https://github.com/datafusion-contrib/datafusion-dft/pull/248#issuecomment-2489110287 I would strongly discourage overloading the ObjectStore trait as some sort of IO/CPU boundary.
>
> Not only is this not what the trait is designed for, but it is overly pessimistic. Tokio is designed handle some CPU bound work, e.g. interleaved CSV processing or similar, it just can't handle tasks stalling for seconds at a time.

This PR explores what his suggestion in https://github.com/apache/datafusion/pull/13424#issuecomment-2495131293 would look like.

## What changes are included in this PR?

Attempts to wrap all IO operations done by DataFusion explicitly in a `spawn_io` call:

1. Permit registering a `DedicatedExecutor` with the `RuntimeEnv`
2. Add a `spawn_io` method on the `RuntimeEnv` so that all thread pool management lives there
3. Annotate high-level calls in `SessionContext` so that they run on the relevant pool
4. Add an example of how these would be used

I will annotate challenges inline.

## Are these changes tested?

N/A

## Are there any user-facing changes?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org