alamb opened a new pull request, #16331: URL: https://github.com/apache/datafusion/pull/16331
Note: This PR contains an example and supporting code. It makes no changes to the core.

## Which issue does this PR close?

- Closes https://github.com/apache/datafusion/issues/12393
- Note: this is a new version of https://github.com/apache/datafusion/pull/14286

## Rationale for this change

I have heard from multiple people, multiple times, over multiple years that the specifics of using multiple thread pools for separate CPU and IO work in DataFusion are confusing. They are not wrong, and it is a key detail for building low-latency, high-performance engines that process data directly from remote storage, which I think is a key capability for DataFusion.

My past attempts in https://github.com/apache/datafusion/pull/13424 and https://github.com/apache/datafusion/pull/14286 to make this example were bogged down trying to get consensus on how to transfer results across streams, the wisdom of wrapping streams, and other details.

Thankfully, thanks to @tustvold and @ion-elgreco, there is now a much better solution in ObjectStore 0.12.1: https://github.com/apache/arrow-rs-object-store/pull/332

## What changes are included in this PR?

1. `thread_pools.rs` example
2. Updated documentation

## Are these changes tested?

Yes, the example is run as part of CI and there are tests.

## Are there any user-facing changes?

Not really.

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org