Tushar7012 commented on PR #20023:
URL: https://github.com/apache/datafusion/pull/20023#issuecomment-3813164303
Great question! The key difference is **where** the parallelization happens:
**Before (with `try_join_all`):**
```rust
future::try_join_all(self.table_paths.iter().map(|table_path| {
pruned_partition_list(ctx, store.as_ref(), table_path, ...)
}))
```
This creates futures, but they all share the same borrowed context (`ctx`, `store`, etc.). While `try_join_all` can poll the futures concurrently, they all run on a single task (and therefore on one thread at any given moment) because they hold borrowed references and cannot be `'static`. The concurrency is limited to cooperative yielding within that one async task.
**After (with `JoinSet`):**
```rust
join_set.spawn(async move {
    let stream = pruned_partition_list(&config, &runtime_env, store.as_ref(), ...)
        .await?;
    stream.try_collect::<Vec<_>>().await
});
```
Each `table_path` is processed in a separate spawned task, and those tasks can run on different worker threads in the Tokio runtime's thread pool. This gives true parallelism rather than single-task concurrency.
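The reason the `JoinSet` version needs `async move` (owning or cloning `config`, `runtime_env`, etc.) is that spawned tasks must be `'static`: they cannot hold borrows into the caller's stack, which is exactly what the `try_join_all` futures were doing. The same bound applies to `std::thread::spawn`, so here is a minimal stdlib-only sketch of that ownership requirement (the paths and closure here are hypothetical stand-ins, not DataFusion code):

```rust
use std::thread;

fn main() {
    // Hypothetical stand-ins for per-table work; not DataFusion types.
    let paths = vec!["t1".to_string(), "t2".to_string(), "t3".to_string()];

    let mut handles = Vec::new();
    for path in paths {
        // Like tokio's JoinSet::spawn, thread::spawn requires a 'static
        // closure: each task must *own* its data (hence `move`), which is
        // why the spawned version clones the context instead of borrowing.
        handles.push(thread::spawn(move || path.len()));
    }

    // Join all tasks and aggregate their results.
    let total: usize = handles.into_iter().map(|h| h.join().unwrap()).sum();
    println!("{total}"); // each path is 2 bytes, so this prints 6
}
```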
Regarding benchmarks: I don't have an end-to-end benchmark yet. The benefit would be most visible when:

- there are multiple `table_paths` to scan,
- object store operations have I/O latency (e.g., S3, GCS), and
- the process runs on a multi-core system.
Would a benchmark demonstrating the speedup be helpful for reviewing this
PR? I can add one if needed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]