Re: [PR] perf: Parallelize list_files_for_scan using tokio::task::JoinSet [datafusion]

via GitHub Wed, 28 Jan 2026 10:57:54 -0800


Tushar7012 commented on PR #20023:
URL: https://github.com/apache/datafusion/pull/20023#issuecomment-3813265502


   Great question! The difference is that `try_join_all` polls all of the 
futures concurrently inside a single task, so they can share borrowed 
references but only one of them is being polled at any instant. 
`JoinSet::spawn` creates a separate task per future, and on Tokio's 
multi-threaded runtime those tasks can run in parallel on different worker 
threads.
   
   So yes, it is already parallelized - each `table_path` is processed in its 
own spawned task.
   
   I don't have an end-to-end benchmark yet, but I expect the speedup to be 
most visible when a scan covers multiple table paths and listing latency is 
high (e.g., S3/GCS). Happy to add a benchmark if that would help with the 
review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

