Tushar7012 commented on PR #20023:
URL: https://github.com/apache/datafusion/pull/20023#issuecomment-3816000733

   Hey @2010YOUY01,
   
   Thank you for the feedback and for sharing the guide. I apologize if my 
previous responses felt disconnected; I’ve spent the last few hours doing a 
deep dive into the implementation and the performance trade-offs to ensure I 
fully understand the impact.
   
   I have updated the PR with a few key refinements:
   
   - Parallelized IO: Switched the listing logic to use tokio::task::JoinSet.
     This allows us to process multiple table paths concurrently, which is
     critical for large datasets distributed across many prefixes.
   - Performance Verification: I’ve added a benchmark_parallel_listing test
     directly in table.rs. On my local machine, I verified that for 10 paths
     with a 100ms simulated network latency, the execution time dropped from
     1000ms (sequential) to ~102ms (parallel).
   - WASM Compatibility: I kept the try_join_all fallback specifically for
     WASM targets since JoinSet isn't supported there, ensuring the build
     remains stable across all platforms.
   
   I’ve also cleaned up the imports and resolved the linting issues. I’m 
genuinely interested in improving DataFusion's performance here and would 
appreciate a fresh review of these technical changes.
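
   For illustration only, here is a minimal sketch of the sequential versus 
concurrent timing described above. It is not the code in this PR: it uses 
std::thread::scope as a dependency-free stand-in for tokio::task::JoinSet, and 
the names list_prefix, list_sequential, and list_concurrent, the sleep-based 
latency, and the example paths are all hypothetical. The cfg gate only mirrors 
the idea of keeping a fallback on WASM targets.

```rust
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical stand-in for one listing call; the sleep simulates network latency.
fn list_prefix(prefix: &str, latency: Duration) -> Vec<String> {
    thread::sleep(latency);
    vec![format!("{prefix}/part-0.parquet")]
}

// Sequential baseline: total time is roughly paths.len() * latency.
fn list_sequential(paths: &[String], latency: Duration) -> Vec<String> {
    paths.iter().flat_map(|p| list_prefix(p, latency)).collect()
}

// Concurrent fan-out on native targets: one scoped thread per path
// (an assumption standing in for one JoinSet task per path).
#[cfg(not(target_family = "wasm"))]
fn list_concurrent(paths: &[String], latency: Duration) -> Vec<String> {
    thread::scope(|s| {
        // Spawn every listing first so the simulated waits overlap.
        let handles: Vec<_> = paths
            .iter()
            .map(|p| s.spawn(move || list_prefix(p, latency)))
            .collect();
        // Join in spawn order so results keep the order of `paths`.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("listing thread panicked"))
            .collect()
    })
}

// Fallback for WASM targets in this sketch: reuse the sequential path.
#[cfg(target_family = "wasm")]
fn list_concurrent(paths: &[String], latency: Duration) -> Vec<String> {
    list_sequential(paths, latency)
}

fn main() {
    let paths: Vec<String> = (0..10)
        .map(|i| format!("s3://bucket/prefix-{i}"))
        .collect();
    let latency = Duration::from_millis(100);

    let start = Instant::now();
    let sequential = list_sequential(&paths, latency);
    let sequential_elapsed = start.elapsed();

    let start = Instant::now();
    let concurrent = list_concurrent(&paths, latency);
    let concurrent_elapsed = start.elapsed();

    // Both strategies must return the same files in the same order.
    assert_eq!(sequential, concurrent);
    println!("sequential: {sequential_elapsed:?}, concurrent: {concurrent_elapsed:?}");
}
```

   With 10 paths and 100ms per call, the sequential loop waits about 10 x 100ms, 
while the concurrent version waits about one latency period plus scheduling 
overhead. Exact timings depend on the machine.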


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

