robtandy commented on issue #46:
URL: https://github.com/apache/datafusion-ray/issues/46#issuecomment-2595948582

   I was able to refactor the code so that batches are exchanged in a 
streaming fashion through the Ray object store.
   
   It's too slow though, and it struggles to scale on larger TPC-H scale 
factors.  The heavy interaction between all stages, exchanging object refs 
pointing to batches, is too much for Ray.  A typical TPC-H query would create 
10,000+ Ray tasks during its execution, and the bottleneck appeared to be the 
object store.
   
   I'll refactor again, trying to exchange data with Arrow Flight as you 
suggested, @andygrove.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


