robtandy commented on issue #46: URL: https://github.com/apache/datafusion-ray/issues/46#issuecomment-2595948582
I was able to refactor the code such that batches are exchanged in a streaming fashion through the ray object store. Its too slow though and struggles to scale when run on larger TPCH scale factors. The heavy interaction between all stages and exchanging object refs pointing to batches is too much for Ray. I would observe a typical TPCH query creating 10000+ ray tasks during its execution and the bottle neck appeared to the be the object store. I'll refactor again trying to exchange data with Arrow Flight as you have suggested @andygrove -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org