andygrove commented on code in PR #27:
URL: https://github.com/apache/datafusion-ray/pull/27#discussion_r1798402790
##########
docs/README.md:
##########
@@ -260,13 +257,14 @@ child plans, building up a DAG of futures.
 ## Distributed Shuffle
-The output of each query stage needs to be persisted somewhere so that the next query stage can read it. Currently,
-RaySQL is just writing the output to disk in Arrow IPC format, and this means that RaySQL is not truly distributed
-yet because it requires a shared file system. It would be better to use the Ray object store instead, as
-proposed [here](https://github.com/datafusion-contrib/ray-sql/issues/22).
+The output of each query stage needs to be persisted somewhere so that the next query stage can read it.
+> Datafusion Ray was just writing the output to disk in Arrow IPC format, and this means that Datafusion Ray was not truly distributed because it requires a shared file system.

Review Comment:
I think that it would be better to explain the current architecture rather than talk about previous architectures here.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
