Re: [PR] Update Datafusion Ray architecture docs [datafusion-ray]

via GitHub Sun, 13 Oct 2024 08:20:08 -0700


andygrove commented on code in PR #27:
URL: https://github.com/apache/datafusion-ray/pull/27#discussion_r1798402790



##########
docs/README.md:
##########
@@ -260,13 +257,14 @@ child plans, building up a DAG of futures.
 
 ## Distributed Shuffle
 
-The output of each query stage needs to be persisted somewhere so that the next query stage can read it. Currently,
-RaySQL is just writing the output to disk in Arrow IPC format, and this means that RaySQL is not truly distributed
-yet because it requires a shared file system. It would be better to use the Ray object store instead, as
-proposed [here](https://github.com/datafusion-contrib/ray-sql/issues/22).
+The output of each query stage needs to be persisted somewhere so that the next query stage can read it.
+> Datafusion Ray was just writing the output to disk in Arrow IPC format, and this means that Datafusion Ray was not truly distributed because it requires a shared file system.

Review Comment:
   I think that it would be better to explain the current architecture rather than talk about previous architectures here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


