Re: [I] Add Spark-like scheduling mode with shuffle files between stages [datafusion-ray]

via GitHub Sat, 01 Mar 2025 12:11:03 -0800


robtandy commented on issue #69:
URL: https://github.com/apache/datafusion-ray/issues/69#issuecomment-2692393525


   This makes sense to me.   DFRay as written leans into low-latency 
execution at all costs, though it looks like it will also provide a lot of 
utility for longer-running distributed execution. 
   
   One of the things DFRay is trying to do is not rewrite or replace any of the 
`ExecutionPlans` in DataFusion, including `RepartitionExec`s.   I am not sure 
if this is the same approach that Ballista uses.  
   
   I think, ideally, this new mode of execution diverges as little as possible 
from the existing mode of execution and we can use the exact same physical 
plans in both cases.
   
   For this to be true, once all stages are created by the `DFRayDataFrame`, we 
should arrange them into a DAG and execute them leaves up.  
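
   As a minimal sketch of the leaves-up ordering (plain Python, not DFRay 
code): `run_leaves_up`, `stage_inputs`, and `run_stage` are hypothetical names, 
and the four-stage plan is made up for illustration. It assumes each stage 
knows the ids of the stages it reads from.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_leaves_up(
    stage_inputs: dict[int, set[int]],
    run_stage: Callable[[int], None],
) -> list[int]:
    """Run every stage only after all of the stages it reads from.

    stage_inputs maps a stage id to the ids of the stages it consumes.
    run_stage is a hypothetical blocking call that returns once the stage
    has run to completion.
    """
    order: list[int] = []
    # static_order() yields a node only after all of its predecessors.
    for stage_id in TopologicalSorter(stage_inputs).static_order():
        run_stage(stage_id)
        order.append(stage_id)
    return order


# Hypothetical plan: stages 0 and 1 are leaf scans, stage 2 joins them,
# and stage 3 is the final stage.
stage_inputs = {0: set(), 1: set(), 2: {0, 1}, 3: {2}}
completed: list[int] = []
order = run_leaves_up(stage_inputs, completed.append)
```

   The relative order of the two leaf stages is not significant here; only 
the constraint that a stage runs after its inputs is.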
   
   Assumptions, to make sure I understand what you're thinking here:
   - we would execute each stage to completion before moving up the DAG
   - we would change the `ProcessorService` so that, in the new mode, instead 
of making the stage partitions available for streaming via Arrow Flight, it is 
asked to execute the stage and fully materialize the results
   - these results would be in Arrow IPC format
   - the `DFRayStageReaderExec` would be changed so that, in the new mode, 
instead of streaming input from the remote stage via Arrow Flight, it streams 
input from the previously materialized results
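
   The assumptions above can be sketched as a materialize-then-read contract. 
This is plain Python under stated assumptions, not the DFRay API: 
`materialize_stage` and `read_stage_input` are hypothetical helpers, a dict 
stands in for wherever the results end up, and opaque `bytes` values stand in 
for Arrow IPC buffers.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# (stage id, partition index) -> fully materialized batches for that partition.
PartitionKey = Tuple[int, int]
MaterializedStore = Dict[PartitionKey, List[bytes]]


def materialize_stage(
    stage_id: int,
    partitions: Dict[int, Iterable[bytes]],
    store: MaterializedStore,
) -> None:
    """Drain every output partition of a stage before returning.

    In the streaming mode these batches would be served to the consumer
    as they are produced; here they are collected in full first.
    """
    for partition, batches in partitions.items():
        store[(stage_id, partition)] = list(batches)


def read_stage_input(
    stage_id: int, partition: int, store: MaterializedStore
) -> Iterator[bytes]:
    """Stream a partition back out of the materialized results."""
    key = (stage_id, partition)
    if key not in store:
        # The producing stage must have run to completion first.
        raise RuntimeError(f"stage {stage_id} partition {partition} not materialized")
    return iter(store[key])


store: MaterializedStore = {}
materialize_stage(0, {0: iter([b"batch-a", b"batch-b"]), 1: iter([b"batch-c"])}, store)
batches = list(read_stage_input(0, 1, store))
```

   The reader still hands back an iterator, so the consuming plan can treat 
its input as a stream in both modes.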
   
   Since this runs on Ray, we have some choices:
   - Use the Ray object store for intermediate results:
     - Pros:
       - results are in memory only
       - easy to do from within the Python code in `DFRayProcessor`
       - conforms to Ray cluster resource allocations and likely won't surprise 
cluster operators
     - Cons:
       - finite size compared to object storage
     - Notable:
       - we would have `DFRayStageReaderExec` call back into Python to get the 
results from the Ray object store
   - Use object store interface only
     - Pros:
       - larger available size
       - consumes fewer cluster resources
     - Cons:
       - likely slower, perhaps costlier
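
   To illustrate how the two options could sit behind one interface, here is 
a small sketch using only the standard library. Everything in it is 
hypothetical: `ShuffleBackend`, `InMemoryBackend`, and `DirectoryBackend` are 
invented names, the in-memory class only mimics the bounded capacity of the 
Ray object store, and the directory class only mimics an object storage 
bucket. A real version would presumably go through `ray.put` / `ray.get` for 
the first option.

```python
import tempfile
from pathlib import Path
from typing import Dict, Protocol


class ShuffleBackend(Protocol):
    """Hypothetical interface for storing materialized stage output."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryBackend:
    """Stand-in for an in-memory store with a fixed capacity."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self._used = 0
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        # Account for overwriting an existing key.
        new_used = self._used - len(self._data.get(key, b"")) + len(data)
        if new_used > self.capacity_bytes:
            raise MemoryError("in-memory shuffle store is full")
        self._data[key] = data
        self._used = new_used

    def get(self, key: str) -> bytes:
        return self._data[key]


class DirectoryBackend:
    """Stand-in for object storage: one file per key under a root directory."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def put(self, key: str, data: bytes) -> None:
        (self.root / key).write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


# Both backends satisfy the same interface, so the choice could be a setting.
with tempfile.TemporaryDirectory() as tmp:
    backends = (InMemoryBackend(capacity_bytes=1024), DirectoryBackend(Path(tmp)))
    for backend in backends:
        backend.put("stage-0-part-0", b"ipc-bytes")
        assert backend.get("stage-0-part-0") == b"ipc-bytes"
```

   The capacity check is there only to make the "finite size" trade-off 
concrete; how a full store should behave is left open.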
    
   
   Since I used the Ray object store in a previous iteration of DFRay that 
didn't make it into this repo, I have experience with the first option and 
would want to try that first.  The second option could be a future user-facing 
choice.
   
   
   Priority-wise, I'd like to release `0.1.0` first and then add this 
functionality.  WDYT?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

