andygrove commented on code in PR #53:
URL: https://github.com/apache/datafusion-ray/pull/53#discussion_r1885291564
##########
src/query_stage.rs:
##########
@@ -99,10 +99,14 @@ impl QueryStage {
/// Get the input partition count. This is the same as the number of concurrent tasks
/// when we schedule this query stage for execution
pub fn get_input_partition_count(&self) -> usize {
Review Comment:
in `context.py` we have this logic:
```
# if the query stage has a single output partition then we need to execute for the output
# partition, otherwise we need to execute in parallel for each input partition
concurrency = stage.get_input_partition_count()
output_partitions_count = stage.get_output_partition_count()
```
This is based on the assumption that the query stage is a shuffle write, which perhaps was always true when running TPC-H, so the existing code worked.

With the new simple `SELECT * FROM table` test that you added, we had a query stage whose plan was a `CsvExec` with no children, so we had to handle this as a special case. There is no input partitioning in this case; we use the output partitioning because DataFusion will have already decided that based on the files that are available.

This code is all confusing and I would like to make it less so.
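For what it's worth, here is a rough sketch of how that decision could be made explicit on the Python side. The `choose_concurrency` helper and the zero-input check are just illustrations of the idea, not existing code; only `get_input_partition_count` and `get_output_partition_count` come from the snippet above:
```python
def choose_concurrency(stage) -> int:
    """Illustrative sketch: pick the number of concurrent tasks for a query stage."""
    input_partitions = stage.get_input_partition_count()
    output_partitions = stage.get_output_partition_count()
    if input_partitions == 0:
        # Leaf stage such as a bare CsvExec: there is no input partitioning,
        # so fall back to the output partitioning that DataFusion already
        # derived from the available files.
        return output_partitions
    # Shuffle-write stage: run one task per input partition.
    return input_partitions
```
Spelling the fallback out like this (whether here or inside `get_input_partition_count` itself) would at least make the shuffle-write assumption visible at the call site.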