pepijnve commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2931402946

   > Inserting it as a parent of leaf nodes, and only when necessary (first 
item in my message above), gives us a system where the least number of 
necessary `YieldExec`s are inserted, and at a non-arbitrary place.
   
   I think this might still be insufficient for something like this.
   
   ```
   Agg:
     Interleave:
       Agg:
         CoalesceBatches:
           Filter:
             Yield:
               Data
       Agg:
         Yield:
           Data
   ```
   
   The filter + coalesce is there to ensure the `Pending`s coming from the 
`Yield`a are not emitted in perfect lockstep.
   
   What I think can happen here is that `CombinedRecordBatchStream` created by 
`Interleave` will end up ping-ponging between the two child aggregates when 
they return `Pending` and you still get stuck in the for loop of the parent 
`Aggregate`.
   
   I did not verify this yet, but it might be worth adding a test case for this 
type of situation as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to