alamb commented on PR #20820:
URL: https://github.com/apache/datafusion/pull/20820#issuecomment-4117495652

   I have implemented the work-stealing scheduler idea. While it seems to 
show promise, it is clearly not ready yet (given the results above).
   
   <img width="1994" height="890" alt="Screenshot 2026-03-24 at 7 24 53 AM" 
src="https://github.com/user-attachments/assets/12e32d37-ba59-457c-b6de-bde63237a64e" />
   <img width="2252" height="876" alt="Screenshot 2026-03-24 at 7 26 13 AM" 
src="https://github.com/user-attachments/assets/058c18f1-a470-46af-bd59-b7ed5819e94d" />
   <img width="2246" height="855" alt="Screenshot 2026-03-24 at 7 30 16 AM" 
src="https://github.com/user-attachments/assets/f798182c-a9a7-4671-8dd6-29877646a604" />
   
   I spent quite a long time experimenting with Q23 and saw widely varying 
results. I think this is because Q23 is very sensitive to the order in which 
the files are processed (e.g. due to top-k / dynamic filtering).
   
   However, I have also observed some flakiness when running the tests. I think 
that is because some plans require a certain partitioning (e.g. to ensure data 
is passed across streams), so having the FileStream process data across 
multiple partitions in those cases causes correctness errors.
   
   My plan is to ensure we don't enable work stealing for plans that require 
data not to cross partitions.
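
   To make the idea concrete, here is a minimal sketch of gating work stealing on a per-plan flag. It is not the code in this PR: `WorkSource`, `allow_stealing`, and `next_file` are hypothetical names invented for illustration, and file names stand in for whatever unit of work the real scheduler hands out. It assumes each partition drains its own queue first and only takes work from other partitions when the plan allows data to cross partitions.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

/// Hypothetical shared work source for one scan (not a DataFusion type).
#[derive(Clone)]
struct WorkSource {
    /// One queue of file names per output partition.
    queues: Arc<Vec<Mutex<VecDeque<String>>>>,
    /// Assumed flag: false when the plan requires that data not cross partitions.
    allow_stealing: bool,
}

impl WorkSource {
    /// Build the queues from a static file-to-partition assignment.
    fn new(assignments: Vec<Vec<&str>>, allow_stealing: bool) -> Self {
        let queues: Vec<Mutex<VecDeque<String>>> = assignments
            .into_iter()
            .map(|files| {
                Mutex::new(
                    files
                        .into_iter()
                        .map(String::from)
                        .collect::<VecDeque<String>>(),
                )
            })
            .collect();
        Self {
            queues: Arc::new(queues),
            allow_stealing,
        }
    }

    /// Next file for `partition`: its own queue first, then (only when
    /// stealing is allowed) the back of another partition's queue.
    fn next_file(&self, partition: usize) -> Option<String> {
        if let Some(file) = self.queues[partition].lock().unwrap().pop_front() {
            return Some(file);
        }
        if !self.allow_stealing {
            // Partitioning must be preserved: never take another partition's files.
            return None;
        }
        for (i, queue) in self.queues.iter().enumerate() {
            if i != partition {
                if let Some(file) = queue.lock().unwrap().pop_back() {
                    return Some(file);
                }
            }
        }
        None
    }
}
```

   With `allow_stealing` set to false, a partition that runs out of its own files simply finishes, so the original file-to-partition assignment is preserved. With it set to true, an idle partition picks up files that were assigned elsewhere, which is also why the processing order (and so anything order-sensitive) can vary from run to run.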
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

