adriangb commented on PR #16501:
URL: https://github.com/apache/datafusion/pull/16501#issuecomment-2994331327

   From my investigation, what I *think* is happening is that 
https://github.com/apache/datafusion/pull/15770 fundamentally changed the 
TopK operation from being isolated per partition to sharing state across 
partitions via the dynamic filter. That makes test runs non-deterministic, 
since partitions can now interact. I don't think this causes actual issues 
with queries and the tests are just picking it up, but I'm not 100% sure 
about that. @Dandandan and I were already talking about having a shared TopK 
heap between partitions, and I think that would resolve the issue. Otherwise, 
more investigation is needed.
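
   To make the kind of interaction described above concrete, here is a 
minimal standalone Rust sketch. It is not DataFusion's code: 
`SharedThreshold`, `run_partition` and `run_query` are hypothetical names 
invented for illustration, and the "dynamic filter" is reduced to a single 
shared upper bound for an ascending `ORDER BY ... LIMIT k`. Under those 
assumptions, the final top-k comes out the same for both partition orders, 
while the amount of work each partition skips depends on which one ran first.

```rust
use std::collections::BinaryHeap;
use std::sync::atomic::{AtomicI64, Ordering};

/// Hypothetical stand-in for the shared dynamic filter:
/// any row strictly above this bound cannot be in the global top-k.
struct SharedThreshold(AtomicI64);

impl SharedThreshold {
    fn new() -> Self {
        SharedThreshold(AtomicI64::new(i64::MAX))
    }
    fn get(&self) -> i64 {
        self.0.load(Ordering::SeqCst)
    }
    /// Only ever tightens (lowers) the bound.
    fn tighten(&self, candidate: i64) {
        self.0.fetch_min(candidate, Ordering::SeqCst);
    }
}

/// Runs one partition's TopK (k smallest values).
/// Returns the partition's heap contents and how many rows it skipped.
fn run_partition(rows: &[i64], k: usize, filter: &SharedThreshold) -> (Vec<i64>, usize) {
    // Max-heap holding the k smallest values seen so far in this partition.
    let mut heap: BinaryHeap<i64> = BinaryHeap::new();
    let mut skipped = 0;
    for &row in rows {
        // The bound may have been set by *another* partition.
        if row > filter.get() {
            skipped += 1;
            continue;
        }
        heap.push(row);
        if heap.len() > k {
            heap.pop();
        }
        if heap.len() == k {
            if let Some(&worst) = heap.peek() {
                filter.tighten(worst);
            }
        }
    }
    (heap.into_sorted_vec(), skipped)
}

/// Runs the partitions in the given order against one shared filter,
/// then merges the per-partition results into the global top-k.
fn run_query(partitions: &[&[i64]], k: usize) -> (Vec<i64>, Vec<usize>) {
    let filter = SharedThreshold::new();
    let mut merged = Vec::new();
    let mut skipped = Vec::new();
    for rows in partitions {
        let (top, s) = run_partition(rows, k, &filter);
        merged.extend(top);
        skipped.push(s);
    }
    merged.sort();
    merged.truncate(k);
    (merged, skipped)
}

fn main() {
    let a: &[i64] = &[1, 2, 3, 50];
    let b: &[i64] = &[10, 20, 30, 40];
    let (top_ab, skipped_ab) = run_query(&[a, b], 2);
    let (top_ba, skipped_ba) = run_query(&[b, a], 2);
    // Same query result either way...
    assert_eq!(top_ab, top_ba);
    // ...but per-partition skip counts depend on the order.
    assert_ne!(skipped_ab, skipped_ba);
    println!("a then b: top={:?} skipped={:?}", top_ab, skipped_ab);
    println!("b then a: top={:?} skipped={:?}", top_ba, skipped_ba);
}
```

   The sketch runs the partitions sequentially in two fixed orders so that it 
is reproducible. With real concurrent partitions the interleaving is not 
fixed, so anything that observes per-partition state could plausibly differ 
from run to run even when the query result does not.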
   
   FWIW the TopK dynamic filters still work without this code; the only part 
that doesn't work is using the filter to filter rows inside the TopK operator 
itself.
   
   This is all I had time for today. I don't think we can merge this PR in 
its current state; more work is needed first.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


