adriangb commented on PR #16501: URL: https://github.com/apache/datafusion/pull/16501#issuecomment-2994331327
So from my investigation, what I *think* is happening is that https://github.com/apache/datafusion/pull/15770 fundamentally changed the TopK operator from keeping isolated state per partition to sharing state across partitions via the dynamic filter. Because partitions can now interact, test runs become somewhat non-deterministic. I don't think this causes actual issues for queries, but the tests are picking it up. I'm not 100% sure about that, though.

@Dandandan and I were already talking about having a shared TopK heap between partitions, and I think that would resolve the issue. Otherwise, more investigation is needed.

FWIW, the TopK dynamic filters still work without this code. The only part that doesn't work is using the filter inside the TopK operator itself to filter rows.

This is all I had time for today. I think more work is needed before we can merge this PR in its current state.
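A toy model of the hypothesis above may help. It is a sketch in Python, not DataFusion's Rust code. It assumes an `ORDER BY x ASC LIMIT k` query over distinct integers and assumes that each partition keeps its own heap but filters incoming rows against one shared bound (standing in for the dynamic filter) before touching that heap. The names `SharedThreshold`, `PartitionTopK` and `run` are hypothetical. The model shows that the per-partition outputs depend on how the partitions interleave, while the merged top-k does not.

```python
import heapq
from typing import Dict, List, Optional, Sequence, Tuple


class SharedThreshold:
    """Hypothetical stand-in for a dynamic filter shared by all partitions.

    Holds the tightest bound published by any partition. With distinct
    values, a row whose value is >= the bound cannot be in the global top-k.
    """

    def __init__(self) -> None:
        self.bound: Optional[int] = None

    def publish(self, candidate: int) -> None:
        # Only ever tighten the bound.
        if self.bound is None or candidate < self.bound:
            self.bound = candidate

    def keeps(self, value: int) -> bool:
        return self.bound is None or value < self.bound


class PartitionTopK:
    """Per-partition heap of the k smallest values (a max-heap via negation)."""

    def __init__(self, k: int, shared: SharedThreshold) -> None:
        self.k = k
        self.shared = shared
        self.heap: List[int] = []  # negated values, so -heap[0] is the local max

    def push(self, value: int) -> None:
        # The step in question: filter incoming rows with the *shared* bound,
        # which other partitions may have tightened in the meantime.
        if not self.shared.keeps(value):
            return
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, -value)
        elif value < -self.heap[0]:
            heapq.heapreplace(self.heap, -value)  # evict the current local max
        if len(self.heap) == self.k:
            # A full heap's max is a valid global bound, so publish it.
            self.shared.publish(-self.heap[0])

    def emit(self) -> List[int]:
        return sorted(-v for v in self.heap)


def run(
    partitions: Sequence[Sequence[int]], k: int, schedule: Sequence[int]
) -> Tuple[Dict[int, List[int]], List[int]]:
    """Feed rows in the order given by `schedule` (a list of partition ids)."""
    shared = SharedThreshold()
    tops = [PartitionTopK(k, shared) for _ in partitions]
    cursors = [0] * len(partitions)
    for p in schedule:
        tops[p].push(partitions[p][cursors[p]])
        cursors[p] += 1
    per_partition = {p: t.emit() for p, t in enumerate(tops)}
    # Final merge across partitions, like a sort-preserving merge plus limit.
    merged = sorted(v for out in per_partition.values() for v in out)[:k]
    return per_partition, merged


data = [[5, 1, 9, 3], [8, 2, 7, 4]]
a_parts, a_merged = run(data, 2, [0, 0, 0, 0, 1, 1, 1, 1])  # partition 0 first
b_parts, b_merged = run(data, 2, [1, 1, 1, 1, 0, 0, 0, 0])  # partition 1 first
print(a_parts, a_merged)  # {0: [1, 3], 1: [2]} [1, 2]
print(b_parts, b_merged)  # {0: [1, 3], 1: [2, 4]} [1, 2]
```

In this model, partition 1 emits `[2]` or `[2, 4]` depending only on scheduling. A test that inspects per-partition TopK output would therefore be flaky, yet the merged result is identical. With a single heap shared by all partitions, there would be no per-partition output to diverge in the first place.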