andygrove commented on issue #15676: URL: https://github.com/apache/datafusion/issues/15676#issuecomment-2796920220
Sorry to cause so much work for everyone discussing this 😞

For order-preserved input, such as single-partition, single-threaded execution, DataFusion has implemented the same behavior as Spark for roughly the past year. In the past week, the behavior changed. It isn't necessarily "wrong," but it is a breaking change for some downstream users.

Comet's test suites run the same query against Spark and Comet/DataFusion and compare the results. The tests are mostly deterministic. Upgrading to the soon-to-be-released DataFusion 47 causes test failures, which is why I reported this issue.

In the Comet project, we now have the following choices:

1. Rewrite our tests for `LAST` to stop comparing to Spark and implement some other means to determine that the behavior is correct, and also document that Comet is not compatible with Spark in some cases
2. Fork the `LAST` implementation and maintain it in Comet
3. See if there are options for DataFusion to support the order-preserved case

Any one of these options can work.