ozankabak opened a new pull request, #16217: URL: https://github.com/apache/datafusion/pull/16217
## Which issue does this PR close? - Closes #13748. ## Rationale for this change DF's theoretical approach to representing orderings, equivalences and constants is sound, but it suffers from many implementation problems that lead to various issues. Some of these issues include: - High-complexity planning times for queries involving UNION and ORDER BY clauses (see the referred issue). - A continuous stream of subtle bugs that arise from the fact that our objects allow multiple representations of the same underlying state (e.g. a non-existing ordering is sometimes represented by a `None` value for a type `Option<LexOrdering>`, and sometimes with a `Some` value with the payload being "empty"). - The inability to relay ordering preference information to optimizer rules such as `EnforceSorting`, which results in missed optimization opportunities. For example, a windowing operator prefers its input to be sorted w.r.t. both PARTITION BY and ORDER BY keys, but if the former is not available, it is still preferable to have just the latter instead of a fully unsorted input. ## What changes are included in this PR? - The equivalence system is rewritten to solve the computational complexity problem. The new system avoids handling constants separately from other equivalences and treats them on an equal footing within the same mechanism, avoiding unnecessary work during `ordering_satisfy` checks. Also, the new system caches normalization results for equivalent orderings, which was one of the major sources of excessive computational complexity during sort enforcement. - The `LexOrdering` and `LexRequirement` objects do not allow degenerate orderings anymore, whenever you see this object, you can be sure there will be *some* ordering. Possible absence of an ordering will henceforth be properly represented by an `Option<LexOrdering>`. - The `OrderingRequirements` object is introduced to capture ordering preferences of operators. ## Are these changes tested? Yes. ## Are there any user-facing changes? Yes -- this is a major API change and will break old code. We spent months looking for ways to (1) break this PR down into smaller ones, (2) yet lump all API changes in one PR -- but couldn't find an effective way to do this. Therefore, I invite all contributors who are also downstream stakeholders to help by participating in a "pre-scream" test. Please test and report how this PR breaks your code, so we can collectively prepare an upgrade guide for the community at large. @alamb, @andygrove, @berkaysynnada -- feel free to tag others. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org