ozankabak opened a new pull request, #16217:
URL: https://github.com/apache/datafusion/pull/16217

   ## Which issue does this PR close?
   
   - Closes #13748.
   
   ## Rationale for this change
   
   DF's theoretical approach to representing orderings, equivalences and 
constants is sound, but it suffers from many implementation problems that lead 
to various issues. Some of these issues include:
   - High-complexity planning times for queries involving UNION and ORDER BY 
clauses (see the referred issue).
   - A continuous stream of subtle bugs that arise from the fact that our 
objects allow multiple representations of the same underlying state (e.g. a 
non-existing ordering is sometimes represented by a `None` value for a type 
`Option<LexOrdering>`, and sometimes with a `Some` value with the payload being 
"empty").
   - The inability to relay ordering preference information to optimizer rules 
such as `EnforceSorting`, which results in missed optimization opportunities. 
For example, a windowing operator prefers its input to be sorted w.r.t. both 
PARTITION BY and ORDER BY keys, but if the former is not available, it is still 
preferable to have just the latter instead of a fully unsorted input.
   
   ## What changes are included in this PR?
   
   - The equivalence system is rewritten to solve the computational complexity 
problem. The new system avoids handling constants separately from other 
equivalences and treats them on an equal footing within the same mechanism, 
avoiding unnecessary work during `ordering_satisfy` checks. Also, the new 
system caches normalization results for equivalent orderings, which was one of 
the major sources of excessive computational complexity during sort enforcement.
   - The `LexOrdering` and `LexRequirement` objects do not allow degenerate 
orderings anymore, whenever you see this object, you can be sure there will be 
*some* ordering. Possible absence of an ordering will henceforth be properly 
represented by an `Option<LexOrdering>`.
   - The `OrderingRequirements` object is introduced to capture ordering 
preferences of operators.
   
   ## Are these changes tested?
   
   Yes.
   
   ## Are there any user-facing changes?
   
   Yes -- this is a major API change and will break old code. We spent months 
looking for ways to (1) break this PR down into smaller ones, (2) yet lump all 
API changes in one PR -- but couldn't find an effective way to do this.
   
   Therefore, I invite all contributors who are also downstream stakeholders to 
help by participating in a "pre-scream" test. Please test and report how this 
PR breaks your code, so we can collectively prepare an upgrade guide for the 
community at large.
   
   @alamb, @andygrove, @berkaysynnada -- feel free to tag others.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to