wiedld opened a new issue, #14029: URL: https://github.com/apache/datafusion/issues/14029
### Is your feature request related to a problem or challenge?

As part of the work to [automatically check invariants](https://github.com/apache/datafusion/issues/13652) for the logical and execution plans, we have provided infrastructure to run an invariant checker. To avoid degrading planning performance, this checker runs only at a limited number of points, e.g. after all optimizations are completed. In debug mode it runs these checks more often, and can therefore help to quickly isolate the point (e.g. which specific optimizer run) at which the plan becomes invalid.

We also want to enable users to add their own invariants. Users are already able to add their own Logical and Execution plan extensions, as well as their own optimization runs which modify these plans. Therefore it may be useful to provide an extension interface for user-defined invariants. For example, if a change in DataFusion core's optimizer passes causes a problem in a user-defined Logical plan extension, then the user could define an invariant based upon what their Logical plan extension requires.

Refer to the specific examples [in this conversation](https://github.com/apache/datafusion/pull/13651#discussion_r1873973604) of plan extensions which have their own invariants. For the example case of our own `ProgressiveEval`, we require the input partition streams to have specific sort orders, non-overlapping column ranges, and no pushdown of the offset [(issue)](https://github.com/apache/datafusion/issues/12423) in order to provide the correct result. An invariant check, performed after each optimizer run (while in debug mode), would enable us to quickly isolate the problem during a DF upgrade. (We have several other, more complex, examples of how changes in the optimization of UNIONs have produced invalid plans for our `SortPreservingMerge`. So this is not a one-off; the above is merely the simplest concrete example.)
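To illustrate the debug-mode behaviour described above, here is a minimal, self-contained sketch of checking user-defined invariants after each optimizer pass. Everything in it is hypothetical and deliberately simplified: `Plan`, `PlanInvariant`, `OptimizerPass`, `InputsSorted`, `DropSort`, and `optimize_checked` are toy stand-ins invented for this example, not DataFusion types or APIs, and the real extension point may look quite different.

```rust
/// Toy plan tree. NOT DataFusion's `LogicalPlan` or `ExecutionPlan`.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { sorted: bool },
    /// Stand-in for a user-defined node that requires sorted inputs.
    ProgressiveEval { inputs: Vec<Plan> },
}

/// Describes which invariant failed and after which pass.
#[derive(Debug, PartialEq)]
struct InvariantViolation {
    invariant: String,
    after_pass: String,
    message: String,
}

/// Hypothetical extension point: a user-defined invariant over a plan.
trait PlanInvariant {
    fn name(&self) -> &str;
    fn check(&self, plan: &Plan) -> Result<(), String>;
}

/// Hypothetical optimizer pass that rewrites a plan.
trait OptimizerPass {
    fn name(&self) -> &str;
    fn optimize(&self, plan: Plan) -> Plan;
}

/// User invariant: every direct input of `ProgressiveEval` must be sorted.
struct InputsSorted;

impl PlanInvariant for InputsSorted {
    fn name(&self) -> &str {
        "progressive_eval_inputs_sorted"
    }
    fn check(&self, plan: &Plan) -> Result<(), String> {
        match plan {
            Plan::Scan { .. } => Ok(()),
            Plan::ProgressiveEval { inputs } => {
                for input in inputs {
                    if let Plan::Scan { sorted: false } = input {
                        return Err("ProgressiveEval input is not sorted".to_string());
                    }
                    self.check(input)?;
                }
                Ok(())
            }
        }
    }
}

/// A pass that leaves the plan untouched.
struct NoOp;
impl OptimizerPass for NoOp {
    fn name(&self) -> &str { "no_op" }
    fn optimize(&self, plan: Plan) -> Plan { plan }
}

/// A pass that (incorrectly, for our extension) discards sort information.
struct DropSort;
impl OptimizerPass for DropSort {
    fn name(&self) -> &str { "drop_sort" }
    fn optimize(&self, plan: Plan) -> Plan {
        match plan {
            Plan::Scan { .. } => Plan::Scan { sorted: false },
            Plan::ProgressiveEval { inputs } => Plan::ProgressiveEval {
                inputs: inputs.into_iter().map(|p| self.optimize(p)).collect(),
            },
        }
    }
}

/// Run every invariant, tagging a failure with the pass that preceded it.
fn check_all(
    plan: &Plan,
    invariants: &[Box<dyn PlanInvariant>],
    after_pass: &str,
) -> Result<(), InvariantViolation> {
    for inv in invariants {
        inv.check(plan).map_err(|message| InvariantViolation {
            invariant: inv.name().to_string(),
            after_pass: after_pass.to_string(),
            message,
        })?;
    }
    Ok(())
}

/// Apply all passes. In debug mode, check after each pass so the offending
/// pass is named; otherwise check only once, after all passes have run.
fn optimize_checked(
    mut plan: Plan,
    passes: &[Box<dyn OptimizerPass>],
    invariants: &[Box<dyn PlanInvariant>],
    debug_mode: bool,
) -> Result<Plan, InvariantViolation> {
    for pass in passes {
        plan = pass.optimize(plan);
        if debug_mode {
            check_all(&plan, invariants, pass.name())?;
        }
    }
    check_all(&plan, invariants, "final")?;
    Ok(plan)
}
```

In this sketch, running `[NoOp, DropSort, NoOp]` over a `ProgressiveEval` with a sorted scan fails in both modes, but only debug mode attributes the failure to `drop_sort`; the non-debug run can only report that the final plan is invalid.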
### Describe the solution you'd like

Take the existing invariant infrastructure provided as part of [this issue](https://github.com/apache/datafusion/issues/13652#issuecomment-2573659546), and provide extension points for users to define their own invariants.

### Describe alternatives you've considered

* Alternative 1: for a user-defined Execution plan extension, have a runtime check of invariants be performed.
  * Con: this detects problems only after planning time, thereby increasing both time-until-error and resource utilization.
* Alternative 2: for either Logical or Physical plan extensions, the user can define an optimization run which is intended to detect invariant violations that conflict with their plan extensions.
  * Pro: can detect invariant violations at planning time.
  * Con: arguably more code complexity:
    * in order to isolate exactly which plan mutation (DataFusion core change) caused the problem, it would need to be coded to run after each optimizer pass.

### Additional context

_No response_

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org