wiedld opened a new issue, #14029:
URL: https://github.com/apache/datafusion/issues/14029

   ### Is your feature request related to a problem or challenge?
   
   As part of the work to [automatically check 
invariants](https://github.com/apache/datafusion/issues/13652) for the logical 
and execution plans, we have provided infrastructure to run an invariant 
checker. This invariant checker runs at limited time points in order to not 
degrade planning performance; e.g. after all optimizations are completed. In 
debug mode, it runs these checks more often and can therefore help to quickly 
isolate the point (e.g. the specific optimizer run) at which the plan becomes 
invalid.
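
   The check timing described above can be sketched as follows. This is a toy 
model and not DataFusion's implementation: `Plan`, `Pass`, `check_invariants`, 
`optimize`, and both example passes are hypothetical names made up for 
illustration. It assumes a single invariant (a plan must not be empty) that is 
always checked once after all passes, and additionally after every pass when 
`debug` is set.

```rust
// Hypothetical toy types; none of these names are DataFusion APIs.
#[derive(Debug, Clone, PartialEq)]
struct Plan {
    // Each step name stands in for a plan node.
    steps: Vec<String>,
}

// A named optimizer pass that consumes a plan and returns a rewritten one.
type Pass = (&'static str, fn(Plan) -> Plan);

// Toy invariant: a plan must never be empty.
fn check_invariants(plan: &Plan) -> Result<(), String> {
    if plan.steps.is_empty() {
        Err("plan has no steps".to_string())
    } else {
        Ok(())
    }
}

// Runs all passes. The invariant is always checked once at the end; in debug
// mode it is also checked after every pass, so the error names the culprit.
fn optimize(mut plan: Plan, passes: &[Pass], debug: bool) -> Result<Plan, String> {
    for (name, pass) in passes {
        plan = pass(plan);
        if debug {
            check_invariants(&plan).map_err(|e| format!("after pass `{name}`: {e}"))?;
        }
    }
    check_invariants(&plan).map_err(|e| format!("after all passes: {e}"))?;
    Ok(plan)
}

// A well-behaved pass: removes steps that do nothing.
fn drop_noop(mut plan: Plan) -> Plan {
    plan.steps.retain(|s| s.as_str() != "noop");
    plan
}

// A deliberately broken pass: removes every step.
fn buggy_clear(mut plan: Plan) -> Plan {
    plan.steps.clear();
    plan
}
```

   Running `drop_noop` followed by `buggy_clear` fails in both modes, but only 
the debug-mode error identifies `buggy_clear` as the pass that broke the plan.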
   
   We also want to enable users to add their own invariants. Users are already 
able to add their own Logical and Execution plan extensions, as well as their 
own optimization runs which modify these plans. It may therefore be useful to 
provide an extension interface for user-defined invariants. For example, if a 
change in DataFusion core's optimizer passes causes a problem in a user-defined 
Logical plan extension, the user could define an invariant based on what their 
Logical plan extension requires.
   
   Refer to [this 
conversation](https://github.com/apache/datafusion/pull/13651#discussion_r1873973604) 
for specific examples of plan extensions which have their own invariants. For 
the example case of our own `ProgressiveEval` -- we require the input partition 
streams to have specific sort orders, non-overlapping column ranges, and no 
pushdown of the offset [(issue)](https://github.com/apache/datafusion/issues/12423) 
in order to provide the correct result. An invariant check, performed after each 
optimizer run (while in debug mode), would enable us to quickly isolate the 
problem during a DataFusion upgrade.
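
   As an illustration of what such an invariant could look like, here is a 
minimal sketch of the "non-overlapping column ranges" requirement. It is not 
the actual `ProgressiveEval` check: `PartitionRange` and 
`check_non_overlapping_sorted` are hypothetical names, the ranges are assumed 
to be inclusive `i64` bounds on the sort column, and the partitions are assumed 
to be listed in ascending order, so a shared boundary value counts as an 
overlap.

```rust
// Hypothetical summary of one input partition; not a DataFusion type.
// `min` and `max` are the inclusive bounds of the sort column.
#[derive(Debug, Clone, Copy)]
struct PartitionRange {
    min: i64,
    max: i64,
}

// Returns an error if any range is malformed, or if two adjacent partitions
// overlap or are not in ascending order.
fn check_non_overlapping_sorted(partitions: &[PartitionRange]) -> Result<(), String> {
    for (i, p) in partitions.iter().enumerate() {
        if p.min > p.max {
            return Err(format!("partition {i} has min {} > max {}", p.min, p.max));
        }
    }
    // Compare each partition with its successor.
    for (i, pair) in partitions.windows(2).enumerate() {
        let (prev, next) = (pair[0], pair[1]);
        if prev.max >= next.min {
            return Err(format!(
                "partitions {i} and {} overlap or are out of order",
                i + 1
            ));
        }
    }
    Ok(())
}
```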
   
   (We have several other, more complex examples of how changes in the 
optimization of UNIONs have produced invalid plans for our 
`SortPreservingMerge`. So this is not a one-off; the above is merely 
the simplest concrete example.)
   
   ### Describe the solution you'd like
   
   Take the existing invariant infrastructure provided as part of [this 
issue](https://github.com/apache/datafusion/issues/13652#issuecomment-2573659546),
 and provide extension points for users to define their own invariants.
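
   One possible shape for such an extension point is sketched below. This is 
only an assumption about how it might look, not an existing or proposed 
DataFusion API: `PlanNode`, `InvariantCheck`, `InvariantRegistry`, and 
`ProgressiveEvalInputsSorted` are all hypothetical names, and the plan is 
reduced to a toy tree with a `sorted` flag per node.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a plan node; not a DataFusion type.
#[derive(Debug)]
struct PlanNode {
    name: String,
    sorted: bool,
    children: Vec<PlanNode>,
}

// Hypothetical extension point: one user-defined invariant.
trait InvariantCheck: Send + Sync {
    fn name(&self) -> &str;
    fn check(&self, plan: &PlanNode) -> Result<(), String>;
}

// Holds the user-registered invariants; the planner would call `check_all`
// at whichever points it already runs its built-in checks.
#[derive(Default)]
struct InvariantRegistry {
    checks: Vec<Arc<dyn InvariantCheck>>,
}

impl InvariantRegistry {
    fn register(&mut self, check: Arc<dyn InvariantCheck>) {
        self.checks.push(check);
    }

    // Runs every registered invariant and stops at the first violation.
    fn check_all(&self, plan: &PlanNode) -> Result<(), String> {
        for check in &self.checks {
            check
                .check(plan)
                .map_err(|e| format!("invariant `{}` violated: {e}", check.name()))?;
        }
        Ok(())
    }
}

// Example user invariant: every direct input of a node named
// `ProgressiveEval` must be sorted.
struct ProgressiveEvalInputsSorted;

impl InvariantCheck for ProgressiveEvalInputsSorted {
    fn name(&self) -> &str {
        "progressive_eval_inputs_sorted"
    }

    fn check(&self, plan: &PlanNode) -> Result<(), String> {
        if plan.name == "ProgressiveEval" {
            if let Some(child) = plan.children.iter().find(|c| !c.sorted) {
                return Err(format!("input `{}` is not sorted", child.name));
            }
        }
        // Recurse so the invariant holds anywhere in the tree.
        plan.children.iter().try_for_each(|c| self.check(c))
    }
}
```

   A user would register the invariant once, e.g. 
`registry.register(Arc::new(ProgressiveEvalInputsSorted))`, and the error 
returned by `check_all` names the violated invariant.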
   
   ### Describe alternatives you've considered
   
   * Alternative 1: for a user-defined Execution plan extension, perform a 
runtime check of invariants.
      * Con: this detects problems only after planning time, thereby increasing 
both time-until-error and resource utilization.
     
   * Alternative 2: for either Logical or Physical plan extensions, the user 
can define an optimization run which is intended to detect invariant violations 
that conflict with their plan extensions.
      * Pro: can detect invariant violations at planning time
      * Con: arguably more code complexity:
         * in order to isolate exactly which plan mutation (DataFusion core 
change) caused the problem, the check would need to be coded to run after each 
optimizer pass.
   
   
   
   ### Additional context
   
   _No response_


