EeshanBembi commented on issue #18001:
URL: https://github.com/apache/datafusion/issues/18001#issuecomment-3405031849

   I've investigated the codebase and identified where the defensive debug 
assertions should be added.
     The key location is in 
https://github.com/apache/datafusion/blob/main/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs#L1579,
 where udf.simplify() is called for ScalarFunction expressions.
   
     The assertion should check that both the data type and nullability remain 
unchanged after
     simplification, using the SimplifyInfo trait's get_data_type() and 
nullable() methods that are
     already available in the context.
   
     @pepijnve is absolutely right about the coalesce issue. Adding this 
assertion will indeed surface
     the #17813 problem immediately, which is actually a good thing, it will 
force us to properly fix
     the underlying schema contract violations rather than allowing them to 
silently persist.
   
   
     Proposed approach:
   
     1. Add the debug assertion to catch schema changes during simplification
     2. Fix the coalesce nullability issue (as addressed in #17813)
     3. Audit other UDF simplify implementations to ensure they maintain schema 
consistency
   
     The debug assertion would look something like:
   
   ```
     debug_assert!(
         original_data_type == simplified_data_type &&
         original_nullable == simplified_nullable,
         "UDF simplify() must not change schema (data type or nullability). \
          Original: {:?}, Simplified: {:?}. See issue #18001",
         original_expr, simplified_expr
     );
   ```
   
     This approach will help us maintain the simplify contract while 
identifying violations during
     development.
   
     Benefits:
   
     - Catches schema violations early in development
     - Zero runtime cost in release builds
     - Forces proper fixes rather than masking issues
     - Maintains backward compatibility
   
     Would you like me to prepare a PR with this implementation?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to