EeshanBembi commented on issue #18001:
URL: https://github.com/apache/datafusion/issues/18001#issuecomment-3405031849
I've investigated the codebase and identified where the defensive debug
assertions should be added.
The key location is in
https://github.com/apache/datafusion/blob/main/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs#L1579,
where udf.simplify() is called for ScalarFunction expressions.
The assertion should check that both the data type and nullability remain
unchanged after
simplification, using the SimplifyInfo trait's get_data_type() and
nullable() methods that are
already available in the context.
@pepijnve is absolutely right about the coalesce issue. Adding this
assertion will indeed surface
the #17813 problem immediately, which is actually a good thing, it will
force us to properly fix
the underlying schema contract violations rather than allowing them to
silently persist.
Proposed approach:
1. Add the debug assertion to catch schema changes during simplification
2. Fix the coalesce nullability issue (as addressed in #17813)
3. Audit other UDF simplify implementations to ensure they maintain schema
consistency
The debug assertion would look something like:
```
debug_assert!(
original_data_type == simplified_data_type &&
original_nullable == simplified_nullable,
"UDF simplify() must not change schema (data type or nullability). \
Original: {:?}, Simplified: {:?}. See issue #18001",
original_expr, simplified_expr
);
```
This approach will help us maintain the simplify contract while
identifying violations during
development.
Benefits:
- Catches schema violations early in development
- Zero runtime cost in release builds
- Forces proper fixes rather than masking issues
- Maintains backward compatibility
Would you like me to prepare a PR with this implementation?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]