GaneshPatil7517 opened a new pull request, #19688:
URL: https://github.com/apache/datafusion/pull/19688
## Which issue does this PR close?
Closes #19511
Related to #18882
## Rationale for this change
Currently, `AggregateUDFImpl::is_nullable()` returns `true` by default for
all UDAFs, regardless of the nullability of their inputs. This is not ideal because:
1. The same nullability information is already encoded in `return_field()`,
   so `is_nullable()` duplicates it
2. Most aggregate functions should only be nullable if their inputs are
   nullable (e.g., `MIN`, `MAX`, `SUM`)
3. This pattern is inconsistent with scalar UDFs, which already derive
   nullability through `return_field_from_args()`
## What changes are included in this PR?
### Core Changes
- **Deprecated `is_nullable()`** on the `AggregateUDFImpl` trait, with
  migration guidance
- **Updated `udaf_default_return_field()`** to compute nullability from
  input fields:
  - Output is nullable if ANY input field is nullable
  - Output is non-nullable only if ALL inputs are non-nullable
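The default nullability rule described above can be sketched as follows. This is a minimal, self-contained illustration, not the PR's code: `Field` here is a hypothetical stand-in for Arrow's `Field` (with the data type reduced to a string), and `default_return_field` is a hypothetical helper that only mirrors the described behavior of `udaf_default_return_field()`.

```rust
/// Hypothetical, simplified stand-in for Arrow's `Field`.
/// The data type is a plain string here instead of Arrow's `DataType`.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
}

impl Field {
    fn new(name: &str, data_type: &str, nullable: bool) -> Self {
        Field {
            name: name.to_string(),
            data_type: data_type.to_string(),
            nullable,
        }
    }
}

/// Hypothetical helper mirroring the described default:
/// the output is nullable if ANY input field is nullable,
/// and non-nullable only if ALL input fields are non-nullable.
/// The declared return type is passed through unchanged.
fn default_return_field(name: &str, return_type: &str, arg_fields: &[Field]) -> Field {
    let nullable = arg_fields.iter().any(|f| f.nullable);
    Field::new(name, return_type, nullable)
}
```

With no input fields, `any` returns `false`, so this sketch reports a non-nullable output; the PR description does not say how zero-argument aggregates are handled.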
### Tests
Added 4 new tests validating nullability inference:
- `test_return_field_nullability_from_nullable_input`
- `test_return_field_nullability_from_non_nullable_input`
- `test_return_field_nullability_with_mixed_inputs`
- `test_return_field_preserves_return_type`
### Documentation
- New `docs/source/library-user-guide/functions/udf-nullability.md` with a
  migration guide and examples
- Updated `adding-udfs.md` with a reference to the nullability documentation
## Are these changes tested?
Yes. All existing tests pass, plus 4 new tests specifically for nullability
behavior.
## Are there any user-facing changes?
**Deprecation warning**: Users implementing `is_nullable()` will see a
deprecation warning directing them to use `return_field()` instead.

**Behavioral change**: Default nullability now depends on input field
nullability rather than always being `true`. Functions such as `COUNT` that
must always produce a non-nullable result should override `return_field()`.

This is a potentially breaking change for users who rely on the previous
always-nullable behavior, but the new behavior is more correct and
aligns with scalar UDF patterns.
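The `COUNT` case mentioned above can be illustrated with a simplified sketch. `AggregateImpl`, `MyAgg`, and `MyCount` are hypothetical names, and the trait is only loosely modeled on `AggregateUDFImpl` (the real trait has many more methods and a different `return_field()` signature); `Field` is again a simplified stand-in for Arrow's `Field`. The point is only the pattern: rely on the input-derived default, or override `return_field()` when the output is known never to be NULL.

```rust
/// Hypothetical, simplified stand-in for Arrow's `Field`.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
}

impl Field {
    fn new(name: &str, data_type: &str, nullable: bool) -> Self {
        Field { name: name.to_string(), data_type: data_type.to_string(), nullable }
    }
}

/// Hypothetical trait loosely modeled on `AggregateUDFImpl`.
trait AggregateImpl {
    fn name(&self) -> &str;
    fn return_type(&self) -> &str;

    /// Default: nullable if any input field is nullable.
    fn return_field(&self, arg_fields: &[Field]) -> Field {
        let nullable = arg_fields.iter().any(|f| f.nullable);
        Field::new(self.name(), self.return_type(), nullable)
    }
}

/// An aggregate that keeps the input-derived default.
struct MyAgg;

impl AggregateImpl for MyAgg {
    fn name(&self) -> &str { "my_agg" }
    fn return_type(&self) -> &str { "Int64" }
}

/// A COUNT-like aggregate: it never yields NULL, so it overrides
/// `return_field()` to always report a non-nullable output.
struct MyCount;

impl AggregateImpl for MyCount {
    fn name(&self) -> &str { "my_count" }
    fn return_type(&self) -> &str { "Int64" }

    fn return_field(&self, _arg_fields: &[Field]) -> Field {
        Field::new(self.name(), self.return_type(), false)
    }
}
```

Because `COUNT` returns 0 rather than NULL on empty input, forcing `nullable = false` is safe for it regardless of how nullable its input is.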