shifluxxc opened a new pull request, #19224:
URL: https://github.com/apache/datafusion/pull/19224
## Which issue does this PR close?
- Closes #19150.
## Rationale for this change
The Spark `bitwise_not` UDF always appeared as **nullable** in logical
plans, even when its input column was **non-nullable**.
This happened because the UDF implemented only `return_type()`, which
returns a `DataType` but **does not propagate nullability**.
DataFusion requires UDFs to implement `return_field_from_args()` when
nullability depends on input fields.
As a result:
- `bitwise_not(non_nullable_col)` incorrectly produced a **nullable** output.
- Downstream query planning and schema inference became inconsistent.
- This differed from both **Spark semantics** and **Arrow kernel behavior**,
where nullability is preserved.
This PR corrects the nullability inference.
## What changes are included in this PR?
- Implemented `return_field_from_args()` for the Spark `bitwise_not` UDF.
  - Output type = input type
  - Output nullability = input nullability
- Updated `return_type()` to return an error, per DataFusion API guidelines
  when overriding nullability.
- Added unit tests verifying:
  - Non-nullable input → non-nullable output
  - Nullable input → nullable output
  - Behavior across multiple integer types (`Int32`, `Int64`)
- Code comments and minor cleanup.
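
For readers who want the rule in isolation, here is a minimal sketch of the nullability propagation described above. It is not the code from this PR: `DataType` and `Field` are simplified stand-ins for the Arrow types, `bitwise_not_return_field` is a hypothetical helper name, and the accepted integer types are an assumption made for illustration.

```rust
// Simplified stand-ins for Arrow's `DataType` and `Field` (assumptions for
// illustration; the real types come from the arrow crates).
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int8,
    Int16,
    Int32,
    Int64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

/// Hypothetical helper: derives the output field of `bitwise_not` from its
/// single input field, copying both the data type and the nullability flag.
fn bitwise_not_return_field(arg_fields: &[Field]) -> Result<Field, String> {
    // The function takes exactly one argument.
    let [input] = arg_fields else {
        return Err(format!(
            "bitwise_not expects 1 argument, got {}",
            arg_fields.len()
        ));
    };

    match input.data_type {
        // Assumed set of supported integer types for this sketch.
        DataType::Int8 | DataType::Int16 | DataType::Int32 | DataType::Int64 => Ok(Field {
            name: "bitwise_not".to_string(),
            data_type: input.data_type,
            // Output nullability = input nullability.
            nullable: input.nullable,
        }),
        other => Err(format!("bitwise_not does not support {other:?}")),
    }
}
```

Passing a non-nullable `Int32` field yields a non-nullable `Int32` field, and a nullable `Int64` field yields a nullable `Int64` field, which is the behavior the new unit tests check.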
## Are these changes tested?
Yes.
This PR includes new unit tests that validate:
- correct nullability propagation
- correct output types
- consistent behavior across supported integer types
## Are there any user-facing changes?
Yes, but they are **behavior-correcting**, not breaking:
- The `spark.bitwise_not` UDF now correctly reports nullability in schemas
and logical plans.
- No API changes.
- No behavioral change for actual runtime values — Arrow kernels already
preserved null bitmaps; only planner metadata was incorrect.
This is not considered a breaking change.
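
To illustrate why runtime values are unaffected, here is a small sketch of bitwise NOT over nullable values. It models nulls with `Option` instead of an Arrow null bitmap, and `bitwise_not_values` is a hypothetical name, so treat it as an analogy for the kernel behavior, not the kernel itself.

```rust
/// Hypothetical helper: applies bitwise NOT to each non-null value and leaves
/// nulls (modelled here as `None`) untouched.
fn bitwise_not_values(input: &[Option<i32>]) -> Vec<Option<i32>> {
    input.iter().map(|v| v.map(|x| !x)).collect()
}
```

A null in the input stays a null in the output, and an input without nulls produces an output without nulls. The values were therefore already correct before this PR; only the declared nullability in the schema changes.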