kosiew opened a new pull request, #20541:
URL: https://github.com/apache/datafusion/pull/20541
## Which issue does this PR close?
* Implements the `ceil` part of #20197.
---
## Rationale for this change
DataFusion’s preimage framework can turn predicates on deterministic
functions into equivalent predicates on the underlying column(s). For
`ceil(x)`, the mathematical preimage for a target integer value `N` is the
interval **(N − 1, N]**.
Without a preimage implementation, filters such as `WHERE ceil(col) = 6`
must evaluate `ceil` for every row, which can inhibit predicate pushdown and
other optimizer wins.
This PR implements `ceil`’s preimage to enable rewriting comparisons into
simple range predicates (with careful handling of floating-point representation
boundaries and decimals), improving the optimizer’s ability to push filters
down to scans and reduce work during execution.
---
## What changes are included in this PR?
* Implemented `ScalarUDFImpl::preimage` for the `ceil` scalar function.
  * Computes the preimage range for `ceil(x) = N` as a half-open interval
    suitable for the `Interval` framework.
  * Uses `next_up` for floating-point bounds so that the strict lower bound
    `(N-1, …]` is represented safely as `x >= next_up(N-1)` and the inclusive
    upper bound `… <= N` becomes `x < next_up(N)`.
  * Rejects non-integer literals (no solutions) and non-finite float
    literals (NaN/±Inf).
  * Avoids unsafe rewrites when `N - 1` collapses to `N` due to float
    spacing (e.g., above `2^53` for `f64`, above `2^24` for `f32`).
* Added decimal preimage support for `Decimal32/64/128/256`.
  * Validates that the literal has no fractional part at the declared scale.
  * Computes bounds using the decimal unit at the target scale (step =
    `10^-scale`) to represent `(N-1, N]` as `[N-1+step, N+step)`.
  * Handles scale 0 (integer decimals) as `[N, N+1)`.
* Added unit tests covering:
  * Valid ranges for floats (positive/negative/zero), integers, and decimals.
  * Non-integer literals returning `PreimageResult::None`.
  * Overflow and float boundary conditions.
  * NULL literals.
* Added a new SQLLogicTest file `ceil_preimage.slt`.
  * Verifies correctness of results for representative types (Float64, Int32
    via coercion, Decimal).
  * Verifies optimizer rewrites via `EXPLAIN` for `=`, `IN`, `IS [NOT]
    DISTINCT FROM`, and boundary cases.
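The floating-point rules above can be illustrated with a minimal standalone sketch. It is not the code in this PR: `ceil_preimage_f64` is a hypothetical name, the return type is a plain `Option` of bounds rather than DataFusion's `PreimageResult`, and `next_up` is hand-rolled here for finite inputs only so that the example does not depend on a recent toolchain.

```rust
/// Smallest f64 strictly greater than `x`. Assumes `x` is finite.
fn next_up(x: f64) -> f64 {
    if x == 0.0 {
        // Covers both +0.0 and -0.0: the next value is the smallest subnormal.
        return f64::from_bits(1);
    }
    let bits = x.to_bits();
    // Positive floats grow with their bit pattern; negative floats grow as
    // the bit pattern shrinks.
    f64::from_bits(if x > 0.0 { bits + 1 } else { bits - 1 })
}

/// Hypothetical helper: half-open bounds [lo, hi) equivalent to ceil(x) = n,
/// or None when no safe rewrite exists.
fn ceil_preimage_f64(n: f64) -> Option<(f64, f64)> {
    // NaN/±Inf and non-integer targets are rejected.
    if !n.is_finite() || n.fract() != 0.0 {
        return None;
    }
    let below = n - 1.0;
    // For large magnitudes, n - 1 can round back to n, so the lower bound
    // of (n - 1, n] cannot be represented.
    if below == n {
        return None;
    }
    // (n - 1, n] over floats is the same set as [next_up(n - 1), next_up(n)).
    Some((next_up(below), next_up(n)))
}

fn main() {
    let (lo, hi) = ceil_preimage_f64(6.0).expect("6.0 is a valid target");
    // ceil(x) == 6 should hold exactly when lo <= x < hi.
    for x in [5.0, lo, 5.5, 6.0, hi, 7.0] {
        assert_eq!(x.ceil() == 6.0, lo <= x && x < hi);
    }
    assert!(ceil_preimage_f64(6.5).is_none());
    assert!(ceil_preimage_f64(f64::NAN).is_none());
    assert!(ceil_preimage_f64(f64::INFINITY).is_none());
    println!("ceil(x) = 6  <=>  {:?} <= x < {:?}", lo, hi);
}
```

Expressing both ends with `next_up` keeps the rewritten predicate in the `>=` / `<` form of a half-open interval while still matching the closed upper end of (N − 1, N].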
---
## Are these changes tested?
Yes.
* **Rust unit tests** added in `datafusion/functions/src/math/ceil.rs`
  validate:
  * Correct range generation for supported scalar types.
  * Correct rejection of non-integer / non-finite float literals.
  * Overflow and precision boundary handling.
  * Decimal scale/precision behavior and NULL handling.
* **SQLLogicTest** added in
  `datafusion/sqllogictest/test_files/ceil_preimage.slt` validates:
  * Query result correctness for rewritten predicates.
  * Logical plan rewrites using `EXPLAIN` (including float `next_up` bounds
    and decimal bounds).
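For the decimal bounds, a rough sketch of the arithmetic on unscaled integer values may help when reading the tests. It assumes a non-negative scale and an `i128` backing value, and it ignores the precision limit of the decimal type. The name `ceil_preimage_decimal` is hypothetical and does not come from the PR.

```rust
/// Hypothetical helper: `value` is the unscaled integer of a decimal literal
/// (for example, 6.00 at scale 2 is stored as 600).
/// Returns half-open unscaled bounds [lo, hi) equivalent to ceil(x) = literal.
fn ceil_preimage_decimal(value: i128, scale: u32) -> Option<(i128, i128)> {
    // Unscaled representation of 1 at this scale; None if 10^scale overflows.
    let one = 10_i128.checked_pow(scale)?;
    // A literal with a fractional part can never equal ceil(x).
    if value % one != 0 {
        return None;
    }
    // The step 10^-scale is a single unscaled unit, so
    // (N - 1, N] becomes [N - 1 + step, N + step).
    let lo = value.checked_sub(one)?.checked_add(1)?;
    let hi = value.checked_add(1)?;
    Some((lo, hi))
}

fn main() {
    // ceil(x) = 6.00 at scale 2: 5.01 <= x < 6.01.
    assert_eq!(ceil_preimage_decimal(600, 2), Some((501, 601)));
    // ceil(x) = 6 at scale 0: 6 <= x < 7.
    assert_eq!(ceil_preimage_decimal(6, 0), Some((6, 7)));
    // 6.50 is not an integer, so there is no solution.
    assert_eq!(ceil_preimage_decimal(650, 2), None);
    // The upper bound would overflow, so no rewrite is produced.
    assert_eq!(ceil_preimage_decimal(i128::MAX, 0), None);
    println!("decimal bounds behave as described");
}
```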
---
## Are there any user-facing changes?
No user-visible behavior changes are intended. The semantics of `ceil` are
unchanged.
This is an optimizer improvement that may:
* Produce different (but equivalent) logical plans when predicates involve
`ceil`.
* Improve performance for queries that filter on `ceil(col)` by enabling
range filtering and better predicate pushdown.
No documentation updates are required.
---
## LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content
has been manually reviewed and tested.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]