crm26 opened a new pull request, #22013: URL: https://github.com/apache/datafusion/pull/22013
## Which issue does this PR close?

Part of #21536: a split of #21371 into one function per PR. Third in the series, after #21542 (`cosine_distance`) and #21861 (`inner_product`).

## Rationale for this change

Adds `array_normalize(array)`, which returns the L2-normalized version of a numeric input vector, computed per element as `array[i] / sqrt(sum(array[i]^2))`. The output has the same shape as the input (`List<Float64>` or `LargeList<Float64>`). It is aliased as `list_normalize` to match the `array_X`/`list_X` convention used across the crate.

## What changes are included in this PR?

The coercion shell mirrors the merged `cosine_distance`/`inner_product` pattern:

- `coerce_types` accepts `List`/`LargeList`/`FixedSizeList` of any numeric inner type, plus bare `NULL`. After coercion, the inner function only sees `List(Float64)` or `LargeList(Float64)`.
- The per-row L2 norm is computed inline (no shared module). It uses a single `as_float64_array(list_array.values())` downcast plus `value_offsets()` slicing, so there are no per-row downcasts.
- Manual list builder: `Vec<f64>` for values, `Vec<O>` for offsets, `NullBuffer` for row validity.

Per-row semantics:

- NULL row → NULL output
- NULL element in list → NULL row
- Empty list → empty list (no division-by-zero hazard)
- Zero magnitude → NULL row (consistent with `cosine_distance`, where zero magnitude → NULL)
- Otherwise → each element divided by `sqrt(sum-of-squares)`

## Are these changes tested?

Yes. SLT covers:

- 3-4-5 right triangle, 3D vector, already-unit axis, single non-zero component, negative components
- Bare `NULL` input, NULL element in list, zero vector, empty array
- `LargeList`, `FixedSizeList` (via coercion), `Float32` and `Int64` inner types, integer literals
- Multi-row query mixing a normal row, a NULL row, a zero-vector row, and a null-element row
- Plan error for non-list input
- No-args error
- Return-type assertion (`List(Float64)`)
- `list_normalize` alias coverage (constant + multi-row with NULL)

## Are there any user-facing changes?
New scalar function `array_normalize` (alias `list_normalize`), documented in `docs/source/user-guide/sql/scalar_functions.md`.
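The per-row semantics and the offsets-based builder described in the PR can be illustrated with a minimal, std-only sketch. This is not the PR's code. Plain `Vec`s stand in for the Arrow values buffer, offsets, and validity bitmaps, and `FlatList` and `normalize_rows` are hypothetical names chosen for illustration.

```rust
/// Hypothetical std-only stand-in for an Arrow list column: a flat `values`
/// buffer with per-element validity, `offsets` where row i spans
/// `values[offsets[i]..offsets[i + 1]]`, and per-row validity.
#[derive(Debug)]
struct FlatList {
    values: Vec<f64>,
    value_valid: Vec<bool>,
    offsets: Vec<usize>,
    row_valid: Vec<bool>,
}

/// L2-normalizes every row, following the per-row rules listed in the PR.
fn normalize_rows(input: &FlatList) -> FlatList {
    let mut values = Vec::with_capacity(input.values.len());
    let mut offsets = Vec::with_capacity(input.offsets.len());
    let mut row_valid = Vec::with_capacity(input.row_valid.len());
    offsets.push(0);

    for row in 0..input.row_valid.len() {
        // Slice this row out of the single flat buffer via its offsets,
        // instead of materializing a separate array per row.
        let (start, end) = (input.offsets[row], input.offsets[row + 1]);
        let elems = &input.values[start..end];
        let has_null_elem = input.value_valid[start..end].iter().any(|v| !v);

        let valid = if !input.row_valid[row] || has_null_elem {
            false // NULL row or NULL element -> NULL row
        } else if elems.is_empty() {
            true // empty list -> empty list, nothing to divide
        } else {
            let norm = elems.iter().map(|x| x * x).sum::<f64>().sqrt();
            if norm == 0.0 {
                false // zero magnitude -> NULL row
            } else {
                values.extend(elems.iter().map(|x| x / norm));
                true
            }
        };
        row_valid.push(valid);
        // NULL rows get a zero-length slot (the offset is repeated).
        offsets.push(values.len());
    }

    // Every emitted element is valid; nullness lives only at the row level.
    let value_valid = vec![true; values.len()];
    FlatList { values, value_valid, offsets, row_valid }
}
```

Take the input rows `[3, 4]`, NULL, `[0, 0]`, `[1, NULL]`, `[]`, `[-3, 0, 4]`. The sketch yields `[0.6, 0.8]`, NULL, NULL, NULL, `[]`, and `[-0.6, 0.0, 0.8]`. Giving NULL rows a repeated offset keeps the values buffer dense. That is one valid Arrow encoding, though the PR may lay null slots out differently.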
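The coercion rule described in the PR can be sketched as a pure function over a simplified type model. The PR states that `List`/`LargeList`/`FixedSizeList` of any numeric inner type coerce so that the kernel only sees `List(Float64)` or `LargeList(Float64)`. The `Ty` enum below is a hypothetical stand-in for Arrow's `DataType` with only a few numeric variants, and `coerce_normalize_arg` is an illustrative name. Mapping bare `NULL` to `List(Float64)` is an assumption, since the PR does not spell out that target.

```rust
/// Hypothetical, heavily simplified stand-in for Arrow's `DataType`.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Null,
    Int64,
    Float32,
    Float64,
    Utf8,
    List(Box<Ty>),
    LargeList(Box<Ty>),
    FixedSizeList(Box<Ty>, i32),
}

/// Only a few numeric variants are modeled here.
fn is_numeric(t: &Ty) -> bool {
    matches!(t, Ty::Int64 | Ty::Float32 | Ty::Float64)
}

/// Maps an argument type to the shape the normalize kernel expects.
fn coerce_normalize_arg(t: &Ty) -> Result<Ty, String> {
    match t {
        // Assumption: bare NULL is coerced to List(Float64).
        Ty::Null => Ok(Ty::List(Box::new(Ty::Float64))),
        // FixedSizeList loses its fixed size and becomes a plain List.
        Ty::List(inner) | Ty::FixedSizeList(inner, _) if is_numeric(inner) => {
            Ok(Ty::List(Box::new(Ty::Float64)))
        }
        // LargeList keeps its 64-bit offsets.
        Ty::LargeList(inner) if is_numeric(inner) => Ok(Ty::LargeList(Box::new(Ty::Float64))),
        other => Err(format!("array_normalize does not support type {other:?}")),
    }
}
```

Under this model, coercion handles only the inner-type cast to `Float64`, so the kernel needs just two offset widths (`i32` for `List`, `i64` for `LargeList`). The `Vec<O>` offsets generic in the PR's builder suggests the same split.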
