davidlghellin opened a new pull request, #22959: URL: https://github.com/apache/datafusion/pull/22959
## Which issue does this PR close?

- NA

## Rationale for this change

`concat_ws` plans containing a literal-NULL separator can be folded at plan time, but DataFusion's existing `ConstEvaluator` only fires when **all** arguments are literals. When the separator is a literal NULL but the value arguments are columns, the call survives until execution and runs per row to produce NULL each time. That is pure overhead, and downstream projection pushdown can't reclaim it because the columns are still referenced.

## What changes are included in this PR?

A new `ScalarUDFImpl::simplify` implementation on `SparkConcatWs` with two rules:

1. **`concat_ws(NULL_literal, …)`** → `NULL::Utf8`. A literal-NULL separator yields NULL regardless of the value args, including when they are columns. This is the case `ConstEvaluator` can't reach.
2. **`concat_ws(sep_literal)`** (only the separator, no value args) → `''`. Mostly redundant with `ConstEvaluator` for the all-literal case; kept for symmetry.

The runtime physical execution path (`invoke_with_args`, `only_separator`, `spark_concat_ws`, the `StringView`/`ArgView` enums, and the `write_list_row`/`push_part` helpers) is **unchanged**.

## Are these changes tested?

Yes. New SLT cases in `datafusion/sqllogictest/test_files/spark/string/concat_ws.slt` are grouped into four labelled categories.

## Are there any user-facing changes?

No API or behaviour changes.
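
For readers who want to see the shape of the two rewrite rules, here is a minimal, self-contained sketch. It does not use DataFusion's types: the `Expr` enum and the `simplify_concat_ws` function below are hypothetical stand-ins invented for illustration, and returning `None` here only loosely mirrors "leave the call as it is" in the real `simplify` hook. It is not the code in this PR.

```rust
/// Hypothetical, simplified stand-in for a logical expression.
/// (Illustrative only; this is not DataFusion's `Expr`.)
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// A Utf8 literal; `None` models a literal NULL.
    Utf8Literal(Option<String>),
    /// A column reference whose value is only known at execution time.
    Column(String),
}

/// Sketch of the two plan-time rules for `concat_ws(sep, args...)`.
///
/// Returns `Some(rewritten)` when a rule applies, or `None` when the call
/// should be left untouched for the runtime path to evaluate.
fn simplify_concat_ws(args: &[Expr]) -> Option<Expr> {
    match args {
        // Rule 1: a literal-NULL separator makes the result NULL, no matter
        // what the remaining arguments are (columns included).
        [Expr::Utf8Literal(None), ..] => Some(Expr::Utf8Literal(None)),
        // Rule 2: only a non-NULL literal separator and no value arguments
        // folds to the empty string.
        [Expr::Utf8Literal(Some(_))] => Some(Expr::Utf8Literal(Some(String::new()))),
        // Anything else (column separator, value args present, no args)
        // is not handled by these rules.
        _ => None,
    }
}
```

With this model, `simplify_concat_ws(&[Expr::Utf8Literal(None), Expr::Column("a".into())])` folds to a NULL literal even though `a` is a column, which is the case an all-literals constant folder would skip. A call with a non-NULL separator and a column argument returns `None` and is left for execution.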
