notfilippo opened a new pull request, #11978:
URL: https://github.com/apache/datafusion/pull/11978
This PR removes LargeUtf8|Binary, Utf8|BinaryView, and Dictionary from
ScalarValue, following the discussion on #11513
---
## Open questions
### Top level ScalarValue cast
This change currently fails this test (and potentially others) and before
proceeding I would like to get some feedback on how to solve this issue. The
test fails because of an interaction of `expression_simplifier` +
`optimize_projections`.
The query is: `SELECT arrow_cast(1234, 'Float64') as f64, arrow_cast('foo',
'LargeUtf8') as large`
1. expression_simplifier correctly tries to evaluate_to_scalar the
`arrow_cast('foo', 'LargeUtf8')` cast
([here](https://github.com/apache/datafusion/blob/main/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs#L626-L626))
2. this leads to the cast physical expression
([here](https://github.com/apache/datafusion/blob/main/datafusion/expr-common/src/columnar_value.rs#L187-L187)),
but this is a no-op since it transforms `Utf8('foo')` (scalar) →
`LargeUtf8('foo')` (array) → `Utf8('foo')` (scalar)
- TODO: if this is what we want this should probably not execute at all
3. This edit doesn't change the underlying schema of the LogicalPlan
4. `optimize_projections` rewrites the Projection and updates the schema,
seeing the `Utf8('foo')` it correctly assumes that the LogicalPlan's schema
field for this value should have DataType == Utf8
[This
check](https://github.com/apache/datafusion/blob/main/datafusion/optimizer/src/optimizer.rs#L471-L471)
is the one raising this error but I guess it should instead check if schema
fields are _logically_ equivalent to eachother. I'm not totally convinced this
is the correct solution because it removes some guarantees that might be
expected by users downstream. Happy to hear everyone's opinion on this.
```
---- optimizer::select_arrow_cast stdout ----
thread 'optimizer::select_arrow_cast' panicked at
datafusion/core/tests/optimizer/mod.rs:118:30:
called `Result::unwrap()` on an `Err` value: Context("Optimizer rule
'optimize_projections' failed", Context("optimize_projections",
Internal("Failed due to a difference in schemas, original schema: DFSchema {
inner: Schema { fields: [Field { name: \"f64\", data_type: Float64, nullable:
true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name:
\"large\", data_type: LargeUtf8, nullable: true, dict_id: 0, dict_is_ordered:
false, metadata: {} }], metadata: {} }, field_qualifiers: [None, None],
functional_dependencies: FunctionalDependencies { deps: [] } }, new schema:
DFSchema { inner: Schema { fields: [Field { name: \"f64\", data_type: Float64,
nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field {
name: \"large\", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered:
false, metadata: {} }], metadata: {} }, field_qualifiers: [None, None],
functional_dependencies: FunctionalDependencies { deps: [] } }")))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]