jayzhan211 commented on issue #12622: URL: https://github.com/apache/datafusion/issues/12622#issuecomment-2663050052
> I am relying here that we can extract the necessary type information from the schema or record batches

This is probably not true. A scalar is a free-standing constant, unlike a column, whose `DataType` is defined by the table schema.

> However, these are probably subjective and some of them are certainly underexplored so I can only make a guess that they would result in a net benefit.

A question I have is: when is `LogicalScalar` actually helpful? `LogicalScalar` is used in `Expr::Literal` and later converted into `ScalarValue` during physical plan execution. This adds a slight overhead, but for most types `LogicalScalar` and `ScalarValue` are nearly identical. The key differences arise with `Dictionary`, the `Utf8` variants, or potentially `REE` (run-end encoding) in the future.

Even if we can convert to `ScalarValue`, without a `DataType` a `LogicalScalar::String` can only be converted to `ScalarValue::Utf8`, not to `ScalarValue::Utf8View` or `ScalarValue::Dictionary(_, Utf8)`. Since type coercion resolves the `DataType` of an `Expr`, not only its logical type, it is a problem if we cannot convert the scalar string into the specific array type the other operand uses, such as a `StringViewArray`.

A potential direction would be to remove the need to know the `DataType` of a scalar. The reason a scalar needs a `DataType` is so it can be converted into the correct `ArrayRef` for arrow kernel execution. There are two possible ways to avoid this:

1. Support kernel functions that take an array and a native scalar, like `comparison(StringViewArray, rust::String)` or `add(Dictionary(k, I32Array), i64)`.
2. Add a casting mechanism before calling the kernel function in the physical layer.

I guess these are close to what #12720 described. If we don't need a `DataType` for `ScalarValue`, then `LogicalScalar` will be easy to bring into DataFusion.
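To make the conversion problem concrete, here is a minimal sketch that uses hypothetical, simplified stand-ins for `LogicalScalar`, `ScalarValue`, and a physical type enum. These are not the real DataFusion or arrow-rs definitions (for example, the real `ScalarValue::Dictionary` holds a boxed key `DataType` and a boxed inner value, while `DictionaryUtf8` below is an invented shorthand). The sketch shows that without a target type a string literal can only fall back to `Utf8`, whereas with a target type, such as one resolved by type coercion, it can be bound to `Utf8View` or a dictionary.

```rust
// Hypothetical, simplified stand-ins for the types discussed above.
// These are NOT the real DataFusion / arrow-rs definitions.

/// A literal that only knows its logical type.
#[derive(Debug, Clone, PartialEq)]
enum LogicalScalar {
    String(String),
    Int64(i64),
}

/// Physical encodings a column might use (simplified).
#[derive(Debug, Clone, PartialEq)]
enum PhysicalType {
    Utf8,
    Utf8View,
    DictionaryUtf8, // hypothetical shorthand for Dictionary(_, Utf8)
    Int64,
}

/// A literal bound to one concrete physical encoding (simplified).
#[derive(Debug, Clone, PartialEq)]
enum ScalarValue {
    Utf8(Option<String>),
    Utf8View(Option<String>),
    DictionaryUtf8(Option<String>),
    Int64(Option<i64>),
}

/// Without a target type, a string can only fall back to the default `Utf8`.
fn to_scalar_default(v: &LogicalScalar) -> ScalarValue {
    match v {
        LogicalScalar::String(s) => ScalarValue::Utf8(Some(s.clone())),
        LogicalScalar::Int64(i) => ScalarValue::Int64(Some(*i)),
    }
}

/// With a target type (e.g. resolved by type coercion), the matching
/// physical variant can be chosen. Returns `None` on a logical mismatch.
fn to_scalar_as(v: &LogicalScalar, target: &PhysicalType) -> Option<ScalarValue> {
    match (v, target) {
        (LogicalScalar::String(s), PhysicalType::Utf8) => Some(ScalarValue::Utf8(Some(s.clone()))),
        (LogicalScalar::String(s), PhysicalType::Utf8View) => {
            Some(ScalarValue::Utf8View(Some(s.clone())))
        }
        (LogicalScalar::String(s), PhysicalType::DictionaryUtf8) => {
            Some(ScalarValue::DictionaryUtf8(Some(s.clone())))
        }
        (LogicalScalar::Int64(i), PhysicalType::Int64) => Some(ScalarValue::Int64(Some(*i))),
        _ => None,
    }
}

fn main() {
    let lit = LogicalScalar::String("foo".to_string());
    // Without type information the literal always becomes Utf8 ...
    println!("{:?}", to_scalar_default(&lit));
    // ... but comparing against a Utf8View column needs a Utf8View scalar.
    println!("{:?}", to_scalar_as(&lit, &PhysicalType::Utf8View));
}
```

The point of the sketch is that `to_scalar_as` only works because some earlier step supplied `target`; a `LogicalScalar` on its own carries no such information.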
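Option 1 (kernels that accept an array plus a native scalar) could look roughly like the sketch below: one equality kernel that handles a string array in any of its encodings and takes the literal as a plain Rust `&str`, so the literal never needs a `DataType` of its own. The `StringArrayLike` enum and the `eq_scalar` function are hypothetical simplifications, not arrow-rs APIs; a real kernel would operate on Arrow buffers and handle nulls.

```rust
// Hypothetical, simplified string array encodings; NOT arrow-rs types.
// Nulls and the real buffer layouts are ignored for brevity.
enum StringArrayLike {
    Utf8(Vec<String>),
    Utf8View(Vec<String>),
    Dictionary { keys: Vec<usize>, values: Vec<String> },
}

/// Equality kernel taking an array plus a native Rust scalar, so the
/// literal never has to be materialized as an array of a specific type.
fn eq_scalar(array: &StringArrayLike, scalar: &str) -> Vec<bool> {
    match array {
        StringArrayLike::Utf8(v) | StringArrayLike::Utf8View(v) => {
            v.iter().map(|s| s == scalar).collect()
        }
        StringArrayLike::Dictionary { keys, values } => {
            // Compare each distinct value once, then map the result through the keys.
            let hits: Vec<bool> = values.iter().map(|s| s == scalar).collect();
            keys.iter().map(|&k| hits[k]).collect()
        }
    }
}

fn main() {
    let plain = StringArrayLike::Utf8(vec!["a".to_string(), "b".to_string()]);
    let view = StringArrayLike::Utf8View(vec!["b".to_string(), "a".to_string()]);
    let dict = StringArrayLike::Dictionary {
        keys: vec![0, 1, 0],
        values: vec!["a".to_string(), "b".to_string()],
    };
    // The same untyped literal works against every encoding.
    for arr in [&plain, &view, &dict] {
        println!("{:?}", eq_scalar(arr, "a"));
    }
}
```

In the dictionary branch the literal is compared with each distinct value once and the result is mapped through the keys, so the kernel, not the literal, decides how the physical encoding is handled. Option 2 would instead cast the literal to the array's type before calling an array-to-array kernel.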