tobixdev commented on issue #12622: URL: https://github.com/apache/datafusion/issues/12622#issuecomment-2662684630
Thanks for having a look at the PR (#14617, mentioning here for reference) and your input on this approach @jayzhan211 ! I agree that adding this type of complexity simply for eliminating `try_as_str` etc. is not worthwhile. However, I think that an approach with `LogicalScalar` can provide benefits in the long run where something similar to `PhysicalScalar(DataType, LogicalScalar)` replaces `ScalarValue`. Basically the result of this transition would be equal to `Scalar` and `ScalarValue` in the original approach but with a different migration process (not adapting `ScalarValue`, but replacing it step by step with either `LogicalScalar` or `PhysicalScalar`). > I found LogicalScalar to be a largely duplicated concept with only minor differences from ScalarValue, mainly aimed at reducing complexity. However, the reduction is minimal—for instance, eliminating try_as_str alone doesn’t seem to justify introducing LogicalScalar. Here are the benefits that I'd see in `LogicalScalar` (I have not checked whether these are implementable): - Removing the Option from variants (e.g., `ScalarValue::UInt8(None)`) that we *may* not need in a logical setting (see the example of #14617 where the `None` cases can be removed). I am relying here that we can extract the necessary type information from the schema or record batches. - Removing variants that only carry physical type information (e.g., `ScalarValue::LargeUtf8` and `ScalarValue::Dictionary`) - Work with well-known rust types instead of arrow arrays (e.g., `HashMap` instead of `StructArray` in `ScalarValue::Struct`) for scalars. *Maybe* this can make implementing UDFs easier. Some benefits that could become relevant in the future and may be more difficult to implement on `ScalarValue`: - Implementing Scalars of extension types by implementing a trait. For example, this could allow users to use rust enums for modelling scalars of arrow unions. However, these are probably subjective and some of them are certainly underexplored so I can only make a guess that they would result in a net benefit. > Casting yes (eg for constant folding) I think with `LogicalScalar` and `PhyiscalScalar` most of the regular casting logic would only be required in `LogicalScalar` while casting to arrow arrays would be part of `PhysicalScalar`. > In summary, I think we should keep ScalarValue as it is. And utilize NativeType or TypeClass (similar to TypeSignatureClass but not for Signature only). I think you're right that we can get some of these benefits by using `NativeType`. If we believe that `LogicalScalar` is a step in the wrong direction I can close #14617 and we can explore different avenues. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org