Re: [I] [EPIC] Decouple logical from physical types [datafusion]

via GitHub Mon, 17 Feb 2025 02:35:16 -0800


tobixdev commented on issue #12622:
URL: https://github.com/apache/datafusion/issues/12622#issuecomment-2662684630


   Thanks for having a look at the PR (#14617, mentioning here for reference) 
and your input on this approach @jayzhan211 !
   
   I agree that adding this type of complexity simply for eliminating 
`try_as_str` etc. is not worthwhile. However, I think that an approach with 
`LogicalScalar` can provide benefits in the long run where something similar to 
`PhysicalScalar(DataType, LogicalScalar)` replaces `ScalarValue`. Basically the 
result of this transition would be equal to `Scalar` and `ScalarValue` in the 
original approach but with a different migration process (not adapting 
`ScalarValue`, but replacing it step by step with either `LogicalScalar` or 
`PhysicalScalar`). 
   
   > I found LogicalScalar to be a largely duplicated concept with only minor 
differences from ScalarValue, mainly aimed at reducing complexity. However, the 
reduction is minimal—for instance, eliminating try_as_str alone doesn’t seem to 
justify introducing LogicalScalar.
   
   Here are the benefits that I'd see in `LogicalScalar` (I have not checked 
whether these are implementable):
   - Removing the Option from variants (e.g., `ScalarValue::UInt8(None)`) that 
we *may* not need in a logical setting (see the example of #14617 where the 
`None` cases can be removed). I am relying here that we can extract the 
necessary type information from the schema or record batches.
   - Removing variants that only carry physical type information (e.g., 
`ScalarValue::LargeUtf8` and `ScalarValue::Dictionary`)
   - Work with well-known rust types instead of arrow arrays (e.g., `HashMap` 
instead of `StructArray` in `ScalarValue::Struct`) for scalars. *Maybe* this 
can make implementing UDFs easier.
   
   Some benefits that could become relevant in the future and may be more 
difficult to implement on `ScalarValue`:
   - Implementing Scalars of extension types by implementing a trait. For 
example, this could allow users to use rust enums for modelling scalars of 
arrow unions.
   
   However, these are probably subjective and some of them are certainly 
underexplored so I can only make a guess that they would result in a net 
benefit.
   
   > Casting yes (eg for constant folding)
   
   I think with `LogicalScalar` and `PhyiscalScalar` most of the regular 
casting logic would only be required in `LogicalScalar` while casting to arrow 
arrays would be part of `PhysicalScalar`. 
   
   > In summary, I think we should keep ScalarValue as it is. And utilize 
NativeType or TypeClass (similar to TypeSignatureClass but not for Signature 
only).
   
   I think you're right that we can get some of these benefits by using 
`NativeType`. If we believe that `LogicalScalar` is a step in the wrong 
direction I can close #14617 and we can explore different avenues. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Re: [I] [EPIC] Decouple logical from physical types [datafusion]

Reply via email to