tobixdev commented on issue #12622:
URL: https://github.com/apache/datafusion/issues/12622#issuecomment-2598092150

   Merging the branch with the upstream is quite a project as there are many 
merge conflicts and a lot of incompatible code was created in the meantime (on 
the other hand, it's good to see so much progress on DataFusion).
   Before investing more time in this, I'd like to start another discussion 
about whether this is the solution we are going for (kind of like a result from 
adapting the new code from main). 
   
   As I understand it, the new [`Scalar` 
struct](https://github.com/apache/datafusion/blob/logical-types/datafusion/expr-common/src/scalar.rs)
 is only necessary for carrying a physical data type for a logical value so 
that we can support operations like `arrow_cast`.
   
   Unfortunately, using `Scalar` in `ColumnarValue` and `Expr` breaks all 
patterns that try to match the scalar for these types.
   While we can fix this in the DataFusion code base (we need a nested match or 
smth similar in many cases), all downstream dependencies must rewrite their 
pattern-matching code, which can be (I assume) a huge undertaking for many 
projects.
   Here is an 
[example](https://github.com/apache/datafusion/commit/454db7e63c6d6af1c8975f29ce9b7ee9c37ea287#diff-d4a05272894f059368463815799d47bc9c33fd3f861ffcea827839f71938846fR111)
 from the current version of `logical-types`.
   
   Even this may be acceptable if the resulting solution is what we want, and 
it will remain so for the foreseeable future when considering the possible 
impact of logical types. However, being unable to match scalar types and values 
will undoubtedly impact ergonomics for some use cases. Furthermore, if we ever 
conclude that we do not need the physical type (e.g., because we can derive the 
physical type from the schema), these breaking changes could have been avoided. 
   
   Therefore, are we sure that we require the `Scalar` type? If yes, I can 
proceed with updating the PR. If not, maybe we should try to infer the 
necessary physical types from the schema to avoid this large breaking change.
   
   I also 
[experimented]https://github.com/apache/datafusion/commit/527087fbbbd04465605a3de5cebeee54e50c846c#diff-49e275af8f09685c7bbc491db8ab3b9479960878f42ac558ec0e3e39570590bdR298)
 with adding a new variant `WithPhysicalType` to `ScalarValue`. This works 
great regarding breaking changes as most match statements only consider the set 
of "supported" scalars. Unfortunately, it allows users to miss "logically 
equivalent" values with a different physical type when they only match against 
the regular variants. So I also think this is not an adequate solution (e.g., 
`ScalarValue::WithPhysicalType(Box::new(ScalarValue::Utf8("abc"), 
DataType::Utf8View)` will not be matched by `ScalarValue::Utf8("abc")`. 
However, it may be a less invasive transitional vehicle to carry the physical 
data type until we can infer the data type from the schema (and update 
optimizations etc.) to respect this.
   
   > Since the logical-types branch can easily diverge from the main branch, 
even when the sub-tasks are incomplete, would it be better to merge it into the 
main branch frequently and continue evolving it as new ideas emerge?
   
   Adopting this strategy would mean that we cannot release logical types until 
we can get rid of the `WithPhysicalType` variant, which is unfortunate. 
However, not breaking the pattern-matching capabilities of `ColumnarValue` and 
`Expr` may be worth the effort, and synching the main branch should be more 
straightforward as we also do not have that many breaking changes. 
   
   cc @notfilippo @findepi  @alamb @jayzhan211 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to