jayzhan211 commented on code in PR #12853:
URL: https://github.com/apache/datafusion/pull/12853#discussion_r1816904255
##########
datafusion/common/src/types/native.rs:
##########
@@ -188,6 +190,147 @@ impl LogicalType for NativeType {
fn signature(&self) -> TypeSignature<'_> {
TypeSignature::Native(self)
}
+
+ fn default_cast_for(&self, origin: &DataType) -> Result<DataType> {
+ use DataType::*;
+
+ fn default_field_cast(to: &LogicalField, from: &Field) ->
Result<FieldRef> {
+ Ok(Arc::new(Field::new(
+ to.name.clone(),
+ to.logical_type.default_cast_for(from.data_type())?,
+ to.nullable,
+ )))
+ }
+
+ Ok(match (self, origin) {
+ (Self::Null, _) => Null,
+ (Self::Boolean, _) => Boolean,
+ (Self::Int8, _) => Int8,
+ (Self::Int16, _) => Int16,
+ (Self::Int32, _) => Int32,
+ (Self::Int64, _) => Int64,
+ (Self::UInt8, _) => UInt8,
+ (Self::UInt16, _) => UInt16,
+ (Self::UInt32, _) => UInt32,
+ (Self::UInt64, _) => UInt64,
+ (Self::Float16, _) => Float16,
+ (Self::Float32, _) => Float32,
+ (Self::Float64, _) => Float64,
+ (Self::Decimal(p, s), _) if p <= &38 => Decimal128(*p, *s),
+ (Self::Decimal(p, s), _) => Decimal256(*p, *s),
+ (Self::Timestamp(tu, tz), _) => Timestamp(*tu, tz.clone()),
+ (Self::Date, _) => Date32,
+ (Self::Time(tu), _) => match tu {
+ TimeUnit::Second | TimeUnit::Millisecond => Time32(*tu),
+ TimeUnit::Microsecond | TimeUnit::Nanosecond => Time64(*tu),
+ },
+ (Self::Duration(tu), _) => Duration(*tu),
+ (Self::Interval(iu), _) => Interval(*iu),
+ (Self::Binary, LargeUtf8) => LargeBinary,
+ (Self::Binary, Utf8View) => BinaryView,
+ (Self::Binary, _) => Binary,
+ (Self::FixedSizeBinary(size), _) => FixedSizeBinary(*size),
+ (Self::Utf8, LargeBinary) => LargeUtf8,
+ (Self::Utf8, BinaryView) => Utf8View,
+ (Self::Utf8, _) => Utf8,
Review Comment:
Some initial thoughts
First, I think utf8view is the default type for string.
In my proposed way,
I think the mapping should be based on `arrow::can_cast_to` so if the 1st
target type is not supported we go to the next one.
But in this design, each type has a single target type to cast to. Is the
default type always castable by arrow's `can_cast_to`? Maybe we can just add
the check inside. Like
```rust
(Self::Utf8, data_type) if can_cast_type(data_type, Utf8View)=> Utf8View,
(Self::Utf8, data_type) if can_cast_type(data_type, LargeUtf8)=> LargeUtf8,
(Self::Utf8, data_type) if can_cast_type(data_type, Utf8)=> Utf8,
```
Another issue is whether the default type is great for every use case, for
example, I think utf8view should be the default type but is there any usecase
that require utf8 instead? If yes, how can they modify it? It seems they need
to copy the whole `default_cast_for` and change it for their logical type. (We
can assume no such case for now, but just to point out that it may not work if
there is such a case)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]