alamb commented on code in PR #14440: URL: https://github.com/apache/datafusion/pull/14440#discussion_r1954707280
########## datafusion/common/src/types/native.rs: ########## @@ -198,6 +198,11 @@ impl LogicalType for NativeType { TypeSignature::Native(self) } + /// Returns the default casted type for the given arrow type Review Comment: 😍 ########## datafusion/functions/src/string/repeat.rs: ########## @@ -65,10 +65,17 @@ impl Default for RepeatFunc { impl RepeatFunc { pub fn new() -> Self { Self { - signature: Signature::coercible( + signature: Signature::coercible_v2( vec![ - TypeSignatureClass::Native(logical_string()), - TypeSignatureClass::Native(logical_int64()), Review Comment: I do like how much more explicit the new formualtion is ########## datafusion/expr-common/src/signature.rs: ########## @@ -431,6 +463,35 @@ impl TypeSignature { } } +fn get_possible_types_from_signature_classes( Review Comment: I agree with @shehabgamin on both points ########## datafusion/expr-common/src/signature.rs: ########## @@ -431,6 +463,35 @@ impl TypeSignature { } } +fn get_possible_types_from_signature_classes( + signature_classes: &TypeSignatureClass, +) -> Vec<DataType> { + match signature_classes { + TypeSignatureClass::Native(l) => get_data_types(l.native()), + TypeSignatureClass::Timestamp => { + vec![ + DataType::Timestamp(TimeUnit::Nanosecond, None), + DataType::Timestamp(TimeUnit::Nanosecond, Some(TIMEZONE_WILDCARD.into())), + ] + } + TypeSignatureClass::Date => { + vec![DataType::Date64] + } + TypeSignatureClass::Time => { + vec![DataType::Time64(TimeUnit::Nanosecond)] Review Comment: I think it is ok to have one example type as this is only used for inforamtin schema -- the naming is super confusing though -- I think renaming the functions would make it much better ########## datafusion/expr-common/src/signature.rs: ########## @@ -365,7 +365,12 @@ impl TypeSignature { } } - /// get all possible types for the given `TypeSignature` + /// This function is used specifically internally for `information_schema` + /// We suggest not to rely on this function + /// + /// Get all possible types for `information_schema` from the given `TypeSignature` + // + // TODO: Make this function private Review Comment: we could potentially mark it as deprecated and make the new (non pub) version with a different name Maybe something like this (can do in a follow on PR) ```rust pub fn get_example_types(&self) -> Vec<Vec<DataType>> { ``` ########## datafusion/expr-common/src/signature.rs: ########## @@ -431,6 +463,35 @@ impl TypeSignature { } } +fn get_possible_types_from_signature_classes( Review Comment: Suggested name: ```suggestion fn get_example_types( ``` And we can put it on TypeSignatureClass ########## datafusion/expr/src/type_coercion/functions.rs: ########## @@ -596,75 +594,93 @@ fn get_valid_types( vec![vec![target_type; *num]] } } - TypeSignature::Coercible(target_types) => { - function_length_check( - function_name, - current_types.len(), - target_types.len(), - )?; - - // Aim to keep this logic as SIMPLE as possible! - // Make sure the corresponding test is covered - // If this function becomes COMPLEX, create another new signature! - fn can_coerce_to( - function_name: &str, - current_type: &DataType, - target_type_class: &TypeSignatureClass, - ) -> Result<DataType> { - let logical_type: NativeType = current_type.into(); + TypeSignature::Coercible(param_types) => { + function_length_check(function_name, current_types.len(), param_types.len())?; - match target_type_class { - TypeSignatureClass::Native(native_type) => { - let target_type = native_type.native(); - if &logical_type == target_type { - return target_type.default_cast_for(current_type); - } + let mut new_types = Vec::with_capacity(current_types.len()); + for (current_type, param) in current_types.iter().zip(param_types.iter()) { + let current_logical_type: NativeType = current_type.into(); + + fn is_matched_type( Review Comment: this function also seems like it would make more sense as a method on `TypeSignatureClass` ########## datafusion/expr-common/src/signature.rs: ########## @@ -460,6 +521,44 @@ fn get_data_types(native_type: &NativeType) -> Vec<DataType> { } } +#[derive(Debug, Clone, Eq, PartialOrd)] +pub struct Coercion { Review Comment: > @jayzhan211 , I believe @alamb 's question (please correct me if I'm wrong) is about creating functionality for a downstream user to override the default signature of a UDF in order to provide their own coercion rules. This is what I was getting at What do you think about making `Coercion` an `enum` like this (I can make a follow on PR / propose changes to this PR): ```rust #[derive(Debug, Clone, Eq, PartialOrd)] pub enum Coercion { /// the argument type must match desired_type Exact { desired_type: TypeSignatureClass, } /// The argument type must be coercable to the desired type using the implicit coercion Implict { desired_type: TypeSignatureClass, implicit_coercion: ImplicitCoercion, } } ``` Then we can eventually maybe add a "user defined coercion" variant Let me make a PR to see what this would look like. ########## datafusion/expr-common/src/signature.rs: ########## @@ -431,6 +463,35 @@ impl TypeSignature { } } +fn get_possible_types_from_signature_classes( + signature_classes: &TypeSignatureClass, +) -> Vec<DataType> { + match signature_classes { + TypeSignatureClass::Native(l) => get_data_types(l.native()), + TypeSignatureClass::Timestamp => { + vec![ Review Comment: > Is listing all possible DataType combination helpful? I don't think this is helpful to be honest -- just the combination of classes -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org