cyberbeam524 commented on code in PR #23169:
URL: https://github.com/apache/datafusion/pull/23169#discussion_r3510483754
##########
datafusion/expr/src/expr_schema.rs:
##########
@@ -69,18 +70,46 @@ pub trait ExprSchemable {
-> Result<(DataType, bool)>;
}
-/// Derives the output field for a cast expression from the source field.
+/// Derives the output field for a cast expression from the source and target
fields.
+///
+/// Metadata handling:
+/// - Type-only casts (i.e., target_field ==
DataType::SomeDataType.into_nullable_field())
+/// propagate non extension-type metadata from the source. This is for
backward compatibility
+/// (casts have propagated source metadata for many if not all previous
versions), recognizing
+/// that the return type of `<some extension type>::<some non extension
type>` should have the
+/// return type of `<some non extension type>` (e.g., casting arrow.json to
utf8).
+/// - All other casts preserve target metadata exactly. This ensures in
particular that output
+/// metadata when casting to an extension type contains the extension
information in the
+/// output field. Callers that wish to have some mix of source and target
metadata can use
+/// Alias or construct an output field themselves (whose metadata will be
used directly).
+///
/// For `TryCast`, `force_nullable` is `true` since a failed cast returns NULL.
fn cast_output_field(
source_field: &FieldRef,
- target_type: &DataType,
+ target_field: &FieldRef,
force_nullable: bool,
) -> Arc<Field> {
+ // Check if this is a "type-only" cast (target_field ==
DataType::X.into_nullable_field())
+ let is_type_only = target_field.name().is_empty()
+ && target_field.is_nullable()
+ && target_field.metadata().is_empty();
+
+ let metadata = if is_type_only {
+ // Type-only cast: propagate source metadata, stripping extension type
keys
+ let mut meta = source_field.metadata().clone();
+ meta.remove(EXTENSION_TYPE_NAME_KEY);
+ meta.remove(EXTENSION_TYPE_METADATA_KEY);
+ meta
+ } else {
+ // Explicit target field: use target metadata exactly
+ target_field.metadata().clone()
+ };
+
let mut f = source_field
.as_ref()
.clone()
- .with_data_type(target_type.clone())
- .with_metadata(source_field.metadata().clone());
+ .with_data_type(target_field.data_type().clone())
+ .with_metadata(metadata);
if force_nullable {
f = f.with_nullable(true);
}
Review Comment:
Thanks for pulling this together, @paleolimbot! For this part, from my
limited understanding, it forces force_nullable to true here because a parsing
failure at runtime produces a NULL rather than a query panic. Even though it
forces this safety guard at the logical layer, do we risk any edge cases where
intermediate query optimizer rules collapse or rearrange projections before
reaching the physical tier, accidentally reading the user's non-nullable target
metadata too early and ultimately causing a runtime panic or query panic?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]