cyberbeam524 commented on code in PR #23169:
URL: https://github.com/apache/datafusion/pull/23169#discussion_r3510483754


##########
datafusion/expr/src/expr_schema.rs:
##########
@@ -69,18 +70,46 @@ pub trait ExprSchemable {
     -> Result<(DataType, bool)>;
 }
 
-/// Derives the output field for a cast expression from the source field.
+/// Derives the output field for a cast expression from the source and target 
fields.
+///
+/// Metadata handling:
+/// - Type-only casts (i.e., target_field == 
DataType::SomeDataType.into_nullable_field())
+///   propagate non extension-type metadata from the source. This is for 
backward compatibility
+///   (casts have propagated source metadata for many if not all previous 
versions), recognizing
+///   that the return type of `<some extension type>::<some non extension 
type>` should have the
+///   return type of `<some non extension type>` (e.g., casting arrow.json to 
utf8).
+/// - All other casts preserve target metadata exactly. This ensures in 
particular that output
+///   metadata when casting to an extension type contains the extension 
information in the
+///   output field. Callers that wish to have some mix of source and target 
metadata can use
+///   Alias or construct an output field themselves (whose metadata will be 
used directly).
+///
 /// For `TryCast`, `force_nullable` is `true` since a failed cast returns NULL.
 fn cast_output_field(
     source_field: &FieldRef,
-    target_type: &DataType,
+    target_field: &FieldRef,
     force_nullable: bool,
 ) -> Arc<Field> {
+    // Check if this is a "type-only" cast (target_field == 
DataType::X.into_nullable_field())
+    let is_type_only = target_field.name().is_empty()
+        && target_field.is_nullable()
+        && target_field.metadata().is_empty();
+
+    let metadata = if is_type_only {
+        // Type-only cast: propagate source metadata, stripping extension type 
keys
+        let mut meta = source_field.metadata().clone();
+        meta.remove(EXTENSION_TYPE_NAME_KEY);
+        meta.remove(EXTENSION_TYPE_METADATA_KEY);
+        meta
+    } else {
+        // Explicit target field: use target metadata exactly
+        target_field.metadata().clone()
+    };
+
     let mut f = source_field
         .as_ref()
         .clone()
-        .with_data_type(target_type.clone())
-        .with_metadata(source_field.metadata().clone());
+        .with_data_type(target_field.data_type().clone())
+        .with_metadata(metadata);
     if force_nullable {
         f = f.with_nullable(true);
     }

Review Comment:
   Thanks for pulling this together, @paleolimbot! For this part, from my 
limited understanding, it forces force_nullable to true here because a parsing 
failure at runtime produces a NULL rather than a query panic. Even though it 
forces this safety guard at the logical layer, do we risk any edge cases where 
intermediate query optimizer rules collapse or rearrange projections before 
reaching the physical tier, accidentally reading the user's non-nullable target 
metadata too early and ultimately causing a runtime panic or query panic?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to