mbutrovich commented on code in PR #1226:
URL: https://github.com/apache/datafusion-comet/pull/1226#discussion_r1905609592


##########
native/spark-expr/src/cast.rs:
##########
@@ -817,17 +818,28 @@ fn cast_struct_to_struct(
     cast_options: &SparkCastOptions,
 ) -> DataFusionResult<ArrayRef> {
     match (from_type, to_type) {
-        (DataType::Struct(_), DataType::Struct(to_fields)) => {
-            let mut cast_fields: Vec<(Arc<Field>, ArrayRef)> = Vec::with_capacity(to_fields.len());
+        (DataType::Struct(from_fields), DataType::Struct(to_fields)) => {
+            // TODO some of this logic may be specific to converting Parquet to Spark
+            let mut field_name_to_index_map = HashMap::new();

Review Comment:
   I'm wondering if we'll end up adding a cache in the SchemaAdapter logic for 
these sorts of maps, since it looks like this map would be rebuilt for every batch?
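
   To make the caching idea concrete, here is a minimal sketch using only the standard library. `FieldIndexCache`, `get_or_build`, and `builds` are hypothetical names and not part of Comet or DataFusion; it also assumes that the ordered list of field names is enough to identify a struct schema, which may not hold in the real SchemaAdapter.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical cache (not an existing Comet type): maps a struct's ordered
/// field names to a shared name -> index lookup, so the lookup is built once
/// per distinct schema instead of once per batch.
#[derive(Default)]
pub struct FieldIndexCache {
    maps: HashMap<Vec<String>, Arc<HashMap<String, usize>>>,
    builds: usize,
}

impl FieldIndexCache {
    /// Returns the cached lookup for `field_names`, building it on first use.
    pub fn get_or_build(&mut self, field_names: &[String]) -> Arc<HashMap<String, usize>> {
        if let Some(map) = self.maps.get(field_names) {
            // Cache hit: hand out another reference to the same map.
            return Arc::clone(map);
        }
        // Cache miss: build the name -> index map once and remember it.
        let map: HashMap<String, usize> = field_names
            .iter()
            .enumerate()
            .map(|(index, name)| (name.clone(), index))
            .collect();
        let map = Arc::new(map);
        self.builds += 1;
        self.maps.insert(field_names.to_vec(), Arc::clone(&map));
        map
    }

    /// Number of times a lookup was actually built (useful for checking hits).
    pub fn builds(&self) -> usize {
        self.builds
    }
}
```

   With something like this, the per-batch cast path would call `get_or_build` with the source struct's field names and only pay for the `HashMap` construction the first time a given schema is seen.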



##########
native/spark-expr/src/schema_adapter.rs:
##########
@@ -321,10 +322,26 @@ fn cast_supported(from_type: &DataType, to_type: &DataType, options: &SparkCastO
         (Timestamp(_, Some(_)), _) => can_cast_from_timestamp(to_type, options),
         (Utf8 | LargeUtf8, _) => can_cast_from_string(to_type, options),
         (_, Utf8 | LargeUtf8) => can_cast_to_string(from_type, options),
-        (Struct(from_fields), Struct(to_fields)) => from_fields
-            .iter()
-            .zip(to_fields.iter())
-            .all(|(a, b)| cast_supported(a.data_type(), b.data_type(), options)),
+        (Struct(from_fields), Struct(to_fields)) => {
+            // TODO some of this logic may be specific to converting Parquet to Spark
+            let mut field_types = HashMap::new();
+            for field in from_fields {
+                field_types.insert(field.name(), field.data_type());

Review Comment:
   Same caching comment as above, mostly as a reminder to our future selves.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

