alamb commented on code in PR #16589:
URL: https://github.com/apache/datafusion/pull/16589#discussion_r2176012085


##########
datafusion/physical-expr-adapter/src/schema_rewriter.rs:
##########
@@ -0,0 +1,811 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Physical expression schema rewriting utilities with struct support
+
+use std::sync::Arc;
+
+use arrow::compute::can_cast_types;
+use arrow::datatypes::{DataType, FieldRef, Schema};
+use datafusion_common::{
+    exec_err,
+    nested_struct::validate_struct_compatibility,
+    tree_node::{Transformed, TransformedResult, TreeNode},
+    Result, ScalarValue,
+};
+use datafusion_expr::ScalarUDF;
+use datafusion_functions::core::{getfield::GetFieldFunc, r#struct::StructFunc};
+use datafusion_physical_expr::{
+    expressions::{self, CastExpr, Column},
+    ScalarFunctionExpr,
+};
+use datafusion_physical_expr_common::physical_expr::PhysicalExpr;
+
+/// Build a struct expression by recursively extracting and rewriting fields 
from a source struct.

Review Comment:
   Can we add an example of what this is doing? 
   
   It seems like if the table schema is like `{a: int, b:int}` and the file 
only has `{a: int}` this function will build up an expression like
   ```
   struct(a, source.a, b, null)
   ```
   
   Something about relying on struct and field extraction feels very wrong to 
me.  Among other things now this casting may not be consistent with what 
happens when someone tries to call `CAST(..)` manually
   
   Is it the case that `build_struct_expr` would be unecessary if `cast` had 
native support for casting structs to `struct`s and filling in nulls 🤔 
   
   How about this for an alternate idea:
   1. Add a `cast` wrapper in datafusion with the same signature as arrow-rs's 
cast
   2. If the arguments were both Struct types, apply the special DataFusion 
casting logic(fill in missing fields with nulls)
   3. otherwise just call into the existing arrow cast kernel
   
   That would 
   1. Avoid the need to call the struct and get_field functions
   2. Allow other parts of the system, like coercsion, and SQL, to have the 
same semantics
   
   We have talked about adding cast for evolving structs in arrow-rs for a 
while but it seems the consensus was to put it in DataFusion
   - https://github.com/apache/arrow-rs/issues/7176
   - https://github.com/apache/arrow-rs/issues/5151
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to