adriangb commented on code in PR #16589:
URL: https://github.com/apache/datafusion/pull/16589#discussion_r2176101815


##########
datafusion/physical-expr-adapter/src/schema_rewriter.rs:
##########
@@ -0,0 +1,811 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Physical expression schema rewriting utilities with struct support
+
+use std::sync::Arc;
+
+use arrow::compute::can_cast_types;
+use arrow::datatypes::{DataType, FieldRef, Schema};
+use datafusion_common::{
+    exec_err,
+    nested_struct::validate_struct_compatibility,
+    tree_node::{Transformed, TransformedResult, TreeNode},
+    Result, ScalarValue,
+};
+use datafusion_expr::ScalarUDF;
+use datafusion_functions::core::{getfield::GetFieldFunc, r#struct::StructFunc};
+use datafusion_physical_expr::{
+    expressions::{self, CastExpr, Column},
+    ScalarFunctionExpr,
+};
+use datafusion_physical_expr_common::physical_expr::PhysicalExpr;
+
+/// Build a struct expression by recursively extracting and rewriting fields 
from a source struct.

Review Comment:
   > Something about relying on struct and field extraction feels very wrong to 
me. Among other things now this casting may not be consistent with what happens 
when someone tries to call CAST(..) manually
   
   Does it have to match? These seem like two similar but not exactly identical 
use cases to me.
   
   > How about this for an alternate idea:
   
   That seems reasonable to me.
   
   > Avoid the need to call the struct and get_field functions
   
   I am probably missing something but thinking about it to me it didn't seem 
like those calls would be much of a problem. Either way a new struct column is 
going to have to be built and the existing struct column is going to have to be 
read from disk in its entirety.
   
   But the pushback made me think that maybe we should be doing something else 
here: we should be inspecting the expression more deeply and in the case of 
`col.field` if `field` is not present in the physical schema we replace the 
whole `get_field(col, 'field')` expression with `null` instead of 
`get_col(struct('other', get_field(col, 'other'), 'field', null)))` which is 
basically what is happening now. Then we avoid reading the column altogether. 
We can do something similar with casts. I think this is complimentary to your 
proposal because the fallthrough case for `select struct_col` can call `cast` 
if needed like any other column, and our internal `cast` implementation can do 
the "smart things" with structs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to