alamb commented on code in PR #16589: URL: https://github.com/apache/datafusion/pull/16589#discussion_r2176012085
########## datafusion/physical-expr-adapter/src/schema_rewriter.rs: ########## @@ -0,0 +1,811 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +//! Physical expression schema rewriting utilities with struct support + +use std::sync::Arc; + +use arrow::compute::can_cast_types; +use arrow::datatypes::{DataType, FieldRef, Schema}; +use datafusion_common::{ + exec_err, + nested_struct::validate_struct_compatibility, + tree_node::{Transformed, TransformedResult, TreeNode}, + Result, ScalarValue, +}; +use datafusion_expr::ScalarUDF; +use datafusion_functions::core::{getfield::GetFieldFunc, r#struct::StructFunc}; +use datafusion_physical_expr::{ + expressions::{self, CastExpr, Column}, + ScalarFunctionExpr, +}; +use datafusion_physical_expr_common::physical_expr::PhysicalExpr; + +/// Build a struct expression by recursively extracting and rewriting fields from a source struct. Review Comment: Can we add an example of what this is doing? It seems like if the table schema is like `{a: int, b:int}` and the file only has `{a: int}` this function will build up an expression like ``` struct(a, source.a, b, null) ``` Something about relying on struct and field extraction feels very wrong to me. Among other things now this casting may not be consistent with what happens when someone tries to call `CAST(..)` manually Is it the case that `build_struct_expr` would be unecessary if `cast` had native support for casting structs to `struct`s and filling in nulls 🤔 How about this for an alternate idea: 1. Add a `cast` wrapper in datafusion with the same signature as arrow-rs's cast 2. If the arguments were both Struct types, apply the special DataFusion casting logic(fill in missing fields with nulls) 3. otherwise just call into the existing arrow cast kernel That would 1. Avoid the need to call the struct and get_field functions 2. Allow other parts of the system, like coercsion, and SQL, to have the same semantics We have talked about adding cast for evolving structs in arrow-rs for a while but it seems the consensus was to put it in DataFusion - https://github.com/apache/arrow-rs/issues/7176 - https://github.com/apache/arrow-rs/issues/5151 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org