alamb commented on code in PR #11845: URL: https://github.com/apache/datafusion/pull/11845#discussion_r1709360078
########## datafusion/expr-common/src/lib.rs: ########## @@ -0,0 +1,32 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +//! Logical Expr Common packages for [DataFusion] +//! +//! [DataFusion]: <https://crates.io/crates/datafusion> Review Comment: I think some additional rationale here might help ```suggestion //! Logical Expr types and traits for [DataFusion] //! //! This crate contains types and traits that are used by both Logical and Physical expressions. //! They are kept in their own crate to avoid physical expressions depending on logical expressions. //! //! //! [DataFusion]: <https://crates.io/crates/datafusion> ``` ########## datafusion/functions-aggregate-common/src/lib.rs: ########## @@ -0,0 +1,32 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +//! Aggregate Function Common packages for [DataFusion] +//! This package could be used to build for 3rd party aggregate function +//! +//! [DataFusion]: <https://crates.io/crates/datafusion> Review Comment: ```suggestion //! Common Aggregate functionality for [DataFusion] //! //! This crate contains traits and utilities commonly used to implement aggregate functions //! They are kept in their own crate to avoid physical expressions depending on logical expressions. //! //! [DataFusion]: <https://crates.io/crates/datafusion> ``` ########## datafusion/functions-aggregate-common/Cargo.toml: ########## @@ -0,0 +1,46 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +[package] +name = "datafusion-functions-aggregate-common" +description = "Common aggregate function packages for the DataFusion query engine" Review Comment: ```suggestion description = "Utility functions for implementing aggregate functions for the DataFusion query engine" ``` ########## datafusion/expr-common/src/operator.rs: ########## @@ -287,202 +280,3 @@ impl fmt::Display for Operator { write!(f, "{display}") } } - Review Comment: ❤️ ########## datafusion/expr-common/Cargo.toml: ########## @@ -0,0 +1,43 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +[package] +name = "datafusion-expr-common" +description = "Logical plan and expression representation for DataFusion query engine" Review Comment: Maybe we could adjust this to say it had commonly shared types ```suggestion description = "Traits and types for logical plans and expressions for DataFusion query engine" ``` ########## datafusion/physical-expr/src/expressions/column.rs: ########## @@ -136,6 +138,35 @@ pub fn col(name: &str, schema: &Schema) -> Result<Arc<dyn PhysicalExpr>> { Ok(Arc::new(Column::new_with_schema(name, schema)?)) } +// TODO: Move expressions out of physical-expr? Review Comment: to where would it go? ########## datafusion/expr-common/src/groups_accumulator.rs: ########## @@ -17,7 +17,7 @@ //! Vectorized [`GroupsAccumulator`] -use arrow_array::{ArrayRef, BooleanArray}; +use arrow::array::{ArrayRef, BooleanArray}; Review Comment: ❤️ ########## datafusion/functions-aggregate/Cargo.toml: ########## @@ -44,6 +44,8 @@ arrow-schema = { workspace = true } datafusion-common = { workspace = true } datafusion-execution = { workspace = true } datafusion-expr = { workspace = true } +datafusion-functions-aggregate-common = { workspace = true } +datafusion-physical-expr = { workspace = true } Review Comment: What do you mean by "thin"? I checked on this branch and here is what datafusion/physical-expr still has (we could potentially move the groups accumulators into `datafusion-functions-aggegate-common` as well, as a follow on PR) ``` $ find datafusion/physical-expr/src datafusion/physical-expr/src datafusion/physical-expr/src/physical_expr.rs datafusion/physical-expr/src/partitioning.rs datafusion/physical-expr/src/analysis.rs datafusion/physical-expr/src/aggregate datafusion/physical-expr/src/aggregate/mod.rs datafusion/physical-expr/src/aggregate/stats.rs datafusion/physical-expr/src/aggregate/groups_accumulator datafusion/physical-expr/src/aggregate/groups_accumulator/adapter.rs datafusion/physical-expr/src/aggregate/groups_accumulator/mod.rs datafusion/physical-expr/src/lib.rs datafusion/physical-expr/src/functions.rs datafusion/physical-expr/src/utils datafusion/physical-expr/src/utils/guarantee.rs datafusion/physical-expr/src/utils/mod.rs datafusion/physical-expr/src/planner.rs datafusion/physical-expr/src/intervals datafusion/physical-expr/src/intervals/test_utils.rs datafusion/physical-expr/src/intervals/mod.rs datafusion/physical-expr/src/intervals/cp_solver.rs datafusion/physical-expr/src/intervals/utils.rs datafusion/physical-expr/src/equivalence datafusion/physical-expr/src/equivalence/properties.rs datafusion/physical-expr/src/equivalence/mod.rs datafusion/physical-expr/src/equivalence/projection.rs datafusion/physical-expr/src/equivalence/class.rs datafusion/physical-expr/src/equivalence/ordering.rs datafusion/physical-expr/src/window datafusion/physical-expr/src/window/row_number.rs datafusion/physical-expr/src/window/ntile.rs datafusion/physical-expr/src/window/window_expr.rs datafusion/physical-expr/src/window/lead_lag.rs datafusion/physical-expr/src/window/cume_dist.rs datafusion/physical-expr/src/window/nth_value.rs datafusion/physical-expr/src/window/built_in.rs datafusion/physical-expr/src/window/built_in_window_function_expr.rs datafusion/physical-expr/src/window/aggregate.rs datafusion/physical-expr/src/window/rank.rs datafusion/physical-expr/src/window/mod.rs datafusion/physical-expr/src/window/sliding_aggregate.rs datafusion/physical-expr/src/expressions datafusion/physical-expr/src/expressions/like.rs datafusion/physical-expr/src/expressions/negative.rs datafusion/physical-expr/src/expressions/literal.rs datafusion/physical-expr/src/expressions/cast.rs datafusion/physical-expr/src/expressions/binary.rs datafusion/physical-expr/src/expressions/is_not_null.rs datafusion/physical-expr/src/expressions/unknown_column.rs datafusion/physical-expr/src/expressions/mod.rs datafusion/physical-expr/src/expressions/is_null.rs datafusion/physical-expr/src/expressions/try_cast.rs datafusion/physical-expr/src/expressions/column.rs datafusion/physical-expr/src/expressions/no_op.rs datafusion/physical-expr/src/expressions/binary datafusion/physical-expr/src/expressions/binary/kernels.rs datafusion/physical-expr/src/expressions/case.rs datafusion/physical-expr/src/expressions/in_list.rs datafusion/physical-expr/src/expressions/not.rs datafusion/physical-expr/src/scalar_function.rs datafusion/physical-expr/src/math_expressions.rs ``` ########## datafusion/physical-plan/src/aggregates/mod.rs: ########## @@ -1980,65 +1973,36 @@ mod tests { // FIRST_VALUE(b ORDER BY b <SortOptions>) fn test_first_value_agg_expr( schema: &Schema, - dfschema: &DFSchema, sort_options: SortOptions, ) -> Result<Arc<dyn AggregateExpr>> { - let sort_exprs = vec![datafusion_expr::Expr::Sort(Sort { - expr: Box::new(datafusion_expr::col("b")), - asc: !sort_options.descending, - nulls_first: sort_options.nulls_first, - })]; - let ordering_req = vec![PhysicalSortExpr { + let ordering_req = [PhysicalSortExpr { expr: col("b", schema)?, options: sort_options, }]; - let args = vec![col("b", schema)?]; - let logical_args = vec![datafusion_expr::col("b")]; - let func = datafusion_expr::AggregateUDF::new_from_impl(FirstValue::new()); - datafusion_physical_expr_common::aggregate::create_aggregate_expr_with_dfschema( - &func, - &args, - &logical_args, - &sort_exprs, - &ordering_req, - dfschema, - None, - false, - false, - false, - ) + let args = [col("b", schema)?]; + + AggregateExprBuilder::new(first_value_udaf(), args.to_vec()) Review Comment: 😍 ########## datafusion/functions-aggregate-common/src/aggregate.rs: ########## @@ -0,0 +1,182 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +//! Contains the trait `AggregateExpr` which defines the interface all aggregate expressions +//! (built-in and custom) need to satisfy. Review Comment: As this first sentence is rendered as the summary on the docs page, I think it is nice to make a link to the relevant struct/trait if possible for example https://docs.rs/datafusion/latest/datafusion/index.html  ```suggestion //! [`AggregateExpr`] which defines the interface all aggregate expressions //! (built-in and custom) need to satisfy. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
