Jefffrey commented on code in PR #17536:
URL: https://github.com/apache/datafusion/pull/17536#discussion_r2343365209


##########
datafusion/functions-aggregate/src/average.rs:
##########
@@ -62,6 +62,17 @@ make_udaf_expr_and_func!(
     avg_udaf
 );
 
+pub fn avg_distinct(expr: Expr) -> Expr {
+    Expr::AggregateFunction(datafusion_expr::expr::AggregateFunction::new_udf(
+        avg_udaf(),
+        vec![expr],
+        true,
+        None,
+        vec![],
+        None,
+    ))
+}

Review Comment:
   This mirrors how `count` handles its distinct variant:
   
   
https://github.com/apache/datafusion/blob/bfc5067718a3ddcb87531b5a9633605792078546/datafusion/functions-aggregate/src/count.rs#L71-L80



##########
datafusion/core/tests/dataframe/mod.rs:
##########
@@ -496,32 +497,35 @@ async fn drop_with_periods() -> Result<()> {
 #[tokio::test]
 async fn aggregate() -> Result<()> {
     // build plan using DataFrame API
-    let df = test_table().await?;
+    // union so some of the distincts have a clearly distinct result
+    let df = test_table().await?.union(test_table().await?)?;
     let group_expr = vec![col("c1")];
     let aggr_expr = vec![
-        min(col("c12")),
-        max(col("c12")),
-        avg(col("c12")),
-        sum(col("c12")),
-        count(col("c12")),
-        count_distinct(col("c12")),
+        min(col("c4")).alias("min(c4)"),
+        max(col("c4")).alias("max(c4)"),
+        avg(col("c4")).alias("avg(c4)"),
+        avg_distinct(col("c4")).alias("avg_distinct(c4)"),
+        sum(col("c4")).alias("sum(c4)"),
+        sum_distinct(col("c4")).alias("sum_distinct(c4)"),
+        count(col("c4")).alias("count(c4)"),
+        count_distinct(col("c4")).alias("count_distinct(c4)"),

Review Comment:
   I switched from `c12` to `c4` because `c12` showed some precision variations
   for `avg_distinct`, which led to inconsistent test results. Switching columns
   seemed easier than slapping `round` on the outputs.
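
   A minimal sketch of the kind of precision variation described above, assuming `c12` is a floating-point column and `c4` an integer one (the comment does not state the column types). The helper names are hypothetical and are not DataFusion code; the point is only that floating-point addition is not associative, so summing the same distinct values in a different order can change the last bits of the result, while integer addition does not have this problem.

```rust
// Illustrative only: these helpers are hypothetical and not part of DataFusion.

/// Sum floats left to right, in exactly the order given.
fn sum_f64(values: &[f64]) -> f64 {
    values.iter().fold(0.0, |acc, v| acc + v)
}

/// Sum integers left to right, in exactly the order given.
fn sum_i64(values: &[i64]) -> i64 {
    values.iter().sum()
}

fn main() {
    // The same three distinct values, visited in two different orders,
    // as could happen when distinct values come out of a hash-based set.
    let forward = [0.1_f64, 0.2, 0.3];
    let backward = [0.3_f64, 0.2, 0.1];
    println!(
        "float sums: {:?} vs {:?}",
        sum_f64(&forward),
        sum_f64(&backward)
    );

    // Integer sums are identical regardless of visiting order.
    let ints_forward = [1_i64, 2, 3];
    let ints_backward = [3_i64, 2, 1];
    println!(
        "int sums:   {} vs {}",
        sum_i64(&ints_forward),
        sum_i64(&ints_backward)
    );
}
```

   If the column had to stay floating-point, rounding the outputs would be the other way to stabilise the test, which is the alternative the comment mentions.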



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

