[I] Support `AVG` for durations [datafusion]

via GitHub Thu, 27 Mar 2025 07:22:41 -0700


alamb opened a new issue, #15458:
URL: https://github.com/apache/datafusion/issues/15458


   ### Is your feature request related to a problem or challenge?
   
   - @goldmedal  points out on 
https://github.com/apache/datafusion/issues/15363  that AVG for duration isn't 
supported
   
   Specifically given this table
   ```sql
   > create or replace table t(user_id int, ts timestamp) as values (1, 
'2021-01-01'), (2, '2021-01-02'), (1, '2021-01-03'), (2, '2021-01-04');
   0 row(s) fetched.
   Elapsed 0.003 seconds.
   
   > select * from t;
   +---------+---------------------+
   | user_id | ts                  |
   +---------+---------------------+
   | 1       | 2021-01-01T00:00:00 |
   | 2       | 2021-01-02T00:00:00 |
   | 1       | 2021-01-03T00:00:00 |
   | 2       | 2021-01-04T00:00:00 |
   +---------+---------------------+
   4 row(s) fetched.
   Elapsed 0.006 seconds.
   ```
   
   I would like to be able to compute the average distance for the timestamps. 
However it doesn't work today:
   
   ```sql
   > select avg('2021-01-01'::timestamp -  ts) from t;
   Error during planning: Execution error: Function 'avg' user-defined coercion 
failed with "Error during planning: The function \"avg\" does not support 
inputs of type Duration(Nanosecond)." No function matches the given name and 
argument types 'avg(Duration(Nanosecond))'. You might need to add explicit type 
casts.
        Candidate functions:
        avg(UserDefined)
   ```
   
   
   ```sql
   > select user_id, avg('2021-01-01'::timestamp -  ts) from t GROUP BY user_id;
   Error during planning: Execution error: Function 'avg' user-defined coercion 
failed with "Error during planning: The function \"avg\" does not support 
inputs of type Duration(Nanosecond)." No function matches the given name and 
argument types 'avg(Duration(Nanosecond))'. You might need to add explicit type 
casts.
        Candidate functions:
        avg(UserDefined)
   ```
   
   ### Describe the solution you'd like
   
   I would like the two queries above to work 
   
   Here is how they work in postgres
   
   ```sql
   postgres=# create table t(user_id int, ts timestamp);
   CREATE TABLE
   postgres=# insert into t  values (1, '2021-01-01'), (2, '2021-01-02'), (1, 
'2021-01-03'), (2, '2021-01-04');
   INSERT 0 4
   postgres=# select avg('2021-01-01'::timestamp -  ts) from t;
           avg
   -------------------
    -1 days -12:00:00
   (1 row)
   
   postgres=# select user_id, avg('2021-01-01'::timestamp -  ts) from t GROUP 
BY user_id;
    user_id |   avg
   ---------+---------
          2 | -2 days
          1 | -1 days
   (2 rows)
   ```
   
   ### Describe alternatives you've considered
   
   I think we can follow the model we did for `Min` and `Max`
   - Add `Accumulator` like @svranesevic did in  
https://github.com/apache/datafusion/pull/15310
   - Add `GroupsAccumulator` like @shruti2522  did in 
https://github.com/apache/datafusion/pull/15322
   
   
   
   ### Additional context
   
   I think this is a good first issue as it should involve instantiating some 
new types and writing tests. I don't think we'll need to implement any new 
accumulators


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

[I] Support `AVG` for durations [datafusion]

Reply via email to