alamb opened a new issue, #15458: URL: https://github.com/apache/datafusion/issues/15458
### Is your feature request related to a problem or challenge? - @goldmedal points out on https://github.com/apache/datafusion/issues/15363 that AVG for duration isn't supported Specifically given this table ```sql > create or replace table t(user_id int, ts timestamp) as values (1, '2021-01-01'), (2, '2021-01-02'), (1, '2021-01-03'), (2, '2021-01-04'); 0 row(s) fetched. Elapsed 0.003 seconds. > select * from t; +---------+---------------------+ | user_id | ts | +---------+---------------------+ | 1 | 2021-01-01T00:00:00 | | 2 | 2021-01-02T00:00:00 | | 1 | 2021-01-03T00:00:00 | | 2 | 2021-01-04T00:00:00 | +---------+---------------------+ 4 row(s) fetched. Elapsed 0.006 seconds. ``` I would like to be able to compute the average distance for the timestamps. However it doesn't work today: ```sql > select avg('2021-01-01'::timestamp - ts) from t; Error during planning: Execution error: Function 'avg' user-defined coercion failed with "Error during planning: The function \"avg\" does not support inputs of type Duration(Nanosecond)." No function matches the given name and argument types 'avg(Duration(Nanosecond))'. You might need to add explicit type casts. Candidate functions: avg(UserDefined) ``` ```sql > select user_id, avg('2021-01-01'::timestamp - ts) from t GROUP BY user_id; Error during planning: Execution error: Function 'avg' user-defined coercion failed with "Error during planning: The function \"avg\" does not support inputs of type Duration(Nanosecond)." No function matches the given name and argument types 'avg(Duration(Nanosecond))'. You might need to add explicit type casts. Candidate functions: avg(UserDefined) ``` ### Describe the solution you'd like I would like the two queries above to work Here is how they work in postgres ```sql postgres=# create table t(user_id int, ts timestamp); CREATE TABLE postgres=# insert into t values (1, '2021-01-01'), (2, '2021-01-02'), (1, '2021-01-03'), (2, '2021-01-04'); INSERT 0 4 postgres=# select avg('2021-01-01'::timestamp - ts) from t; avg ------------------- -1 days -12:00:00 (1 row) postgres=# select user_id, avg('2021-01-01'::timestamp - ts) from t GROUP BY user_id; user_id | avg ---------+--------- 2 | -2 days 1 | -1 days (2 rows) ``` ### Describe alternatives you've considered I think we can follow the model we did for `Min` and `Max` - Add `Accumulator` like @svranesevic did in https://github.com/apache/datafusion/pull/15310 - Add `GroupsAccumulator` like @shruti2522 did in https://github.com/apache/datafusion/pull/15322 ### Additional context I think this is a good first issue as it should involve instantiating some new types and writing tests. I don't think we'll need to implement any new accumulators -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org