[ https://issues.apache.org/jira/browse/FLINK-21949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764658#comment-17764658 ]

Jiabao Sun commented on FLINK-21949:
------------------------------------

The pull request is ready for review now.

This implementation is a simplified version of Calcite's
SqlLibraryOperators.ARRAY_AGG.
{code:java}
// calcite
ARRAY_AGG([ ALL | DISTINCT ] value [ RESPECT NULLS | IGNORE NULLS ] [ ORDER BY orderItem [, orderItem ]* ] )
// flink
ARRAY_AGG([ ALL | DISTINCT ] expression)
{code}

The differences from Calcite are as follows:
# Null values are ignored.
# The ORDER BY clause within the function is not supported, because the
function implementation cannot access the complete input row.
# The function returns NULL when there are no input rows, whereas Calcite's
definition returns an empty array. This behavior follows BigQuery and PostgreSQL.
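As a rough illustration of the rules above (plus the ALL/DISTINCT choice in the syntax), here is a plain-Java sketch that models ARRAY_AGG over an in-memory list of column values. It is not Flink's implementation: the class ArrayAggSketch and method arrayAgg are hypothetical, the sketch ignores streaming concerns such as retraction, it simply keeps values in arrival order (the function has no ORDER BY), and it assumes that an input consisting only of NULLs behaves like an empty input.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical model of the ARRAY_AGG semantics (not Flink's real AggregateFunction). */
public class ArrayAggSketch {

    /**
     * @param values   input column values, one per row (may contain nulls)
     * @param distinct true for ARRAY_AGG(DISTINCT expr), false for ALL
     * @return the aggregated list, or null when no non-null value was seen
     */
    public static <T> List<T> arrayAgg(List<T> values, boolean distinct) {
        // Rule 1: null values are skipped entirely.
        List<T> nonNull = new ArrayList<T>();
        for (T v : values) {
            if (v != null) {
                nonNull.add(v);
            }
        }
        // DISTINCT keeps the first occurrence of each value (arrival order is a sketch choice).
        List<T> result = distinct ? new ArrayList<T>(new LinkedHashSet<T>(nonNull)) : nonNull;
        // Rule 3: no input yields NULL rather than an empty array.
        // Assumption: an all-NULL input is treated the same as no input.
        return result.isEmpty() ? null : result;
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(3, null, 1, 3);
        System.out.println(arrayAgg(rows, false));                     // [3, 1, 3]
        System.out.println(arrayAgg(rows, true));                      // [3, 1]
        System.out.println(arrayAgg(new ArrayList<Integer>(), false)); // null
    }
}
{code}

Using LinkedHashSet for DISTINCT is only a convenient way to keep the sketch deterministic; the comment above does not specify element order.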

https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#array_agg
https://www.postgresql.org/docs/8.4/functions-aggregate.html

> Support ARRAY_AGG aggregate function
> ------------------------------------
>
>                 Key: FLINK-21949
>                 URL: https://issues.apache.org/jira/browse/FLINK-21949
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table SQL / API
>    Affects Versions: 1.12.0
>            Reporter: Jiabao Sun
>            Assignee: Jiabao Sun
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.19.0
>
>
> Some NoSQL databases, such as MongoDB and Elasticsearch, support nested data types.
> Aggregating multiple rows into ARRAY<ROW> is a common requirement.
> The CollectToArray function is similar to Collect, except that it returns 
> ARRAY<ROW> instead of MULTISET<ROW>.
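The difference between a MULTISET result (as returned by Collect) and an ARRAY result can be illustrated with plain Java collections. In this hypothetical sketch a multiset is modeled as a map from element to occurrence count (no element order, duplicates folded into counts), while an array is an ordered list that keeps every occurrence as a separate entry. The class and method names are illustrative only and are not Flink's internal types; null handling is omitted.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration: multiset (Collect-style) vs array (ARRAY_AGG-style) results. */
public class MultisetVsArray {

    /** Multiset: each distinct element mapped to how many times it occurred. */
    public static <T> Map<T, Integer> collectToMultiset(List<T> rows) {
        Map<T, Integer> counts = new HashMap<>();
        for (T row : rows) {
            counts.merge(row, 1, Integer::sum);
        }
        return counts;
    }

    /** Array: one entry per input row, duplicates kept as separate elements. */
    public static <T> List<T> collectToArray(List<T> rows) {
        return new ArrayList<>(rows);
    }

    public static void main(String[] args) {
        List<String> rows = List.of("a", "b", "a");
        System.out.println(collectToMultiset(rows).get("a")); // 2
        System.out.println(collectToArray(rows));             // [a, b, a]
    }
}
{code}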



--
This message was sent by Atlassian Jira
(v8.20.10#820010)