Hi,

I would like to propose a solution to this JIRA issue. I looked at the
comments and there was some guidance around where in the code we should
update to allow for this behaviour. But I believe there are still two
questions that remain open:

   1. Is this expected behaviour (i.e. users should not be able to create
   an empty array)?
   2. If this is indeed expected behaviour, what should the data type be of
   the empty array?

I did some digging into other query engines / databases in hopes of
answering the following two questions - That can be found at the end of
this thread.

*Q: *Is this expected behaviour (i.e. users should not be able to create an
empty array)?
*A: *Yes I would say this is expected behaviour and something we should add
into the Flink SQL API.

*Q: *What should the data type be of the empty array?
*A: *This question is a bit harder to answer and I think it would require
two steps.

*Step 1: Pick a default data type to initialize the empty array.*

We can use an "empty data type" such as NULL, VOID.

*Step 2: Create or reuse type coercion to make using empty arrays easier.*

The above should unblock users from creating empty arrays, but if one would
use an empty array in an COALESCE operation.

i.e. SELECT COALESCE(int_column, ARRAY[])

I believe they will get a query issue where the type for int_column (INTEGER)
and the empty array (NULL, VOID) do not match. Thus a user will need to
cast the empty array:

i.e. SELECT COALESCE(int_column, CAST(ARRAY[] AS INT))

as such to have the COALESCE query to execute successfully.

-----
*Trino*
> EXPLAIN SELECT ARRAY[]
Fragment 0 [SINGLE]
    Output layout: [expr]
    Output partitioning: SINGLE []
    Output[columnNames = [_col0]]
    │   Layout: [expr:array(unknown)]
    │   Estimates: {rows: 1 (55B), cpu: 0, memory: 0B, network: 0B}
    │   _col0 := expr
    └─ Values[]
           Layout: [expr:array(unknown)]
           Estimates: {rows: 1 (55B), cpu: 0, memory: 0B, network: 0B}

 ("$literal$"(from_base64('AwAAAFJMRQAAAAAKAAAAQllURV9BUlJBWQEAAAABgAAAAAA=')))

Expected behaviour? *Yes.*
Array data type? *Unknown.*

*Spark*
> sc.sql.sql("SELECT ARRAY[]").explain()
 DataFrame[array(): array<void>]

Expected behaviour? *Yes.*
Array data type? *Void.*

*BigQuery*
> SELECT ARRAY[]
Field name Type          Mode
f0_              INTEGER  REPEATED

Expected behaviour? *Yes.*
Array data type? *Integer.*

Best,

Eric

Reply via email to