[ 
https://issues.apache.org/jira/browse/FLINK-32296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739252#comment-17739252
 ] 

Sergey Nuyanzin edited comment on FLINK-32296 at 7/1/23 12:14 AM:
------------------------------------------------------------------

The root cause is {{RowToRowCastRule}}. Since it was introduced in 1.15.0 
by FLINK-25052, the query could still work on 1.14.x.

During code gen it generates something like
{code:java}
...
for (int i$13 = 0; i$13 < array$7.size(); i$13++) {
...
   writer$17.reset();
...
   result$15 = row$16;
...
   objArray$12[i$13] = result$15;
...
}
...
{code}
where {{result$15}} is an element of the array. When the element is a row, it 
is stored by reference, and the underlying row is then overwritten with other 
values in later iterations.
As a result, every element of the array refers to the last element of the 
source array.
Looking at the last element of every array in {{bug_data}} from the 
description, there are only two distinct values. That explains why the query 
currently returns 2 rows instead of 5.
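
The aliasing described above can be reproduced outside Flink. The following is a minimal sketch, not the generated code itself: {{MutableRow}} and {{castByReference}} are hypothetical names standing in for the reused row and the generated loop.

```java
import java.util.Arrays;

public class ReusedRowAliasing {
    // Hypothetical stand-in for a reusable row buffer; this is not a Flink class.
    static final class MutableRow {
        int a;
        String b;

        @Override
        public String toString() {
            return "+I[" + a + ", " + b + "]";
        }
    }

    // Mimics the shape of the generated loop: a single row object is refilled
    // for each element and the same reference is stored in every array slot.
    static MutableRow[] castByReference(int[] as, String[] bs) {
        MutableRow row = new MutableRow();   // shared across all iterations
        MutableRow[] objArray = new MutableRow[as.length];
        for (int i = 0; i < as.length; i++) {
            row.a = as[i];                   // overwrites the previous element's data
            row.b = bs[i];
            objArray[i] = row;               // stores a reference, not a copy
        }
        return objArray;
    }

    public static void main(String[] args) {
        MutableRow[] result = castByReference(
                new int[] {10, 101, 4410},
                new String[] {"2020-01-10", "244ddf", "21111"});
        // prints [+I[4410, 21111], +I[4410, 21111], +I[4410, 21111]]
        System.out.println(Arrays.toString(result));
    }
}
```

Every slot ends up showing the last source element, which matches the shape of the output in the description.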

The same problem occurs for maps with more than one entry where the key or 
value is a row.

So the idea of the fix is simply to use the {{copy}} method of {{RowBinaryData}}.
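
As an illustration of that idea only, here is a minimal sketch in which each element is copied before it is stored. {{MutableRow}}, its {{copy}} method and {{castWithCopy}} are hypothetical stand-ins; the actual change to the generated code may look different.

```java
import java.util.Arrays;

public class CopyPerElement {
    // Hypothetical stand-in for a reusable row buffer; this is not a Flink class.
    static final class MutableRow {
        int a;
        String b;

        // Returns an independent snapshot of the current field values.
        MutableRow copy() {
            MutableRow c = new MutableRow();
            c.a = a;
            c.b = b;
            return c;
        }

        @Override
        public String toString() {
            return "+I[" + a + ", " + b + "]";
        }
    }

    // Same loop shape as before, but each slot receives its own copy, so later
    // iterations can no longer overwrite earlier elements.
    static MutableRow[] castWithCopy(int[] as, String[] bs) {
        MutableRow row = new MutableRow();   // still reused as a scratch buffer
        MutableRow[] objArray = new MutableRow[as.length];
        for (int i = 0; i < as.length; i++) {
            row.a = as[i];
            row.b = bs[i];
            objArray[i] = row.copy();        // snapshot instead of a shared reference
        }
        return objArray;
    }

    public static void main(String[] args) {
        MutableRow[] result = castWithCopy(
                new int[] {10, 101, 4410},
                new String[] {"2020-01-10", "244ddf", "21111"});
        // prints [+I[10, 2020-01-10], +I[101, 244ddf], +I[4410, 21111]]
        System.out.println(Arrays.toString(result));
    }
}
```

The scratch row is still reused, so the extra cost is one copy per element rather than a change to how the row is written.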


> Flink SQL handle array of row incorrectly
> -----------------------------------------
>
>                 Key: FLINK-32296
>                 URL: https://issues.apache.org/jira/browse/FLINK-32296
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / API
>    Affects Versions: 1.15.3, 1.16.2, 1.17.1
>            Reporter: Lim Qing Wei
>            Priority: Major
>
> Flink SQL produces incorrect results for data of type 
> ARRAY<ROW>. Here is a reproduction:
>  
>  
> {code:java}
> CREATE TEMPORARY VIEW bug_data as (
> SELECT CAST(ARRAY[
> (10, '2020-01-10'), (101, '244ddf'), (1011, '2asdfaf'), (1110, '200'), (2210, 
> '20-01-10'), (4410, '21111')
> ] AS ARRAY<ROW<A INT, B STRING>>)
> UNION
> SELECT CAST(ARRAY[
> (10, '2020-01-10'), (121, '244ddf'), (2222, '2asdfaf'), (32243, '200'), 
> (2210, '33333-01-10'), (4410, '23243243')
> ] AS ARRAY<ROW<A INT, B STRING>>)
> UNION SELECT CAST(ARRAY[
> (10, '2020-01-10'), (222, '244ddf'), (1011, '2asdfaf'), (1110, '200'), 
> (24367, '20-01-10'), (4410, '21111')
> ] AS ARRAY<ROW<A INT, B STRING>>)
> UNION SELECT CAST(ARRAY[
> (10, '2020-01-10'), (5666, '244ddf'), (435243, '2asdfaf'), (56567, '200'), 
> (2210, '20-01-10'), (4410, '21111')
> ] AS ARRAY<ROW<A INT, B STRING>>)
> UNION SELECT CAST(ARRAY[
> (10, '2020-01-10'), (43543, '244ddf'), (1011, '2asdfaf'), (1110, '200'), 
> (8967564, '20-01-10'), (4410, '21111')
> ] AS ARRAY<ROW<A INT, B STRING>>)
> );
> CREATE TABLE sink (
> r ARRAY<ROW<A INT, B STRING>>
> ) WITH ('connector' = 'print'); {code}
>  
>  
> In all the 1.15, 1.16 and 1.17 versions I've tested, it produces the following:
>  
> {noformat}
> [+I[4410, 21111], +I[4410, 21111], +I[4410, 21111], +I[4410, 21111], +I[4410, 
> 21111], +I[4410, 21111]]
> [+I[4410, 23243243], +I[4410, 23243243], +I[4410, 23243243], +I[4410, 
> 23243243], +I[4410, 23243243], +I[4410, 23243243]]{noformat}
>  
>  
> I think this is unexpected/wrong because:
>  # The query should produce 5 rows, not 2
>  # The data is also wrong: every row in the array is made the same, 
> even though the inputs are not the same.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
