[ 
https://issues.apache.org/jira/browse/FLINK-17228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xingxing Di updated FLINK-17228:
--------------------------------
    Description: 
We are facing an special scenario, *we want to know if this feature is 
supported*:
 First count distinct deviceid for A,B dimensions, then sum up for just A 
dimension.
 Here is SQL:
{code:java}
SELECT dt, SUM(a.uv) AS uv
FROM (
 SELECT dt, pvareaid, COUNT(DISTINCT cuid) AS uv
 FROM streaming_log_event
 WHERE action IN ('action1')
 AND pvareaid NOT IN ('pv1', 'pv2')
 AND pvareaid IS NOT NULL
 GROUP BY dt, pvareaid
) a
GROUP BY dt;{code}
The question is the data emitted to sink was wrong, sink periodically got 
smaller result ({color:#ff0000}86{color}) which was wrong,  here is the log:
{code:java}
2020-04-17 22:28:38,727    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) 
(GeneralRedisSinkFunction.invoke:169) - receive data(false,0,86,20200417)
2020-04-17 22:28:38,727    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) 
(GeneralRedisSinkFunction.invoke:169) - receive data(true,0,130,20200417)
2020-04-17 22:28:39,327    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) 
(GeneralRedisSinkFunction.invoke:169) - receive data(false,0,130,20200417)
2020-04-17 22:28:39,327    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) 
(GeneralRedisSinkFunction.invoke:169) - receive data(true,0,86,20200417)
2020-04-17 22:28:39,327    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) 
(GeneralRedisSinkFunction.invoke:169) - receive data(false,0,86,20200417)
2020-04-17 22:28:39,328    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) 
(GeneralRedisSinkFunction.invoke:169) - receive data(true,0,131,20200417)
{code}
 

  was:
We are facing an special scenario, *we want to know if this feature is 
supported*:
First count distinct deviceid for A,B dimensions, then sum up for just A 
dimension.
Here is SQL:
{code:java}
SELECT dt, SUM(a.uv) AS uv
FROM (
 SELECT dt, pvareaid, COUNT(DISTINCT cuid) AS uv
 FROM streaming_log_event
 WHERE action IN ('action1')
 AND pvareaid NOT IN ('pv1', 'pv2')
 AND pvareaid IS NOT NULL
 GROUP BY dt, pvareaid
) a
GROUP BY dt;{code}
The question is the data emitted to sink was wrong, sink periodically got 
smaller result ({color:#FF0000}86{color}) which was wrong,  here is the log:
{code:java}

2020-04-17 22:28:38,727    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) 
(GeneralRedisSinkFunction.invoke:169) - receive data(false,0,86,20200417)
2020-04-17 22:28:38,727    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) 
(GeneralRedisSinkFunction.invoke:169) - receive data(true,0,130,20200417)
2020-04-17 22:28:39,327    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) 
(GeneralRedisSinkFunction.invoke:169) - receive data(false,0,130,20200417)
2020-04-17 22:28:39,327    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) 
(GeneralRedisSinkFunction.invoke:169) - receive data(true,0,86,20200417)
2020-04-17 22:28:39,327    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) 
(GeneralRedisSinkFunction.invoke:169) - receive data(false,0,86,20200417)
2020-04-17 22:28:39,328    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) 
(GeneralRedisSinkFunction.invoke:169) - receive data(true,0,131,20200417)
{code}
 


> Streaming sql with nested GROUP BY got wrong results
> ----------------------------------------------------
>
>                 Key: FLINK-17228
>                 URL: https://issues.apache.org/jira/browse/FLINK-17228
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / API, Table SQL / Runtime
>    Affects Versions: 1.7.2
>         Environment: Flink 1.7.2
> Parallelism is 1
>            Reporter: Xingxing Di
>            Priority: Blocker
>
> We are facing an special scenario, *we want to know if this feature is 
> supported*:
>  First count distinct deviceid for A,B dimensions, then sum up for just A 
> dimension.
>  Here is SQL:
> {code:java}
> SELECT dt, SUM(a.uv) AS uv
> FROM (
>  SELECT dt, pvareaid, COUNT(DISTINCT cuid) AS uv
>  FROM streaming_log_event
>  WHERE action IN ('action1')
>  AND pvareaid NOT IN ('pv1', 'pv2')
>  AND pvareaid IS NOT NULL
>  GROUP BY dt, pvareaid
> ) a
> GROUP BY dt;{code}
> The question is the data emitted to sink was wrong, sink periodically got 
> smaller result ({color:#ff0000}86{color}) which was wrong,  here is the log:
> {code:java}
> 2020-04-17 22:28:38,727    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed 
> (1/1) (GeneralRedisSinkFunction.invoke:169) - receive 
> data(false,0,86,20200417)
> 2020-04-17 22:28:38,727    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed 
> (1/1) (GeneralRedisSinkFunction.invoke:169) - receive 
> data(true,0,130,20200417)
> 2020-04-17 22:28:39,327    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed 
> (1/1) (GeneralRedisSinkFunction.invoke:169) - receive 
> data(false,0,130,20200417)
> 2020-04-17 22:28:39,327    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed 
> (1/1) (GeneralRedisSinkFunction.invoke:169) - receive data(true,0,86,20200417)
> 2020-04-17 22:28:39,327    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed 
> (1/1) (GeneralRedisSinkFunction.invoke:169) - receive 
> data(false,0,86,20200417)
> 2020-04-17 22:28:39,328    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed 
> (1/1) (GeneralRedisSinkFunction.invoke:169) - receive 
> data(true,0,131,20200417)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to