[ https://issues.apache.org/jira/browse/FLINK-17228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xingxing Di updated FLINK-17228:
--------------------------------
    Description:
We are facing a special scenario, and *we want to know if this feature is supported*:
First count the distinct deviceid values for the A and B dimensions, then sum those counts for the A dimension only.

Here is the SQL:
{code:java}
SELECT dt, SUM(a.uv) AS uv
FROM (
   SELECT dt, pvareaid, COUNT(DISTINCT cuid) AS uv
   FROM streaming_log_event
   WHERE action IN ('action1')
      AND pvareaid NOT IN ('pv1', 'pv2')
      AND pvareaid IS NOT NULL
   GROUP BY dt, pvareaid
) a
GROUP BY dt;{code}
The problem is that the data emitted to the sink is wrong: the sink periodically receives a smaller result ({color:#ff0000}86{color}), which is incorrect. Here is the log:
{code:java}
2020-04-17 22:28:38,727 INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) (GeneralRedisSinkFunction.invoke:169) - receive data(false,0,86,20200417)
2020-04-17 22:28:38,727 INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) (GeneralRedisSinkFunction.invoke:169) - receive data(true,0,130,20200417)
2020-04-17 22:28:39,327 INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) (GeneralRedisSinkFunction.invoke:169) - receive data(false,0,130,20200417)
2020-04-17 22:28:39,327 INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) (GeneralRedisSinkFunction.invoke:169) - receive data(true,0,86,20200417)
2020-04-17 22:28:39,327 INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) (GeneralRedisSinkFunction.invoke:169) - receive data(false,0,86,20200417)
2020-04-17 22:28:39,328 INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) (GeneralRedisSinkFunction.invoke:169) - receive data(true,0,131,20200417)
{code}

> Streaming sql with nested GROUP BY got wrong results
> ----------------------------------------------------
>
>                 Key: FLINK-17228
>                 URL: https://issues.apache.org/jira/browse/FLINK-17228
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / API, Table SQL / Runtime
>    Affects Versions: 1.7.2
>         Environment: Flink 1.7.2
> Parallelism is 1
>            Reporter: Xingxing Di
>            Priority: Blocker

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
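
One possible reading of the log, which this report does not confirm: the sink appears to receive a retract stream, where the leading boolean marks a record as an addition (true) or a retraction (false). Under that assumption, an update to one inner `(dt, pvareaid)` count reaches the outer `SUM` as two separate messages, so the sum briefly drops before it rises again. The Python sketch below is a toy model of that behaviour, not Flink code. The class name `NestedGroupBy`, the area names `area_a` and `area_b`, and the per-area counts 86 and 44 are hypothetical, chosen only so that the totals match the logged values (130 - 86 = 44, 131 - 86 = 45).

```python
from collections import defaultdict


class NestedGroupBy:
    """Toy model of COUNT(DISTINCT) per (dt, pvareaid) feeding SUM per dt.

    Every output message is (is_add, value): True adds, False retracts.
    """

    def __init__(self):
        self.seen = defaultdict(set)  # (dt, pvareaid) -> distinct cuids
        self.total = {}               # dt -> current SUM(uv)

    def _outer(self, dt, is_add, uv):
        """Apply ONE inner message to the outer sum and emit the result."""
        out = []
        if dt in self.total:
            out.append((False, self.total[dt]))  # retract the previous sum
        old = self.total.get(dt, 0)
        self.total[dt] = old + uv if is_add else old - uv
        out.append((True, self.total[dt]))       # emit the updated sum
        return out

    def on_event(self, dt, pvareaid, cuid):
        """Process one input row; return the messages the sink would see."""
        key = (dt, pvareaid)
        old_uv = len(self.seen[key])
        self.seen[key].add(cuid)
        new_uv = len(self.seen[key])
        if new_uv == old_uv:
            return []  # duplicate cuid: the inner count is unchanged
        out = []
        if old_uv > 0:
            # The inner aggregate first retracts its old count ...
            out += self._outer(dt, False, old_uv)
        # ... and then emits the new count as a separate message.
        out += self._outer(dt, True, new_uv)
        return out


model = NestedGroupBy()
for i in range(86):                       # hypothetical area with 86 users
    model.on_event("20200417", "area_a", f"a{i}")
for i in range(44):                       # hypothetical area with 44 users
    model.on_event("20200417", "area_b", f"b{i}")

# One new user in area_b: its count goes from 44 to 45.
burst = model.on_event("20200417", "area_b", "b_new")
print(burst)
# [(False, 130), (True, 86), (False, 86), (True, 131)]
```

In this model the final message of each burst is an addition that carries the settled sum (131), while the intermediate addition (86) is transient. Whether a sink should skip such intermediate values, or whether Flink should suppress them, is the open question raised in this issue.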