Zhengchao Shi created FLINK-19452:
-------------------------------------

             Summary: statistics of group by CDC data is always 1
                 Key: FLINK-19452
                 URL: https://issues.apache.org/jira/browse/FLINK-19452
             Project: Flink
          Issue Type: Bug
          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
    Affects Versions: 1.11.1
            Reporter: Zhengchao Shi
             Fix For: 1.12.0

When using CDC to compute count statistics, if only updates are made to the source table (a MySQL table), the value of count is always 1.

{code:sql}
CREATE TABLE orders (
  order_number int,
  product_id int
) WITH (
  'connector' = 'kafka-0.11',
  'topic' = 'Topic',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'GroupId',
  'scan.startup.mode' = 'latest-offset',
  'format' = 'canal-json'
);

CREATE TABLE order_test (
  order_number int,
  order_cnt bigint
) WITH (
  'connector' = 'print'
);

INSERT INTO order_test
SELECT order_number, count(1) FROM orders GROUP BY order_number;
{code}

There are 3 records in "orders":

||order_number||product_id||
|10001|1|
|10001|2|
|10001|3|

Now update the orders table:

{code:sql}
UPDATE orders SET product_id = 5 WHERE order_number = 10001;
{code}

The output is:

-D(10001,1)
+I(10001,1)
-D(10001,1)
+I(10001,1)
-D(10001,1)
+I(10001,1)

I think the final result should be +I(10001,3).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
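
The expected count can be checked with a small stand-alone simulation of retraction-style counting. This is only an illustrative sketch, not Flink's aggregation code: the function name `count_by_key` and the `(kind, order_number, product_id)` tuple layout are hypothetical, and it assumes the job has seen the three original inserts before the update arrives.

```python
from collections import defaultdict

# Changelog row kinds, written in the same +I / -U / +U / -D notation
# that the print connector uses. The grouping below is an assumption
# made for this sketch.
ACCUMULATE = {"+I", "+U"}
RETRACT = {"-U", "-D"}


def count_by_key(changelog):
    """Replay (kind, order_number, product_id) rows and return final counts."""
    counts = defaultdict(int)
    for kind, order_number, _product_id in changelog:
        if kind in ACCUMULATE:
            counts[order_number] += 1
        elif kind in RETRACT:
            counts[order_number] -= 1
        else:
            raise ValueError(f"unknown row kind: {kind}")
    # Keys whose count dropped to zero no longer appear in the result.
    return {key: n for key, n in counts.items() if n > 0}


# The three rows from the table above.
inserts = [("+I", 10001, 1), ("+I", 10001, 2), ("+I", 10001, 3)]

# The UPDATE touches every row: one update-before and one update-after each.
updates = []
for _kind, order_number, old_product_id in inserts:
    updates.append(("-U", order_number, old_product_id))
    updates.append(("+U", order_number, 5))

final_counts = count_by_key(inserts + updates)
print(final_counts)  # {10001: 3}
```

Each update retracts one row and adds one row back, so the count for order 10001 stays at 3 under these assumptions.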