Kenyore created FLINK-26348:
-------------------------------

             Summary: Maybe ChangelogNormalize should ignore unused columns when deduplicating
                 Key: FLINK-26348
                 URL: https://issues.apache.org/jira/browse/FLINK-26348
             Project: Flink
          Issue Type: Improvement
    Affects Versions: 1.13.2
            Reporter: Kenyore
In my case I have the tables below:
* sku (size: 1K+)
* custom_product (size: 10B+)
* order (size: 100M+)

And my SQL looks like this:
{code:sql}
SELECT o.code, o.created, s.sku_name, p.product_name
FROM `order` o
INNER JOIN custom_product p ON o.p_id = p.id
INNER JOIN sku s ON s.id = p.s_id
{code}
Table sku has several other columns besides sku_name. The problem is that when one of those other columns (for example, description) changes in any row of table sku, Flink may produce millions of update rows that are useless downstream: the query only reads sku_name, while the change was to description. These useless update rows put pressure on the downstream operators. I think it would be a significant improvement for Flink to avoid emitting them, e.g. by having ChangelogNormalize ignore columns that are never used downstream (see the sketch below). Thanks.
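For illustration, here is a minimal workaround sketch in the same Flink SQL dialect, assuming the column names from the example above (sku.id, sku.sku_name, sku.description) and an upsert-style sku source that goes through ChangelogNormalize. This is not a confirmed fix: whether it actually suppresses the redundant updates depends on how far the planner pushes the projection relative to ChangelogNormalize.

{code:sql}
-- Workaround sketch, not a confirmed fix: join against a projection of `sku`
-- that keeps only the columns the query actually reads, so that an update
-- touching only an unused column such as `description` carries no visible
-- change in the joined result. Whether this prevents the redundant -U/+U
-- pairs depends on whether the planner pushes the projection below
-- ChangelogNormalize; the proposal in this issue is for ChangelogNormalize
-- itself to ignore columns that are never consumed downstream.
SELECT o.code,
       o.created,
       s.sku_name,
       p.product_name
FROM `order` o
INNER JOIN custom_product p ON o.p_id = p.id
INNER JOIN (
    SELECT id, sku_name   -- project away `description` and other unused columns
    FROM sku
) s ON s.id = p.s_id
{code}

If the projection is not pushed below ChangelogNormalize, the normalize step still compares and emits the full row, and the redundant update pairs are still produced, which is exactly the behavior this issue asks to improve.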