Kenyore created FLINK-26348:
-------------------------------

             Summary: Maybe ChangelogNormalize should ignore unused columns 
when deduplicate
                 Key: FLINK-26348
                 URL: https://issues.apache.org/jira/browse/FLINK-26348
             Project: Flink
          Issue Type: Improvement
    Affects Versions: 1.13.2
            Reporter: Kenyore


In my case I have the following tables:
 * sku(size:1K+)
 * custom_product(size:10B+)
 * order(size:100M+)

My SQL looks like this:
{code:sql}
-- ORDER is a reserved keyword in Flink SQL, so the table name is quoted
SELECT o.code, o.created, s.sku_name, p.product_name
FROM `order` o
    INNER JOIN custom_product p ON o.p_id = p.id
    INNER JOIN sku s ON s.id = p.s_id
{code}

Table sku has other columns besides sku_name.
The problem is that when one of those other columns (for example, description)
changes in any row of table sku, Flink may produce millions of update rows
that are useless downstream: the query only selects sku_name from sku, but the
change is in description.

These useless update rows put unnecessary pressure on downstream operators.
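
To illustrate the idea, here is a minimal Python sketch. It is not Flink's
actual ChangelogNormalize code; the names normalize, project and used_columns
are made up for this example. It keeps the last row per key and emits an
update only when the columns the query actually uses have changed.

```python
from typing import Dict, Iterable, Iterator, Optional, Sequence, Tuple

Row = Dict[str, object]


def project(row: Row, columns: Optional[Sequence[str]]) -> Row:
    """Keep only the given columns; None means keep the whole row."""
    if columns is None:
        return dict(row)
    return {c: row[c] for c in columns}


def normalize(
    changes: Iterable[Row],
    key: str,
    used_columns: Optional[Sequence[str]] = None,
) -> Iterator[Tuple[str, Row]]:
    """Hypothetical changelog normalization over upsert rows.

    Emits +I for a new key and a -U/+U pair for a changed key.
    When used_columns is set, rows are compared (and emitted) on those
    columns only, so changes to unused columns produce no output.
    """
    state: Dict[object, Row] = {}  # last seen row per key
    for row in changes:
        prev = state.get(row[key])
        state[row[key]] = row
        if prev is None:
            yield ("+I", project(row, used_columns))
        elif project(prev, used_columns) != project(row, used_columns):
            yield ("-U", project(prev, used_columns))
            yield ("+U", project(row, used_columns))


# Example upserts on table sku: the second one only touches description.
CHANGES = [
    {"id": 1, "sku_name": "shirt", "description": "old text"},
    {"id": 1, "sku_name": "shirt", "description": "new text"},
    {"id": 1, "sku_name": "tee", "description": "new text"},
]

full = list(normalize(CHANGES, "id"))
pruned = list(normalize(CHANGES, "id", used_columns=("id", "sku_name")))
```

With the full-row comparison, the description-only change still yields a
-U/+U pair, which then fans out through both joins. With the comparison
limited to id and sku_name, that change is dropped at the source.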

I think this would be a significant improvement for Flink. Thanks.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
