[ https://issues.apache.org/jira/browse/FLINK-20370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17455065#comment-17455065 ]
Jingsong Lee edited comment on FLINK-20370 at 12/10/21, 2:11 AM: ----------------------------------------------------------------- Fixed via: part1: Fix wrong results when sink primary key is not the same with query result's changelog upsert key master: f8f6935adc841529ecdc0636174650cffbf73719 release-1.14: 7c5ddbd201005e55ab68b4db7ee74c7cbeb13400 release-1.13: 54aaffc7d9b2b06726294cf636d1b9acf74c5d49 part2: introduce 'table.exec.sink.keyed-shuffle' option to auto keyby on sink's pk if parallelism are not the same for insertOnly input master: 9e76585f1fa110288604913a73d86ac2f1542777 part2 does not cherry-pick to 1.14 because it may affect the normal plan and lead to incompatibility. was (Author: lzljs3620320): Fixed via: part1: Fix wrong results when sink primary key is not the same with query result's changelog upsert key master: f8f6935adc841529ecdc0636174650cffbf73719 release-1.14: 7c5ddbd201005e55ab68b4db7ee74c7cbeb13400 part2: introduce 'table.exec.sink.keyed-shuffle' option to auto keyby on sink's pk if parallelism are not the same for insertOnly input master: 9e76585f1fa110288604913a73d86ac2f1542777 part2 does not cherry-pick to 1.14 because it may affect the normal plan and lead to incompatibility. > Result is wrong when sink primary key is not the same with query > ---------------------------------------------------------------- > > Key: FLINK-20370 > URL: https://issues.apache.org/jira/browse/FLINK-20370 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner > Affects Versions: 1.12.0 > Reporter: Jark Wu > Assignee: lincoln lee > Priority: Critical > Labels: pull-request-available > Fix For: 1.15.0, 1.14.1, 1.13.4 > > > Both sources are upsert-kafka which synchronizes the changes from MySQL > tables (source_city, source_customer). The sink is another MySQL table which > is in upsert mode with "city_name" primary key. The join key is "city_id". > In this case, the result will be wrong when updating > {{source_city.city_name}} column in MySQL, as the UPDATE_BEFORE is ignored > and the old city_name is retained in the sink table. > {code} > Sink(table=[default_catalog.default_database.sink_kafka_count_city], > fields=[city_name, count_customer, sum_gender], changelogMode=[NONE]) > +- Calc(select=[city_name, CAST(count_customer) AS count_customer, > CAST(sum_gender) AS sum_gender], changelogMode=[I,UA,D]) > +- Join(joinType=[InnerJoin], where=[=(city_id, id)], select=[city_id, > count_customer, sum_gender, id, city_name], > leftInputSpec=[JoinKeyContainsUniqueKey], > rightInputSpec=[JoinKeyContainsUniqueKey], changelogMode=[I,UA,D]) > :- Exchange(distribution=[hash[city_id]], changelogMode=[I,UA,D]) > : +- GlobalGroupAggregate(groupBy=[city_id], select=[city_id, > COUNT_RETRACT(count1$0) AS count_customer, SUM_RETRACT((sum$1, count$2)) AS > sum_gender], changelogMode=[I,UA,D]) > : +- Exchange(distribution=[hash[city_id]], changelogMode=[I]) > : +- LocalGroupAggregate(groupBy=[city_id], select=[city_id, > COUNT_RETRACT(*) AS count1$0, SUM_RETRACT(gender) AS (sum$1, count$2)], > changelogMode=[I]) > : +- Calc(select=[city_id, gender], changelogMode=[I,UB,UA,D]) > : +- ChangelogNormalize(key=[customer_id], > changelogMode=[I,UB,UA,D]) > : +- Exchange(distribution=[hash[customer_id]], > changelogMode=[UA,D]) > : +- MiniBatchAssigner(interval=[3000ms], > mode=[ProcTime], changelogMode=[UA,D]) > : +- TableSourceScan(table=[[default_catalog, > default_database, source_customer]], fields=[customer_id, city_id, age, > gender, update_time], changelogMode=[UA,D]) > +- Exchange(distribution=[hash[id]], changelogMode=[I,UA,D]) > +- ChangelogNormalize(key=[id], changelogMode=[I,UA,D]) > +- Exchange(distribution=[hash[id]], changelogMode=[UA,D]) > +- MiniBatchAssigner(interval=[3000ms], mode=[ProcTime], > changelogMode=[UA,D]) > +- TableSourceScan(table=[[default_catalog, > default_database, source_city]], fields=[id, city_name], changelogMode=[UA,D]) > {code} > We have suggested users to use the same key of the query as the primary key > on sink in the documentation: > https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/queries.html#deduplication. > We should make this attention to be more highlight in CREATE TABLE page. -- This message was sent by Atlassian Jira (v8.20.1#820001)