Yohei Yoshimuta created FLINK-36945:
---------------------------------------
Summary: MySQL CDC internal schema representation becomes out of
sync with the real database schema when restarting a job
Key: FLINK-36945
URL: https://issues.apache.org/jira/browse/FLINK-36945
Project: Flink
Issue Type: Bug
Components: Flink CDC
Reporter: Yohei Yoshimuta
[The Vitess schema migration tool|https://vitess.io/docs/user-guides/schema-changes/ddl-strategies/]
uses `RENAME TABLE` to perform schema changes.
However, the MySQL CDC connector does not account for these renames, so the
schema history topic in Debezium becomes stale. This does not immediately
affect a running job, but it prevents the job from restarting from a
checkpoint, and the restart fails with the following error:
```
Caused by: com.ververica.cdc.connectors.shaded.org.apache.kafka.connect.errors.ConnectException: Data row is smaller than a column index, internal schema representation is probably out of sync with real database schema
    at io.debezium.relational.TableSchemaBuilder.validateIncomingRowToInternalMetadata(TableSchemaBuilder.java:254)
    at io.debezium.relational.TableSchemaBuilder.lambda$createValueGenerator$5(TableSchemaBuilder.java:283)
    at io.debezium.relational.TableSchema.valueFromColumnData(TableSchema.java:141)
    at io.debezium.relational.RelationalChangeRecordEmitter.emitUpdateRecord(RelationalChangeRecordEmitter.java:139)
    at io.debezium.relational.RelationalChangeRecordEmitter.emitChangeRecords(RelationalChangeRecordEmitter.java:60)
    at io.debezium.pipeline.EventDispatcher.dispatchDataChangeEvent(EventDispatcher.java:209)
    ... 12 more
```
When this happens, the schema history topic must be rebuilt, and the job
cannot recover automatically.
The current workaround is to set `scan.startup.mode` to `specific-offset`,
which forces CDC to pass `schema_only_recovery` to Debezium. However, this
requires manual intervention.
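For illustration, the workaround can be sketched as a small helper that
renders the startup options for a Flink SQL `WITH` clause. This is only a
sketch: the class and method names are hypothetical, the binlog file and
position are placeholder values that an operator would have to look up, and
the `scan.startup.specific-offset.file` / `scan.startup.specific-offset.pos`
keys are the option names as I understand them from the connector
documentation. Connection options (host, credentials, table names) are omitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Flink CDC.
public class SpecificOffsetOptions {

    // Collects the options that make the connector start from an explicit
    // binlog position. binlogFile and binlogPos are supplied by the operator.
    static Map<String, String> startupOptions(String binlogFile, long binlogPos) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("connector", "mysql-cdc");
        options.put("scan.startup.mode", "specific-offset");
        options.put("scan.startup.specific-offset.file", binlogFile);
        options.put("scan.startup.specific-offset.pos", Long.toString(binlogPos));
        return options;
    }

    // Renders the options as the body of a Flink SQL WITH (...) clause.
    static String toWithClause(Map<String, String> options) {
        return options.entrySet().stream()
                .map(e -> "  '" + e.getKey() + "' = '" + e.getValue() + "'")
                .collect(Collectors.joining(",\n", "WITH (\n", "\n)"));
    }

    public static void main(String[] args) {
        // Placeholder offset; a real value comes from the source database.
        System.out.println(toWithClause(startupOptions("mysql-bin.000003", 4L)));
    }
}
```

The offset still has to be chosen by hand, which is the manual step this
issue would like to avoid.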
Examples of potential solutions include:
- Enhancing the schema history mechanism to capture and process `RENAME TABLE`
events during schema migrations.
- Implementing a fallback mechanism to reconcile schema differences during
recovery, reducing the need for manual intervention.
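To make the first idea more concrete, here is a toy sketch of a schema
history that follows `RENAME TABLE` statements. It is not Debezium or Flink
CDC code: the class, its methods, and the table names are all hypothetical,
and the parsing is a simple regex that only handles unqualified table names.
It shows only that moving the stored column list along with each rename keeps
the in-memory schema aligned with the table that actually carries the name
after the swap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy model of a schema history keyed by table name.
public class RenameAwareSchemaHistory {

    // Matches one "old TO new" pair; unqualified names only (assumption).
    private static final Pattern RENAME_PAIR = Pattern.compile(
            "`?(\\w+)`?\\s+TO\\s+`?(\\w+)`?", Pattern.CASE_INSENSITIVE);

    private static final String PREFIX = "RENAME TABLE";

    private final Map<String, List<String>> columnsByTable = new HashMap<>();

    void register(String table, List<String> columns) {
        columnsByTable.put(table, new ArrayList<>(columns));
    }

    // Applies a DDL statement; anything other than RENAME TABLE is ignored here.
    void applyDdl(String ddl) {
        String trimmed = ddl.trim();
        if (!trimmed.regionMatches(true, 0, PREFIX, 0, PREFIX.length())) {
            return;
        }
        // Pairs are applied left to right, so a swap through a temporary
        // name moves each column list to the right key.
        Matcher m = RENAME_PAIR.matcher(trimmed.substring(PREFIX.length()));
        while (m.find()) {
            List<String> columns = columnsByTable.remove(m.group(1));
            if (columns != null) {
                columnsByTable.put(m.group(2), columns);
            }
        }
    }

    List<String> columnsOf(String table) {
        return columnsByTable.get(table);
    }

    public static void main(String[] args) {
        RenameAwareSchemaHistory history = new RenameAwareSchemaHistory();
        history.register("orders", List.of("id", "name"));
        history.register("_orders_new", List.of("id", "name", "email"));
        // A swap of the kind a migration tool might issue (names are made up).
        history.applyDdl("RENAME TABLE orders TO _orders_old, _orders_new TO orders");
        System.out.println(history.columnsOf("orders"));
    }
}
```

Without the rename handling, `orders` would keep its pre-migration column
list, which is the kind of mismatch the error above reports.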