MOBIN created FLINK-36580:
-----------------------------

             Summary: Using debezium.column.exclude.list causes job failure
                 Key: FLINK-36580
                 URL: https://issues.apache.org/jira/browse/FLINK-36580
             Project: Flink
          Issue Type: Bug
          Components: Flink CDC
    Affects Versions: cdc-3.2.0
            Reporter: MOBIN
When a user synchronizes data with a wildcard projection (the table has too many columns to list individually) and wants to exclude some privacy-sensitive columns, the pipeline YAML looks like this:
{code:java}
source:
  type: mysql
  ....
  debezium.column.exclude.list: test_db.test_table.name

transform:
  - source-table: test_db.test_table
    projection: \*
    description: project fields and filter
{code}
The job then fails with:
{code:java}
Caused by: java.lang.ArrayIndexOutOfBoundsException: 33370909
	at org.apache.flink.cdc.common.data.binary.BinarySegmentUtils.getLongMultiSegments(BinarySegmentUtils.java:731)
	at org.apache.flink.cdc.common.data.binary.BinarySegmentUtils.getLong(BinarySegmentUtils.java:721)
	at org.apache.flink.cdc.common.data.binary.BinarySegmentUtils.readTimestampData(BinarySegmentUtils.java:1017)
	at org.apache.flink.cdc.common.data.binary.BinaryRecordData.getTimestamp(BinaryRecordData.java:173)
	at org.apache.flink.cdc.common.data.RecordData.lambda$createFieldGetter$f6fd429a$1(RecordData.java:214)
	at org.apache.flink.cdc.common.data.RecordData.lambda$createFieldGetter$79d0aaaa$1(RecordData.java:245)
	at org.apache.flink.cdc.runtime.operators.transform.PreTransformProcessor.getValueFromBinaryRecordData(PreTransformProcessor.java:111)
	at org.apache.flink.cdc.runtime.operators.transform.PreTransformProcessor.processFillDataField(PreTransformProcessor.java:90)
	at org.apache.flink.cdc.runtime.operators.transform.PreTransformOperator.processProjection(PreTransformOperator.java:427)
	at org.apache.flink.cdc.runtime.operators.transform.PreTransformOperator.processDataChangeEvent(PreTransformOperator.java:399)
	at org.apache.flink.cdc.runtime.operators.transform.PreTransformOperator.processElement(PreTransformOperator.java:251)
	at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:75)
	at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:50)
	at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
	at org.apache.flink.streaming.runtime.tasks.SourceOperatorStreamTask$AsyncDataOutputToOutput.emitRecord(SourceOperatorStreamTask.java:310)
	at org.apache.flink.streaming.api.operators.source.SourceOutputWithWatermarks.collect(SourceOutputWithWatermarks.java:110)
{code}
Cause of the error: Debezium drops the columns listed in debezium.column.exclude.list when emitting change records, but Flink CDC does not take the exclusion list into account when obtaining the table schema. As a result, the column count of tableChangeInfo.getSourceSchema() no longer matches the number of fields actually serialized in the BinaryRecordData, so field getters read from wrong offsets.

Temporary workaround: {{projection: \*, cast(null as STRING) as name}}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
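To illustrate the failure mode, here is a minimal, hypothetical Java sketch (it does not use Flink CDC's real classes): the record is modeled as one 8-byte slot per field, the schema still lists the excluded column, and reading that column's slot overruns the record, analogous to the ArrayIndexOutOfBoundsException in the stack trace above.

```java
import java.nio.ByteBuffer;

// Hypothetical simplification of the bug: the schema keeps 3 columns,
// but the serialized record only contains the 2 columns Debezium emitted.
public class SchemaMismatchSketch {
    // Record layout: one fixed 8-byte slot per field (loosely modeled on
    // the fixed-length part of a binary row format).
    static long readField(byte[] record, int fieldIndex) {
        // Overruns the buffer when fieldIndex >= the actual field count.
        return ByteBuffer.wrap(record).getLong(fieldIndex * 8);
    }

    public static void main(String[] args) {
        String[] schemaColumns = {"id", "ts", "name"};  // schema still has 3 columns
        byte[] record = new byte[2 * 8];                // record has only 2 fields
        ByteBuffer.wrap(record).putLong(0, 42L).putLong(8, 1700000000L);

        // Reading the columns the record actually carries works fine.
        System.out.println(readField(record, 0));       // 42
        System.out.println(readField(record, 1));       // 1700000000

        // Reading the excluded column's slot reads past the record's end.
        try {
            readField(record, 2);                       // schemaColumns[2] = "name"
        } catch (IndexOutOfBoundsException e) {
            System.out.println("out of bounds, as in the reported failure");
        }
    }
}
```

The workaround above sidesteps exactly this: replacing the excluded column with `cast(null as STRING) as name` in the projection keeps the schema and the record field count consistent.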