MOBIN created FLINK-36580:
-----------------------------

             Summary: Using debezium.column.exclude.list causes job failure
                 Key: FLINK-36580
                 URL: https://issues.apache.org/jira/browse/FLINK-36580
             Project: Flink
          Issue Type: Bug
          Components: Flink CDC
    Affects Versions: cdc-3.2.0
            Reporter: MOBIN


When a user synchronizes a table with many columns via a \* projection and wants to 
exclude some privacy-sensitive columns, the pipeline YAML is as follows:
{code:java}
source:
  type: mysql
  ....
  debezium.column.exclude.list: test_db.test_table.name

transform:
  - source-table: test_db.test_table
    projection: \*
    description: project fields and filter {code}
the job fails with the following error:
{code:java}
Caused by: java.lang.ArrayIndexOutOfBoundsException: 33370909
    at org.apache.flink.cdc.common.data.binary.BinarySegmentUtils.getLongMultiSegments(BinarySegmentUtils.java:731)
    at org.apache.flink.cdc.common.data.binary.BinarySegmentUtils.getLong(BinarySegmentUtils.java:721)
    at org.apache.flink.cdc.common.data.binary.BinarySegmentUtils.readTimestampData(BinarySegmentUtils.java:1017)
    at org.apache.flink.cdc.common.data.binary.BinaryRecordData.getTimestamp(BinaryRecordData.java:173)
    at org.apache.flink.cdc.common.data.RecordData.lambda$createFieldGetter$f6fd429a$1(RecordData.java:214)
    at org.apache.flink.cdc.common.data.RecordData.lambda$createFieldGetter$79d0aaaa$1(RecordData.java:245)
    at org.apache.flink.cdc.runtime.operators.transform.PreTransformProcessor.getValueFromBinaryRecordData(PreTransformProcessor.java:111)
    at org.apache.flink.cdc.runtime.operators.transform.PreTransformProcessor.processFillDataField(PreTransformProcessor.java:90)
    at org.apache.flink.cdc.runtime.operators.transform.PreTransformOperator.processProjection(PreTransformOperator.java:427)
    at org.apache.flink.cdc.runtime.operators.transform.PreTransformOperator.processDataChangeEvent(PreTransformOperator.java:399)
    at org.apache.flink.cdc.runtime.operators.transform.PreTransformOperator.processElement(PreTransformOperator.java:251)
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:75)
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:50)
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
    at org.apache.flink.streaming.runtime.tasks.SourceOperatorStreamTask$AsyncDataOutputToOutput.emitRecord(SourceOperatorStreamTask.java:310)
    at org.apache.flink.streaming.api.operators.source.SourceOutputWithWatermarks.collect(SourceOutputWithWatermarks.java:110)
{code}
Cause of error: Debezium drops the columns named in debezium.column.exclude.list 
when emitting change records, but Flink CDC does not take the exclude list into 
account when obtaining the table schema. As a result, the column count in 
tableChangeInfo.getSourceSchema() no longer matches the column count in the 
BinaryRecordData, and position-based field reads run past the end of the 
serialized record.

Temporary workaround: projection: \*, cast(null as STRING) as name (re-add the 
excluded column as a NULL literal so the projected schema stays consistent)
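Spelled out in the pipeline definition, the workaround transform block would look like this (table and column names taken from the example above):
{code:yaml}
transform:
  - source-table: test_db.test_table
    # \* no longer carries the excluded column, so re-add it as a NULL literal
    projection: \*, cast(null as STRING) as name
    description: project fields and filter
{code}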



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
