[ 
https://issues.apache.org/jira/browse/FLINK-36517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JunboWang updated FLINK-36517:
------------------------------
    Description: 
Same as https://issues.apache.org/jira/browse/FLINK-35938.

When the job restarts from a failure, the same data file may be added twice in PaimonCommitter.

 
{code:java}
2024-09-26 19:50:04,623 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink Writer: cupid_dev (1/1) (44d8fba307e7ff3403a94f71507d4523_26351f8267c5887c12c827914f3a91a9_0_1) switched from RUNNING to FAILED on container_e47_1710209318993_56454712_01_000002 @ xxxxxxxxxx (dataPort=41512).
java.io.IOException: java.lang.IllegalStateException: Trying to add file {org.apache.paimon.data.BinaryRow@9c67b85d, 6, 0, data-f6d7562e-7995-464e-909d-aff0c19b244b-359.parquet} which is already added.
    at org.apache.flink.cdc.connectors.paimon.sink.v2.PaimonWriter.write(PaimonWriter.java:138) ~[flink-cdc-pipeline-connector-paimon.jar:3.2.0]
    at org.apache.flink.streaming.runtime.operators.sink.SinkWriterOperator.processElement(SinkWriterOperator.java:161) ~[flink-dist-1.18.0.jar:1.18.0]
    at org.apache.flink.cdc.runtime.operators.sink.DataSinkWriterOperator.processElement(DataSinkWriterOperator.java:177) ~[flink-cdc-dist.jar:3.2.0]
    at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:237) ~[flink-dist-1.18.0.jar:1.18.0]
    at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:146) ~[flink-dist-1.18.0.jar:1.18.0]
    at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:110) ~[flink-dist-1.18.0.jar:1.18.0]
    at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65) ~[flink-dist-1.18.0.jar:1.18.0]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:562) ~[flink-dist-1.18.0.jar:1.18.0]
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:231) ~[flink-dist-1.18.0.jar:1.18.0]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:858) ~[flink-dist-1.18.0.jar:1.18.0]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:807) ~[flink-dist-1.18.0.jar:1.18.0]
    at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:971) ~[flink-dist-1.18.0.jar:1.18.0]
    at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:950) ~[flink-dist-1.18.0.jar:1.18.0]
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:764) ~[flink-dist-1.18.0.jar:1.18.0]
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566) ~[flink-dist-1.18.0.jar:1.18.0]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_202]
Caused by: java.lang.IllegalStateException: Trying to add file {org.apache.paimon.data.BinaryRow@9c67b85d, 6, 0, data-f6d7562e-7995-464e-909d-aff0c19b244b-359.parquet} which is already added.
    at org.apache.paimon.utils.Preconditions.checkState(Preconditions.java:204) ~[flink-cdc-pipeline-connector-paimon.jar:3.2.0]
    at org.apache.paimon.manifest.FileEntry.mergeEntries(FileEntry.java:130) ~[flink-cdc-pipeline-connector-paimon.jar:3.2.0]
    at org.apache.paimon.manifest.FileEntry.mergeEntries(FileEntry.java:112) ~[flink-cdc-pipeline-connector-paimon.jar:3.2.0]
    at org.apache.paimon.operation.AbstractFileStoreScan.readAndMergeFileEntries(AbstractFileStoreScan.java:391) ~[flink-cdc-pipeline-connector-paimon.jar:3.2.0]
    at org.apache.paimon.operation.AbstractFileStoreScan.doPlan(AbstractFileStoreScan.java:295) ~[flink-cdc-pipeline-connector-paimon.jar:3.2.0]
    at org.apache.paimon.operation.AbstractFileStoreScan.plan(AbstractFileStoreScan.java:219) ~[flink-cdc-pipeline-connector-paimon.jar:3.2.0]
    at org.apache.paimon.operation.AbstractFileStoreWrite.scanExistingFileMetas(AbstractFileStoreWrite.java:425) ~[flink-cdc-pipeline-connector-paimon.jar:3.2.0]
    at org.apache.paimon.operation.AbstractFileStoreWrite.createWriterContainer(AbstractFileStoreWrite.java:384) ~[flink-cdc-pipeline-connector-paimon.jar:3.2.0]
    at org.apache.paimon.operation.AbstractFileStoreWrite.lambda$getWriterWrapper$2(AbstractFileStoreWrite.java:359) ~[flink-cdc-pipeline-connector-paimon.jar:3.2.0]
    at java.util.HashMap.computeIfAbsent(HashMap.java:1127) ~[?:1.8.0_202]
    at org.apache.paimon.operation.AbstractFileStoreWrite.getWriterWrapper(AbstractFileStoreWrite.java:358) ~[flink-cdc-pipeline-connector-paimon.jar:3.2.0]
    at org.apache.paimon.operation.AbstractFileStoreWrite.write(AbstractFileStoreWrite.java:127) ~[flink-cdc-pipeline-connector-paimon.jar:3.2.0]
    at org.apache.paimon.table.sink.TableWriteImpl.writeAndReturn(TableWriteImpl.java:151) ~[flink-cdc-pipeline-connector-paimon.jar:3.2.0]
    at org.apache.flink.cdc.connectors.paimon.sink.v2.StoreSinkWriteImpl.write(StoreSinkWriteImpl.java:151) ~[flink-cdc-pipeline-connector-paimon.jar:3.2.0]
    at org.apache.flink.cdc.connectors.paimon.sink.v2.PaimonWriter.write(PaimonWriter.java:136) ~[flink-cdc-pipeline-connector-paimon.jar:3.2.0]
    ... 15 more
{code}
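
To make the failure mode concrete, below is a minimal, hypothetical Java sketch. It is not the actual Paimon or Flink CDC code; the Committable and Committer classes are invented for illustration. It only models the duplicate-file check behind the "Trying to add file ... which is already added" message and shows how replaying an already-committed data file after a restart would trip it.

{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical stand-in for the manifest merge: it mimics the state check in
 * FileEntry#mergeEntries that rejects a data file which is already part of
 * the merged view.
 */
public class DuplicateCommitSketch {

    /** Simplified committable: just the data file name it adds. */
    static final class Committable {
        final String dataFile;

        Committable(String dataFile) {
            this.dataFile = dataFile;
        }
    }

    /** Simplified committer state: data files already merged into manifests. */
    static final class Committer {
        private final Set<String> mergedFiles = new HashSet<>();

        void commit(List<Committable> committables) {
            for (Committable c : committables) {
                // Corresponds to the checkState that throws
                // "Trying to add file ... which is already added."
                if (!mergedFiles.add(c.dataFile)) {
                    throw new IllegalStateException(
                            "Trying to add file " + c.dataFile + " which is already added.");
                }
            }
        }
    }

    public static void main(String[] args) {
        Committer committer = new Committer();

        // First attempt: the committable is committed successfully ...
        List<Committable> pending = new ArrayList<>();
        pending.add(new Committable("data-f6d7562e-7995-464e-909d-aff0c19b244b-359.parquet"));
        committer.commit(pending);

        // ... but the job fails before that success is reflected in the restored
        // state, restarts, and replays the same committable. Without
        // deduplication against what was already committed, the second commit
        // hits the IllegalStateException seen in the stack trace above.
        committer.commit(pending); // throws IllegalStateException
    }
}
{code}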
 

 


> Duplicate commit the same datafile in Paimon Sink
> -------------------------------------------------
>
>                 Key: FLINK-36517
>                 URL: https://issues.apache.org/jira/browse/FLINK-36517
>             Project: Flink
>          Issue Type: Bug
>          Components: Flink CDC
>    Affects Versions: cdc-3.2.0
>         Environment: flink-1.18.0
> flink cdc 3.2.0
>            Reporter: JunboWang
>            Priority: Critical
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
