lm-ylj opened a new issue, #9124:
URL: https://github.com/apache/seatunnel/issues/9124

   ### Search before asking
   
   - [x] I had searched in the [issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
   
   
   ### What happened
   
   When COS is used as checkpoint storage, checkpoints occasionally fail during synchronization because the class `org.apache.hadoop.util.CleanerUtil` is missing.
   Following the error message in the log, I checked `org.apache.hadoop.fs.cosn.BufferPool` and found the cause:
   ```java
   public final class BufferPool {
       // skip...
   
       public void returnBuffer(ByteBufferWrapper byteBufferWrapper) throws InterruptedException, IOException {
           if (null != this.bufferPool && null != byteBufferWrapper) {
               // The failing scenario: a disk-backed buffer is closed, which calls munmap()
               if (byteBufferWrapper.isDiskBuffer()) {
                   byteBufferWrapper.close();
               } else {
                   ByteBuffer byteBuffer = byteBufferWrapper.getByteBuffer();
                   if (null != byteBuffer) {
                       byteBuffer.clear();
                       LOG.debug("Return the buffer to the buffer pool.");
                       if (!this.bufferPool.offer(byteBuffer)) {
                           LOG.error("Return the buffer to buffer pool failed.");
                       }
                   }
               }
           }
       }
   }
   ```
   When all ByteBuffers in the BufferPool are in use (the pool holds 4 by default), a ByteBufferWrapper backed by a temporary file is created instead. When the checkpoint operation completes, `ByteBufferWrapper.close()` is called, which in turn calls `munmap`, and `munmap` uses the `CleanerUtil` class:
   ```java
   private void munmap(MappedByteBuffer buffer) {
       if (CleanerUtil.UNMAP_SUPPORTED) {
           try {
               CleanerUtil.getCleaner().freeBuffer(buffer);
           } catch (IOException var3) {
               LOG.warn("Failed to unmap the buffer", var3);
           }
       } else {
           LOG.trace(CleanerUtil.UNMAP_NOT_SUPPORTED_REASON);
       }
   }
   ```
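
The fallback path described above (pool exhausted, temporary file, unmap on return) can be sketched roughly as follows. This is a simplified, hypothetical model and not the hadoop-cos implementation: `FallbackPoolSketch`, `Lease`, `take` and `giveBack` are names invented for illustration, and the real pool's sizes and locking are not reproduced.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical, simplified model; NOT the hadoop-cos BufferPool implementation.
final class FallbackPoolSketch {

    // Pairs a buffer with a flag saying whether it is backed by a temporary file.
    static final class Lease {
        final ByteBuffer buffer;
        final boolean diskBuffer;

        Lease(ByteBuffer buffer, boolean diskBuffer) {
            this.buffer = buffer;
            this.diskBuffer = diskBuffer;
        }
    }

    private final BlockingQueue<ByteBuffer> pool;
    private final int blockSize;

    FallbackPoolSketch(int poolSize, int blockSize) {
        this.pool = new ArrayBlockingQueue<>(poolSize);
        this.blockSize = blockSize;
        for (int i = 0; i < poolSize; i++) {
            pool.add(ByteBuffer.allocate(blockSize));
        }
    }

    // Hands out a pooled in-memory buffer, or maps a temporary file when the pool is empty.
    Lease take() {
        ByteBuffer pooled = pool.poll();
        if (pooled != null) {
            return new Lease(pooled, false);
        }
        try {
            Path tmp = Files.createTempFile("buffer-sketch-", ".tmp");
            tmp.toFile().deleteOnExit();
            try (FileChannel channel = FileChannel.open(
                    tmp, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // The mapping stays valid after the channel is closed.
                return new Lease(
                        channel.map(FileChannel.MapMode.READ_WRITE, 0, blockSize), true);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Only in-memory buffers go back to the pool. In the real code a disk buffer is
    // closed and unmapped at this point, which is the step that needs CleanerUtil.
    boolean giveBack(Lease lease) {
        if (lease.diskBuffer) {
            return false;
        }
        lease.buffer.clear();
        return pool.offer(lease.buffer);
    }
}
```

In this model only the request that arrives after the pool is drained receives a disk-backed buffer, which is consistent with the failure being occasional rather than happening on every checkpoint.
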
   hadoop-common-3.1.4.jar, however, contains no `CleanerUtil` class, so this call fails with the `NoClassDefFoundError` shown in the error log below.
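
One way to confirm whether a class is visible on a given classpath, and which jar provides it, is a small probe like the one below. It is only a sketch: `ClasspathProbe` and `locate` are hypothetical names, and the probe reports on whatever classpath it is started with, which may differ from the class loaders the SeaTunnel server uses at runtime.

```java
import java.security.CodeSource;

// Hypothetical helper for checking whether a class is loadable and where it comes from.
final class ClasspathProbe {

    // Returns the location the class was loaded from, or null if it cannot be loaded.
    static String locate(String className) {
        try {
            // initialize=false: resolve the class without running its static initializers.
            Class<?> clazz =
                    Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource source = clazz.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return "(bootstrap or unknown source)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.util.CleanerUtil";
        String location = locate(name);
        System.out.println(location == null
                ? name + " is NOT on the classpath"
                : name + " found in " + location);
    }
}
```

Running it with the same jars that the server loads (the exact directory layout is an assumption and depends on the installation) shows whether `org.apache.hadoop.util.CleanerUtil` can be resolved at all.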
   
   
   
   ### SeaTunnel Version
   
   2.3.7 and dev
   
   ### SeaTunnel Config
   
   ```conf
   env {
     parallelism = 1
     job.mode = "STREAMING"
     checkpoint.interval = 30000
   }
   source {
     MySQL-CDC {
       base-url = "jdbc:mysql://xxx/xxx?useUnicode=true&characterEncoding=UTF-8&zeroDateTimeBehavior=convertToNull&useSSL=false"
       username = "xxx"
       password = "xxx"
       table-names = ["xxx.xxx"]
     }
   }
   sink {
     jdbc {
       url = "jdbc:mysql://xxx/xxx?useUnicode=true&characterEncoding=UTF-8&zeroDateTimeBehavior=convertToNull&useSSL=false"
       driver = "com.mysql.cj.jdbc.Driver"
       user = "xxx"
       password = "xxx"
       database = "xxx"
       generate_sink_sql = true
     }
   }
   ```
   
   ### Running Command
   
   ```shell
   bin/seatunnel.sh -c /task-config/test.conf --async -n test
   ```
   
   ### Error Exception
   
   ```log
   2025-03-29 03:35:10.186:::ent:::prod:::SeaTunnel-Master:::${HOSTNAME}:::ShangHai:::null:::null:::seatunnel-coordinator-service-1206:::INFO:::org.apache.hadoop.fs.cosn.CosNOutputStream:::The output stream has been close, and begin to upload the last block: [0].
   2025-03-29 03:35:10.186:::ent:::prod:::SeaTunnel-Master:::${HOSTNAME}:::ShangHai:::null:::null:::seatunnel-coordinator-service-1206:::INFO:::org.apache.hadoop.fs.cosn.CosNativeFileSystemStore:::Store file from input stream. COS key: [/seatunnel-prod/949869267593986065/1743190510027-811-1-65308.sertmp], length: [6669].
   2025-03-29 03:35:10.198:::ent:::prod:::SeaTunnel-Master:::${HOSTNAME}:::ShangHai:::null:::null:::seatunnel-coordinator-service-1211:::INFO:::org.apache.hadoop.fs.cosn.CosNOutputStream:::The output stream has been close, and begin to upload the last block: [0].
   2025-03-29 03:35:10.198:::ent:::prod:::SeaTunnel-Master:::${HOSTNAME}:::ShangHai:::null:::null:::seatunnel-coordinator-service-1211:::INFO:::org.apache.hadoop.fs.cosn.CosNativeFileSystemStore:::Store file from input stream. COS key: [/seatunnel-prod/955031239230750737/1743190510073-920-1-12142.sertmp], length: [7959].
   2025-03-29 03:35:10.198:::ent:::prod:::SeaTunnel-Master:::${HOSTNAME}:::ShangHai:::null:::null:::seatunnel-coordinator-service-1212:::INFO:::org.apache.hadoop.fs.cosn.CosNOutputStream:::The outputStream for key: [/seatunnel-prod/947403261554458639/1743190509997-275-1-84906.sertmp] has been uploaded.
   2025-03-29 03:35:10.203:::ent:::prod:::SeaTunnel-Master:::${HOSTNAME}:::ShangHai:::null:::null:::seatunnel-coordinator-service-1209:::INFO:::org.apache.hadoop.fs.cosn.CosNOutputStream:::The outputStream for key: [/seatunnel-prod/947403312368451595/1743190509996-43-1-84905.sertmp] has been uploaded.
   2025-03-29 03:35:10.237:::ent:::prod:::SeaTunnel-Master:::${HOSTNAME}:::ShangHai:::null:::null:::seatunnel-coordinator-service-1197:::INFO:::org.apache.hadoop.fs.cosn.CosNOutputStream:::The outputStream for key: [/seatunnel-prod/956662281070968847/1743190509953-783-1-5661.sertmp] has been uploaded.
   2025-03-29 03:35:10.249:::ent:::prod:::SeaTunnel-Master:::${HOSTNAME}:::ShangHai:::null:::null:::seatunnel-coordinator-service-1203:::INFO:::org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator:::wait checkpoint completed: 64434
   2025-03-29 03:35:10.250:::ent:::prod:::SeaTunnel-Master:::${HOSTNAME}:::ShangHai:::null:::null:::seatunnel-coordinator-service-1206:::INFO:::org.apache.hadoop.fs.cosn.CosNOutputStream:::The outputStream for key: [/seatunnel-prod/949869267593986065/1743190510027-811-1-65308.sertmp] has been uploaded.
   2025-03-29 03:35:10.258:::ent:::prod:::SeaTunnel-Master:::${HOSTNAME}:::ShangHai:::null:::null:::seatunnel-coordinator-service-1211:::ERROR:::org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator:::store checkpoint states failed.
   java.lang.NoClassDefFoundError: org/apache/hadoop/util/CleanerUtil
        at org.apache.hadoop.fs.cosn.ByteBufferWrapper.munmap(ByteBufferWrapper.java:61) ~[hadoop-cos-3.4.1.jar:?]
        at org.apache.hadoop.fs.cosn.ByteBufferWrapper.close(ByteBufferWrapper.java:89) ~[hadoop-cos-3.4.1.jar:?]
        at org.apache.hadoop.fs.cosn.BufferPool.returnBuffer(BufferPool.java:228) ~[hadoop-cos-3.4.1.jar:?]
        at org.apache.hadoop.fs.cosn.CosNOutputStream.close(CosNOutputStream.java:157) ~[hadoop-cos-3.4.1.jar:?]
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.8-SNAPSHOT]
        at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.8-SNAPSHOT]
        at org.apache.seatunnel.engine.checkpoint.storage.hdfs.HdfsStorage.storeCheckPoint(HdfsStorage.java:109) ~[seatunnel-starter.jar:2.3.8-SNAPSHOT]
        at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.completePendingCheckpoint(CheckpointCoordinator.java:850) ~[seatunnel-starter.jar:2.3.8-SNAPSHOT]
        at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.lambda$startTriggerPendingCheckpoint$7(CheckpointCoordinator.java:600) ~[seatunnel-starter.jar:2.3.8-SNAPSHOT]
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_341]
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_341]
        at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456) ~[?:1.8.0_341]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_341]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_341]
        at java.lang.Thread.run(Thread.java:750) [?:1.8.0_341]
   2025-03-29 03:35:10.258:::ent:::prod:::SeaTunnel-Master:::${HOSTNAME}:::ShangHai:::null:::null:::seatunnel-coordinator-service-1211:::ERROR:::org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator:::complete checkpoint failed
   java.lang.ClassCastException: java.lang.NoClassDefFoundError cannot be cast to java.lang.Exception
        at org.apache.seatunnel.engine.common.utils.ExceptionUtil.sneakyThrow(ExceptionUtil.java:120) ~[seatunnel-starter.jar:2.3.8-SNAPSHOT]
        at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.completePendingCheckpoint(CheckpointCoordinator.java:877) ~[seatunnel-starter.jar:2.3.8-SNAPSHOT]
        at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.lambda$startTriggerPendingCheckpoint$7(CheckpointCoordinator.java:600) ~[seatunnel-starter.jar:2.3.8-SNAPSHOT]
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_341]
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_341]
        at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456) ~[?:1.8.0_341]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_341]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_341]
        at java.lang.Thread.run(Thread.java:750) [?:1.8.0_341]
   2025-03-29 03:35:10.258:::ent:::prod:::SeaTunnel-Master:::${HOSTNAME}:::ShangHai:::null:::null:::seatunnel-coordinator-service-1211:::INFO:::org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator:::start clean pending checkpoint cause CheckpointCoordinator inside have error.
   2025-03-29 03:35:10.259:::ent:::prod:::SeaTunnel-Master:::${HOSTNAME}:::ShangHai:::null:::null:::seatunnel-coordinator-service-1211:::INFO:::org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator:::Turn checkpoint_state_955031239230750737_1 state from RUNNING to FAILED
   2025-03-29 03:35:10.260:::ent:::prod:::SeaTunnel-Master:::${HOSTNAME}:::ShangHai:::null:::null:::seatunnel-coordinator-service-1211:::WARN:::org.apache.seatunnel.engine.server.dag.physical.SubPlan:::Job globalsh-mysql_pay_error_order.conf (955031239230750737), Pipeline: [(1/1)] checkpoint have error, cancel the pipeline
   2025-03-29 03:35:10.261:::ent:::prod:::SeaTunnel-Master:::${HOSTNAME}:::ShangHai:::null:::null:::seatunnel-coordinator-service-1211:::INFO:::org.apache.seatunnel.engine.server.dag.physical.SubPlan:::Job globalsh-mysql_pay_error_order.conf (955031239230750737), Pipeline: [(1/1)] turned from state RUNNING to CANCELING.
   2025-03-29 03:35:10.261:::ent:::prod:::SeaTunnel-Master:::${HOSTNAME}:::ShangHai:::null:::null:::seatunnel-coordinator-service-1211:::INFO:::org.apache.seatunnel.engine.server.dag.physical.PhysicalVertex:::Job globalsh-mysql_pay_error_order.conf (955031239230750737), Pipeline: [(1/1)], task: [pipeline-1 [Source[0]-MySQL-CDC]-SplitEnumerator (1/1)] state process is start
   2025-03-29 03:35:10.262:::ent:::prod:::SeaTunnel-Master:::${HOSTNAME}:::ShangHai:::null:::null:::seatunnel-coordinator-service-1211:::INFO:::org.apache.seatunnel.engine.server.dag.physical.PhysicalVertex:::Job globalsh-mysql_pay_error_order.conf (955031239230750737), Pipeline: [(1/1)], task: [pipeline-1 [Source[0]-MySQL-CDC]-SplitEnumerator (1/1)] turned from state RUNNING to CANCELING.
   2025-03-29 03:35:10.263:::ent:::prod:::SeaTunnel-Master:::${HOSTNAME}:::ShangHai:::null:::null:::seatunnel-coordinator-service-1211:::INFO:::org.apache.seatunnel.engine.server.dag.physical.PhysicalVertex:::Send cancel Job globalsh-mysql_pay_error_order.conf (955031239230750737), Pipeline: [(1/1)], task: [pipeline-1 [Source[0]-MySQL-CDC]-SplitEnumerator (1/1)] operator to member [worker-02.prod.seatunnel.sh.tx.dbt.com]:5802
   ```
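
The second stack trace in the log reports a `ClassCastException` raised from `ExceptionUtil.sneakyThrow`, which suggests that the original `NoClassDefFoundError` (an `Error`, not an `Exception`) was cast to `Exception` while being rethrown. The sketch below is not SeaTunnel's code, whose actual implementation is not shown in this report. It only illustrates that failure mode next to the common generic rethrow idiom, using the hypothetical names `SneakyThrowSketch` and `rethrowAsException`.

```java
// Hypothetical illustration of the rethrow failure; NOT the SeaTunnel ExceptionUtil source.
final class SneakyThrowSketch {

    // Failure mode: an Error is a Throwable but not an Exception,
    // so this checked cast throws ClassCastException for any Error.
    static void rethrowAsException(Throwable t) throws Exception {
        throw (Exception) t;
    }

    // Generic idiom: T is erased to Throwable at runtime, so the cast is unchecked
    // and every Throwable, including Errors, is rethrown unchanged.
    @SuppressWarnings("unchecked")
    static <T extends Throwable> void sneakyThrow(Throwable t) throws T {
        throw (T) t;
    }
}
```

With the generic form the caller would see the original `NoClassDefFoundError` instead of a `ClassCastException` that hides it.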
   
   ### Zeta or Flink or Spark Version
   
   _No response_
   
   ### Java or Scala Version
   
   java version "1.8.0_341"
   Java(TM) SE Runtime Environment (build 1.8.0_341-b10)
   Java HotSpot(TM) 64-Bit Server VM (build 25.341-b10, mixed mode)
   
   ### Screenshots
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   

