Jie Zhang created FLINK-28249:
---------------------------------

             Summary: Flink can not delete older checkpoint folders from google 
storage bucket
                 Key: FLINK-28249
                 URL: https://issues.apache.org/jira/browse/FLINK-28249
             Project: Flink
          Issue Type: Bug
          Components: FileSystems
    Affects Versions: 1.12.0
            Reporter: Jie Zhang


We are running Flink 1.12 with the GCS configuration described here:
https://nightlies.apache.org/flink/flink-docs-release-1.12/deployment/filesystems/gcs.html#libraries

Flink can write checkpoints to the Google Cloud Storage bucket, but it cannot
delete older checkpoint folders from that bucket.

Logs:
{code:java}
2022-06-22 23:14:28,477 WARN  org.apache.flink.runtime.checkpoint.CheckpointSubsumeHelper  [] - Fail to subsume the old checkpoint.
java.io.IOException: Error deleting 'gs://flink-checkpoint-dev-bucket/device-logs/flink-checkpoint/822475f36e7a9a4f4048cb82791c55e2/chk-1/_metadata', stage 2 with generation 1655939488397964
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl$4.onFailure(GoogleCloudStorageImpl.java:937) ~[gcs-connector-latest-hadoop2.jar:?]
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.BatchHelper.execute(BatchHelper.java:184) ~[gcs-connector-latest-hadoop2.jar:?]
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.BatchHelper.lambda$queue$0(BatchHelper.java:164) ~[gcs-connector-latest-hadoop2.jar:?]
    at com.google.cloud.hadoop.repackaged.gcs.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131) ~[gcs-connector-latest-hadoop2.jar:?]
    at com.google.cloud.hadoop.repackaged.gcs.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74) ~[gcs-connector-latest-hadoop2.jar:?]
    at com.google.cloud.hadoop.repackaged.gcs.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82) ~[gcs-connector-latest-hadoop2.jar:?]
    at com.google.cloud.hadoop.repackaged.gcs.com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:323) ~[gcs-connector-latest-hadoop2.jar:?]
    at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134) ~[?:1.8.0_332]
    at com.google.cloud.hadoop.repackaged.gcs.com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:69) ~[gcs-connector-latest-hadoop2.jar:?]
    at com.google.cloud.hadoop.repackaged.gcs.com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:36) ~[gcs-connector-latest-hadoop2.jar:?]
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.BatchHelper.queue(BatchHelper.java:162) ~[gcs-connector-latest-hadoop2.jar:?]
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.queueSingleObjectDelete(GoogleCloudStorageImpl.java:960) ~[gcs-connector-latest-hadoop2.jar:?]
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.deleteObjects(GoogleCloudStorageImpl.java:891) ~[gcs-connector-latest-hadoop2.jar:?]
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.deleteInternal(GoogleCloudStorageFileSystem.java:432) ~[gcs-connector-latest-hadoop2.jar:?]
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.delete(GoogleCloudStorageFileSystem.java:398) ~[gcs-connector-latest-hadoop2.jar:?]
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.delete(GoogleHadoopFileSystemBase.java:821) ~[gcs-connector-latest-hadoop2.jar:?]
    at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.delete(HadoopFileSystem.java:160) ~[flink-dist_2.11-1.12.7.jar:1.12.7]
    at org.apache.flink.runtime.state.filesystem.FileStateHandle.discardState(FileStateHandle.java:85) ~[flink-dist_2.11-1.12.7.jar:1.12.7]
    at org.apache.flink.runtime.checkpoint.CompletedCheckpoint.discard(CompletedCheckpoint.java:249) ~[flink-dist_2.11-1.12.7.jar:1.12.7]
    at org.apache.flink.runtime.checkpoint.CompletedCheckpoint.discardOnSubsume(CompletedCheckpoint.java:220) ~[flink-dist_2.11-1.12.7.jar:1.12.7]
    at org.apache.flink.runtime.checkpoint.CheckpointSubsumeHelper.subsume(CheckpointSubsumeHelper.java:63) ~[flink-dist_2.11-1.12.7.jar:1.12.7]
    at org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore.addCheckpoint(StandaloneCompletedCheckpointStore.java:73) ~[flink-dist_2.11-1.12.7.jar:1.12.7]
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.completePendingCheckpoint(CheckpointCoordinator.java:1211) ~[flink-dist_2.11-1.12.7.jar:1.12.7]
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveAcknowledgeMessage(CheckpointCoordinator.java:1082) ~[flink-dist_2.11-1.12.7.jar:1.12.7]
    at org.apache.flink.runtime.scheduler.SchedulerBase.lambda$acknowledgeCheckpoint$7(SchedulerBase.java:1042) ~[flink-dist_2.11-1.12.7.jar:1.12.7]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_332]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_332]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_332]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_332]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_332]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_332]
    at java.lang.Thread.run(Thread.java:750) [?:1.8.0_332]
    Suppressed: java.nio.file.DirectoryNotEmptyException: Cannot delete a non-empty directory.
        at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.delete(GoogleCloudStorageFileSystem.java:387) ~[gcs-connector-latest-hadoop2.jar:?]
        at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.delete(GoogleHadoopFileSystemBase.java:821) ~[gcs-connector-latest-hadoop2.jar:?]
        at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.delete(HadoopFileSystem.java:160) ~[flink-dist_2.11-1.12.7.jar:1.12.7]
        at org.apache.flink.runtime.state.filesystem.FsCompletedCheckpointStorageLocation.disposeStorageLocation(FsCompletedCheckpointStorageLocation.java:74) ~[flink-dist_2.11-1.12.7.jar:1.12.7]
        at org.apache.flink.runtime.checkpoint.CompletedCheckpoint.discard(CompletedCheckpoint.java:263) ~[flink-dist_2.11-1.12.7.jar:1.12.7]
        at org.apache.flink.runtime.checkpoint.CompletedCheckpoint.discardOnSubsume(CompletedCheckpoint.java:220) ~[flink-dist_2.11-1.12.7.jar:1.12.7]
        at org.apache.flink.runtime.checkpoint.CheckpointSubsumeHelper.subsume(CheckpointSubsumeHelper.java:63) ~[flink-dist_2.11-1.12.7.jar:1.12.7]
        at org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore.addCheckpoint(StandaloneCompletedCheckpointStore.java:73) ~[flink-dist_2.11-1.12.7.jar:1.12.7]
        at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.completePendingCheckpoint(CheckpointCoordinator.java:1211) ~[flink-dist_2.11-1.12.7.jar:1.12.7]
        at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveAcknowledgeMessage(CheckpointCoordinator.java:1082) ~[flink-dist_2.11-1.12.7.jar:1.12.7]
        at org.apache.flink.runtime.scheduler.SchedulerBase.lambda$acknowledgeCheckpoint$7(SchedulerBase.java:1042) ~[flink-dist_2.11-1.12.7.jar:1.12.7]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_332]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_332]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_332]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_332]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_332]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_332]
        at java.lang.Thread.run(Thread.java:750) [?:1.8.0_332]
Caused by: com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.json.GoogleJsonResponseException
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageExceptions.createJsonResponseException(GoogleCloudStorageExceptions.java:89) ~[gcs-connector-latest-hadoop2.jar:?]
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl$4.onFailure(GoogleCloudStorageImpl.java:917) ~[gcs-connector-latest-hadoop2.jar:?]
    ... 31 more
2022-06-22 23:14:28,487 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Completed checkpoint 4 for job 822475f36e7a9a4f4048cb82791c55e2 (46652 bytes in 527 ms).
{code}

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
