Hi Yun,

Thanks for bringing this up for discussion. I'm +1 on this idea. IIUC, Flink implements the OSS and S3 filesystems on top of the Hadoop filesystem interface, which does not provide a multi-delete API, so it may take some effort to implement this.
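
For what it's worth, the fallback described in the proposal could be sketched roughly as below. Everything here is an assumption for illustration only: `MultiDeleteFileSystem`, `deleteAll`, and `InMemoryFileSystem` are hypothetical names, paths are plain strings, and none of this is Flink's or Hadoop's actual API. The default method loops over single deletes; an object-store filesystem could override it with the SDK's native batch request.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical interface for illustration only; this is NOT Flink's FileSystem class. */
interface MultiDeleteFileSystem {

    /** Single delete, shaped like a Hadoop-style call. Returns false if nothing was deleted. */
    boolean delete(String path, boolean recursive) throws IOException;

    /**
     * Hypothetical multi-delete. The default falls back to one delete call per path;
     * an object-store implementation could override it with a native batch request.
     *
     * @return the paths that were not deleted
     */
    default List<String> deleteAll(Collection<String> paths, boolean recursive) throws IOException {
        List<String> notDeleted = new ArrayList<>();
        for (String path : paths) {
            if (!delete(path, recursive)) {
                notDeleted.add(path);
            }
        }
        return notDeleted;
    }
}

/** Toy in-memory file system that only implements the single delete, so it uses the fallback. */
class InMemoryFileSystem implements MultiDeleteFileSystem {
    private final Set<String> objects = new HashSet<>();
    int deleteCalls = 0; // counts single-delete calls to show the cost of the fallback

    InMemoryFileSystem(Collection<String> initialObjects) {
        objects.addAll(initialObjects);
    }

    @Override
    public boolean delete(String path, boolean recursive) {
        deleteCalls++;
        return objects.remove(path);
    }
}
```

With this shape, the checkpoint cleanup code would only ever call `deleteAll`, and filesystems without a batch API would keep today's behavior of one request per object.
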

Best,
Zakelly

On Thu, Jun 30, 2022 at 5:36 PM Martijn Visser <[email protected]> wrote:

> Hi Yun Tang,
>
> +1 for addressing this problem and your approach.
>
> Best regards,
>
> Martijn
>
> On Thu, Jun 30, 2022 at 11:12, Feifan Wang <[email protected]> wrote:
>
> > Thanks a lot for the proposal @Yun Tang ! It sounds great and I can't
> > find any reason not to make this improvement.
> >
> >
> > ——————————————
> > Name: Feifan Wang
> > Email: [email protected]
> >
> >
> > ---- Replied Message ----
> > | From | Yun Tang<[email protected]> |
> > | Date | 06/30/2022 16:56 |
> > | To | [email protected]<[email protected]> |
> > | Subject | [DISCUSS] Introduce multi delete API to Flink's FileSystem class |
> > Hi guys,
> >
> > As more and more teams move to cloud-based environments, cloud object
> > storage has become the de facto technical standard for big data ecosystems.
> > In our experience, the latency of writing or deleting objects in object
> > storage can vary from call to call. The changelog state backend FLIP ran
> > experiments that wrote the same data multiple times [1], and they show
> > that p999 latency can be 8x the p50 latency. This is also true for delete
> > operations.
> >
> > Currently, since the checkpoint backpressure mechanism was introduced [2],
> > a newly triggered checkpoint can be delayed because old checkpoints are
> > not cleaned up fast enough [3].
> > Moreover, with incremental checkpoints, Flink's checkpoint cleanup
> > mechanism cannot use the delete-folder API to speed up the procedure [4].
> > This is especially noticeable on cloud object storage, and almost all
> > object storage SDKs provide a multi-delete API to speed up deletion, e.g.
> > AWS S3 [5], Aliyun OSS [6], and Tencentyun COS [7].
> > A simple experiment shows that deleting 1000 objects of 5 MB each takes
> > 39494 ms with single delete operations in a for-loop, and drops to
> > 1347 ms with the multi-delete API on Tencent Cloud.
> >
> > However, Flink's FileSystem API is modeled on HDFS's FileSystem API and
> > lacks such a multi-delete API, which makes it somewhat outdated for
> > cloud-based environments.
> > Thus I suggest adding a multi-delete API to Flink's FileSystem [8] class;
> > file systems that do not support multi-delete will fall back to single
> > deletes in a for-loop.
> > By doing so, we can at least speed up discarding checkpoints in cloud
> > environments.
> >
> > WDYT?
> >
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-158%3A+Generalized+incremental+checkpoints#FLIP158:Generalizedincrementalcheckpoints-DFSwritelatency
> > [2] https://issues.apache.org/jira/browse/FLINK-17073
> > [3] https://issues.apache.org/jira/browse/FLINK-26590
> > [4] https://github.com/apache/flink/blob/1486fee1acd9cd1e340f6d2007f723abd20294e5/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CompletedCheckpoint.java#L315
> > [5] https://docs.aws.amazon.com/AmazonS3/latest/userguide/delete-multiple-objects.html
> > [6] https://www.alibabacloud.com/help/en/object-storage-service/latest/delete-objects-8#section-v6n-zym-tax
> > [7] https://intl.cloud.tencent.com/document/product/436/44018#delete-objects-in-batch
> > [8] https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/fs/FileSystem.java
> >
> >
> > Best
> > Yun Tang
