Qian Chao created FLINK-21437:
---------------------------------

             Summary: Memory leak when using filesystem state backend on Alibaba Cloud OSS
                 Key: FLINK-21437
                 URL: https://issues.apache.org/jira/browse/FLINK-21437
             Project: Flink
          Issue Type: Bug
          Components: Runtime / State Backends
            Reporter: Qian Chao
When using the filesystem state backend and storing checkpoints on Alibaba Cloud OSS, with the following flink-conf.yaml:

{code:java}
state.backend: filesystem
state.checkpoints.dir: oss://yourBucket/checkpoints
fs.oss.endpoint: xxxxx
fs.oss.accessKeyId: xxxxx
fs.oss.accessKeySecret: xxxxx{code}

a memory leak occurs in both the JobManager and the TaskManager after a period of time. The objects retained in the JVM heap look like this:

{code:java}
The class "java.io.DeleteOnExitHook", loaded by "<system class loader>", occupies 1,018,323,960 (96.47%) bytes.

The memory is accumulated in one instance of "java.util.LinkedHashMap", loaded by "<system class loader>", which occupies 1,018,323,832 (96.47%) bytes.
{code}

The root cause appears to be that when flink-oss-fs-hadoop uploads a file to OSS, AliyunOSSFileSystem creates a temporary file and registers it with deleteOnExit(), so the LinkedHashSet<String> files in DeleteOnExitHook keeps growing and is never drained while the JVM is running:

{code:java}
org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem::create
  -> org.apache.hadoop.fs.aliyun.oss.AliyunOSSOutputStream::new
  -> dirAlloc.createTmpFileForWrite("output-", -1L, conf)
  -> org.apache.hadoop.fs.LocalDirAllocator::createTmpFileForWrite
  -> result.deleteOnExit()
{code}
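For illustration only, here is a minimal, hypothetical sketch (not Flink or Hadoop code) of the mechanism: every call to File.deleteOnExit() adds the file's path to a static LinkedHashSet<String> inside java.io.DeleteOnExitHook, and nothing removes the entry before JVM shutdown, even if the file itself is deleted right away.

{code:java}
import java.io.File;
import java.io.IOException;

// Hypothetical reproduction sketch of the deleteOnExit() accumulation,
// not actual Flink/Hadoop code.
public class DeleteOnExitLeakSketch {
    public static void main(String[] args) throws IOException {
        for (int i = 0; i < 1_000_000; i++) {
            // Mimics the per-upload temporary file created by AliyunOSSOutputStream.
            File tmp = File.createTempFile("output-", ".tmp");
            tmp.deleteOnExit(); // path string is retained in DeleteOnExitHook for the JVM's lifetime
            tmp.delete();       // deleting the file does NOT remove the entry from the hook's set
        }
        // In a long-running JobManager/TaskManager, one entry per checkpoint upload
        // accumulates, which matches the ~1 GB LinkedHashMap seen in the heap dump.
    }
}
{code}

Since DeleteOnExitHook only drains its set in a shutdown hook, deleteOnExit() is only suitable for short-lived JVMs; a long-running streaming job would need the temporary file to be deleted explicitly (without registering it for delete-on-exit) when the output stream is closed.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)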