subject:"Orphaned job files in HDFS"

Re: [E] Re: Orphaned job files in HDFS

2022-01-17 Thread Yang Wang

The clean-up of the staging directory is best effort. If the JobManager crashes or is killed externally, it has no chance to clean up the staging directory. AFAIK, Flink has no option that guarantees the clean-up. Best, Yang David Clutter wrote on Tue, Jan 11, 2022 at 22:59: > Ok
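Since the clean-up is only best effort, one possible workaround is an external sweep that compares the application directories under the staging root with the applications YARN still considers active. The following is a minimal sketch, not something proposed in the thread: the function names (`find_orphans`, `list_staging_paths`, `list_active_app_ids`, `sweep`) are hypothetical, the staging root is assumed to be the `/user/yarn/.flink` path mentioned by the original poster, and directory names are assumed to contain an ID of the form `application_<timestamp>_<sequence>`. It only prints the delete commands it would run.

```python
import re
import subprocess

# Assumption: staging directories are named after the YARN application ID.
APP_ID = re.compile(r"application_\d+_\d+")

# YARN states in which an application may still need its staging directory.
ACTIVE_STATES = "NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING"


def find_orphans(staging_paths, active_app_ids):
    """Return the paths whose application ID is not among the active IDs.

    Paths without a recognisable application ID are skipped, never reported.
    """
    orphans = []
    for path in staging_paths:
        match = APP_ID.search(path)
        if match and match.group(0) not in active_app_ids:
            orphans.append(path)
    return orphans


def list_staging_paths(staging_root):
    """List entries under the staging root (`-C` prints paths only)."""
    result = subprocess.run(
        ["hdfs", "dfs", "-ls", "-C", staging_root],
        check=True, capture_output=True, text=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def list_active_app_ids():
    """Collect the IDs of applications that YARN reports as still active."""
    result = subprocess.run(
        ["yarn", "application", "-list", "-appStates", ACTIVE_STATES],
        check=True, capture_output=True, text=True,
    )
    return set(APP_ID.findall(result.stdout))


def sweep(staging_root):
    """Dry run: print the delete command for each orphaned directory."""
    orphans = find_orphans(list_staging_paths(staging_root), list_active_app_ids())
    for path in orphans:
        print("hdfs dfs -rm -r -skipTrash " + path)
    return orphans
```

Calling `sweep("/user/yarn/.flink")` on a node with the `hdfs` and `yarn` clients installed would list the candidates. A job submitted between the two listings could be misreported as orphaned, so reviewing the output before deleting anything is advisable.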

Re: [E] Re: Orphaned job files in HDFS

2022-01-11 Thread David Clutter

Ok, that makes sense. I did see some job failures. However, failures could happen occasionally. Is there any option to have the job manager clean up these directories when the job has failed? On Mon, Jan 10, 2022 at 8:58 PM Yang Wang wrote: > IIRC, the staging directory(/user/{name}/.flink/app

Re: Orphaned job files in HDFS

2022-01-10 Thread Yang Wang

IIRC, the staging directory (/user/{name}/.flink/application_xxx) will be deleted automatically if the Flink job reaches a globally terminal state (e.g. FINISHED, CANCELED, FAILED). So I assume you have stopped the YARN application via "yarn application -kill", not via "bin/flink cancel". If it is the ca

Orphaned job files in HDFS

2022-01-10 Thread David Clutter

I'm seeing files orphaned in HDFS and wondering how to clean them up when the job is completed. The directory is /user/yarn/.flink, so I am assuming this is created by Flink? The HDFS in my cluster eventually fills up. Here is my setup: - Flink 1.13.1 on AWS EMR - Executing Flink in per-jo