Hi Theo,

Your assumption is correct: Flink won't clean up its files when you use
`yarn application -kill ID`. The same holds true for other temporary files
generated by Flink's Blob service, shuffle service and I/O manager. These
files are usually stored under /tmp and should eventually be cleaned up,
though.

I think a better approach is to reconnect to the Flink YARN session cluster
and then issue the "stop" command. You can either run
`bin/yarn-session.sh -id APP_ID` and then type "stop", or you can run
`echo "stop" | bin/yarn-session.sh -id APP_ID`.

I think we should also update the logging statements of yarn-session.sh,
which currently tell you to use `yarn application -kill` in order to stop
the process.
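
In the meantime, if you want to clean up the existing leftovers yourself, a
rough sketch could look like the following (assuming the /user/dev/.flink
layout from your listing and that the yarn and hdfs CLIs are available on
the host; only a sketch, please double check before deleting anything):

  #!/usr/bin/env bash
  set -euo pipefail
  STAGING_DIR=/user/dev/.flink

  # Application ids that are still alive on YARN.
  RUNNING=$(yarn application -list -appStates RUNNING,ACCEPTED 2>/dev/null \
    | awk '/^application_/ {print $1}')

  # Remove staging dirs whose application is no longer running.
  for dir in $(hdfs dfs -ls "${STAGING_DIR}" 2>/dev/null \
      | awk '{print $NF}' | grep 'application_'); do
    app_id=$(basename "${dir}")
    if ! grep -q "${app_id}" <<< "${RUNNING}"; then
      echo "Removing stale staging dir ${dir}"
      hdfs dfs -rm -r -skipTrash "${dir}"
    fi
  done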

Cheers,
Till

On Tue, Jan 28, 2020 at 6:21 PM Theo Diefenthal <
theo.diefent...@scoop-software.de> wrote:

> Hi there,
>
> Today I realized that we have accumulated a lot of Flink distribution jar
> files that were never cleaned up, and I would like to know what to do
> about this, i.e. how to housekeep them properly.
>
> In the HDFS home directory of the submitting user, I find a subdirectory
> called `.flink` with hundreds of subfolders like
> `application_1573731655031_0420`, each with the following structure:
>
> -rw-r--r--   3 dev dev        861 2020-01-27 21:17
> /user/dev/.flink/application_1580155950981_0010/4797ff6e-853b-460c-81b3-34078814c5c9-taskmanager-conf.yaml
> -rw-r--r--   3 dev dev        691 2020-01-27 21:16
> /user/dev/.flink/application_1580155950981_0010/application_1580155950981_0010-flink-conf.yaml2755466919863419496.tmp
> -rw-r--r--   3 dev dev        861 2020-01-27 21:17
> /user/dev/.flink/application_1580155950981_0010/fdb5ef57-c140-4f6d-9791-c226eb1438ce-taskmanager-conf.yaml
> -rw-r--r--   3 dev dev     92.2 M 2020-01-27 21:16
> /user/dev/.flink/application_1580155950981_0010/flink-dist_2.11-1.9.1.jar
> drwxr-xr-x   - dev dev          0 2020-01-27 21:16
> /user/dev/.flink/application_1580155950981_0010/lib
> -rw-r--r--   3 dev dev      2.6 K 2020-01-27 21:16
> /user/dev/.flink/application_1580155950981_0010/log4j.properties
> -rw-r--r--   3 dev dev      2.3 K 2020-01-27 21:16
> /user/dev/.flink/application_1580155950981_0010/logback.xml
> drwxr-xr-x   - dev dev          0 2020-01-27 21:16
> /user/dev/.flink/application_1580155950981_0010/plugins
>
> With tons of those folders (one for each Flink session we launched and
> killed in our CI/CD pipeline), they add up to several terabytes of used
> space in our HDFS.
> I suppose I am killing our Flink sessions the wrong way. We start and stop
> sessions and jobs separately like so:
>
> Start:
>
> ${OS_ROOT}/flink/bin/yarn-session.sh -jm 4g -tm 32g --name 
> "${FLINK_SESSION_NAME}" -d -Denv.java.opts="-XX:+HeapDumpOnOutOfMemoryError"
>
> ${OS_ROOT}/flink/bin/flink run -m ${FLINK_HOST} [..savepoint/checkpoint 
> options...] -d -n "${JOB_JAR}" $*
>
> Stop:
>
> ${OS_ROOT}/flink/bin/flink stop -p ${SAVEPOINT_BASEDIR}/${FLINK_JOB_NAME} -m 
> ${FLINK_HOST} ${ID}
>
> yarn application -kill "${ID}"
>
>
> `yarn application -kill` was the best I could find, as the Flink
> documentation only states that the session process should be closed ("Stop
> the YARN session by stopping the unix process (using CTRL+C) or by entering
> ‘stop’ into the client.").
>
> Now my question: Is there a more elegant way to stop a YARN session
> (remotely, from some host in the cluster that is not necessarily the one
> that started the detached session) which also does the housekeeping? Or
> should I do the housekeeping myself manually? (Pretty easy to script.) Do
> I need to expect any other side effects when killing the session with
> `yarn application -kill`?
>
> Best regards
> Theo
>
> --
> SCOOP Software GmbH - Gut Maarhausen - Eiler Straße 3 P - D-51107 Köln
> Theo Diefenthal
>
> T +49 221 801916-196 - F +49 221 801916-17 - M +49 160 90506575
> theo.diefent...@scoop-software.de - www.scoop-software.de
> Sitz der Gesellschaft: Köln, Handelsregister: Köln,
> Handelsregisternummer: HRB 36625
> Geschäftsführung: Dr. Oleg Balovnev, Frank Heinen,
> Martin Müller-Rohde, Dr. Wolfgang Reddig, Roland Scheel
>
