Hi Allen, which volumes do you use for your TM pod? If you want the data to be deleted when the pod restarts, you can use an ephemeral volume such as emptyDir. Flink should also remove temporary files automatically once they are no longer needed (see this discussion <https://lists.apache.org/thread/hlmmt2mn5d7q2nhctz59dqnlsoynyvmr>).
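As a rough illustration of the emptyDir suggestion, a TM pod template fragment could look like the sketch below. The volume name, the container name, and the mount path /flink-rocksdb are placeholders I made up, not values from your setup. Note that an emptyDir lives as long as the pod does: it is wiped when the pod is deleted and recreated, but it survives a container restart inside the same pod.

```yaml
# Hypothetical fragment of the StatefulSet pod template.
# All names and paths here are placeholders, not taken from the original setup.
spec:
  template:
    spec:
      containers:
        - name: taskmanager               # assumed container name
          volumeMounts:
            - name: rocksdb-local
              mountPath: /flink-rocksdb   # assumed mount path
      volumes:
        - name: rocksdb-local
          emptyDir: {}                    # ephemeral: removed together with the pod
```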
The working directory only takes effect from Flink 1.15 onward. In Flink 1.14, the local RocksDB directory is usually located under /tmp unless you explicitly configure state.backend.rocksdb.localdir <https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#state-backend-rocksdb-localdir>. So, since you are on 1.14, the working directory can't help.

Allen Wang <allenxw...@gmail.com> wrote on Tue, Jun 28, 2022 at 04:39:

> Hi Folks,
>
> We created a stateful job using SessionWindow and RocksDB state backend
> and deployed it on Kubernetes Statefulset with persisted volumes. The Flink
> version we used is 1.14.
>
> After the job runs for some time, we observed that the size of the local
> RocksDB directory started to grow and there are more and more
> directories created inside it. It seems that when the job is restarted or
> the task manager K8s pod is restarted, the previous RocksDB directory
> corresponding to the assigned operator is not cleaned up. Here is an
> example:
>
> drwxr-xr-x 3 root root 4096 Jun 27 18:23 job_00000000000000000000000000000000_op_WindowOperator_2b0a50a068bb7f1c8a470e4f763cbf26__1_4__uuid_c97f3f3f-649a-467d-82af-2bc250ec6e22
> drwxr-xr-x 3 root root 4096 Jun 27 18:45 job_00000000000000000000000000000000_op_WindowOperator_2b0a50a068bb7f1c8a470e4f763cbf26__1_4__uuid_e4fca2c3-74c7-4aa2-9ca1-dda866b8de11
> drwxr-xr-x 3 root root 4096 Jun 27 18:56 job_00000000000000000000000000000000_op_WindowOperator_2b0a50a068bb7f1c8a470e4f763cbf26__2_4__uuid_f1f7777a-7402-494d-80d7-65861394710c
> drwxr-xr-x 3 root root 4096 Jun 27 17:34 job_00000000000000000000000000000000_op_WindowOperator_f6dc7f4d2283f4605b127b9364e21148__3_4__uuid_08a14423-bea1-44ce-96ee-360a516d72a6
>
> Although only
> job_00000000000000000000000000000000_op_WindowOperator_2b0a50a068bb7f1c8a470e4f763cbf26__2_4__uuid_f1f7777a-7402-494d-80d7-65861394710c
> is the active running operator, the other directories for the past
> operators still exist.
>
> We set up the task manager property taskmanager.resource-id to be the task
> manager pod name under the statefulset but it did not seem to help cleaning
> up previous directories.
>
> Any pointers to solve this issue?
>
> We checked the latest document and it seems that Flink 1.15 introduced the
> concept of local working directory:
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/resource-providers/standalone/working_directory/.
> Does that help cleaning up the RocksDB directory?
>
> Thanks,
> Allen
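
For completeness, the state.backend.rocksdb.localdir option mentioned above can be set in flink-conf.yaml. This is only a sketch: the path is a placeholder and should point to wherever the volume you want RocksDB to use is mounted in the TM container.

```yaml
# flink-conf.yaml (Flink 1.14).
# The path below is an assumed example, not a default value.
state.backend.rocksdb.localdir: /flink-rocksdb
```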