[ https://issues.apache.org/jira/browse/FLINK-15843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17150879#comment-17150879 ]
Chesnay Schepler commented on FLINK-15843:
------------------------------------------

[~fly_in_gis] ping

> Gracefully shutdown TaskManagers on Kubernetes
> ----------------------------------------------
>
>                 Key: FLINK-15843
>                 URL: https://issues.apache.org/jira/browse/FLINK-15843
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.10.0
>            Reporter: Canbin Zheng
>            Priority: Major
>             Fix For: 1.11.0
>
>
> Currently, when the JobManager sends a deletion request, a TaskManager
> instance is stopped by directly calling
> {{KubernetesClient.pods().withName().delete}}. As a result, that instance is
> forcibly killed with a _KILL_ signal and has no chance to clean up, which
> could cause problems because we expect the process to terminate gracefully
> when it is no longer needed.
> According to the guide on [Termination of Pods|#termination-of-pods],
> Kubernetes first sends a _TERM_ signal to the main process in each container,
> and may follow up with a forced _KILL_ signal once the graceful shutdown
> period has expired. The Unix signal is sent to the process that has PID 1
> ([Docker Kill|https://docs.docker.com/engine/reference/commandline/kill/]).
> However, the TaskManagerRunner process is spawned by
> {{/opt/flink/bin/kubernetes-entry.sh}} and can never have PID 1, so it never
> receives the _TERM_ signal.
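
The PID 1 point in the description can be illustrated with a small sketch. It is not Flink's entry script and not necessarily the fix adopted for this issue; the function name `wrapper_and_child_pids` is hypothetical, and a POSIX `sh` is assumed to be on the PATH. A wrapper shell that launches a command normally forks a child with a new PID, whereas launching it with `exec` makes the command replace the wrapper and keep its PID. In a container where the wrapper is PID 1, only the `exec` variant would leave the launched process as the direct recipient of the TERM signal.

```python
import subprocess
from typing import Tuple


def wrapper_and_child_pids(use_exec: bool) -> Tuple[int, int]:
    """Return (wrapper shell PID, launched command PID).

    The wrapper prints its own PID, then starts an inner `sh` that prints
    its PID. With `exec`, the inner shell replaces the wrapper process.
    """
    launch = "exec " if use_exec else ""
    # The trailing `true` keeps the shell from silently exec-ing the last
    # command on its own in the non-exec case; after `exec` it never runs.
    script = "echo $$; " + launch + "sh -c 'echo $$'; true"
    out = subprocess.run(
        ["sh", "-c", script], capture_output=True, text=True, check=True
    ).stdout
    wrapper_pid, child_pid = (int(line) for line in out.split())
    return wrapper_pid, child_pid


if __name__ == "__main__":
    w, c = wrapper_and_child_pids(use_exec=False)
    print("spawned:", "same PID" if w == c else "different PIDs")
    w, c = wrapper_and_child_pids(use_exec=True)
    print("exec'd: ", "same PID" if w == c else "different PIDs")
```

An alternative under the same assumptions would be for the wrapper to trap TERM and forward it to the child; `exec` is shown here only because it is the shorter of the two.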