[ https://issues.apache.org/jira/browse/FLINK-11457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Oscar Westra van Holthe - Kind updated FLINK-11457:
---------------------------------------------------
    Description:
When cancelling a job running on a YARN-based cluster and then shutting down the cluster, metrics on the push gateway are not deleted.

My yarn-conf.yaml settings:
{code:yaml}
metrics.reporters: promgateway
metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
metrics.reporter.promgateway.host: pushgateway.gcpstg.bolcom.net
metrics.reporter.promgateway.port: 9091
metrics.reporter.promgateway.jobName: PSMF
metrics.reporter.promgateway.randomJobNameSuffix: true
metrics.reporter.promgateway.deleteOnShutdown: true
metrics.reporter.promgateway.interval: 30 SECONDS
{code}

What I expect to happen:
* when running, the metrics are pushed to the push gateway under a separate label per node (jobmanager/taskmanager)
* when shutting down, the metrics are deleted from the push gateway

This last bit does not happen.

How the job is run:
{code}flink run -m yarn-cluster -yn 5 -ys 2 -yst "$INSTALL_DIRECTORY/app/psmf.jar"{code}

How the job is stopped:
{code}
YARN_APP_ID=$(yarn application -list | grep "PSMF" | awk '{print $1}')
FLINK_JOB_ID=$(flink list -r -yid ${YARN_APP_ID} | grep "PSMF" | awk '{print $4}')
flink cancel -s "${SAVEPOINT_DIR%/}/" -yid "${YARN_APP_ID}" "${FLINK_JOB_ID}"
echo "stop" | yarn-session.sh -id ${YARN_APP_ID}
{code}

Is there anything I'm doing wrong? Anything I can help to fix?

  was:
When cancelling a job running on a YARN-based cluster and then shutting down the cluster, metrics on the push gateway are not deleted.

Any thoughts on a solution? I'm happy to implement it, but I'm not sure what the best solution would be.
> PrometheusPushGatewayReporter does not cleanup its metrics
> ----------------------------------------------------------
>
>                 Key: FLINK-11457
>                 URL: https://issues.apache.org/jira/browse/FLINK-11457
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Oscar Westra van Holthe - Kind
>            Priority: Major
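
A possible stopgap while deleteOnShutdown has no effect is to remove the stale groups by hand through the Pushgateway's HTTP API, which accepts DELETE on /metrics/job/<job>. The sketch below is not part of the original report. The function names are hypothetical, and it rests on three assumptions: every group is keyed by the job label alone, the job names start with the configured jobName and contain only URL-safe characters, and the prefix contains no regular-expression metacharacters.

```shell
#!/usr/bin/env bash
# Sketch of a manual cleanup for stale Pushgateway groups (assumptions in the text above).

# Hypothetical helper: read Prometheus text format on stdin and print the
# distinct job label values that start with the given prefix.
list_jobs_with_prefix() {
  local prefix="$1"
  grep -o "job=\"${prefix}[^\"]*\"" | sed -e 's/^job="//' -e 's/"$//' | sort -u
}

# Hypothetical helper: build the delete URL for a single job group.
delete_url() {
  local base="$1" job="$2"
  printf '%s/metrics/job/%s\n' "${base%/}" "$job"
}

# Hypothetical helper: delete every group whose job name starts with the prefix.
cleanup_pushgateway() {
  local base="$1" prefix="$2" job
  curl -fsS "${base%/}/metrics" | list_jobs_with_prefix "$prefix" |
    while read -r job; do
      curl -fsS -X DELETE "$(delete_url "$base" "$job")"
    done
}
```

With the settings from this report, the call would be `cleanup_pushgateway "http://pushgateway.gcpstg.bolcom.net:9091" "PSMF"`, run after the YARN session has been stopped. It removes every group whose job name starts with PSMF, including groups of jobs that are still running, so it is only safe when no other PSMF cluster is pushing to the same gateway.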