Alexander Fedulov created FLINK-18398:
-----------------------------------------

             Summary: ElasticSearch unavailability causes TM shutdown
                 Key: FLINK-18398
                 URL: https://issues.apache.org/jira/browse/FLINK-18398
             Project: Flink
          Issue Type: Bug
          Components: Connectors / ElasticSearch
    Affects Versions: 1.10.0
            Reporter: Alexander Fedulov
         Attachments: elastic_jm_log.txt, elastic_tm_log.txt


Similarly to [FLINK-17327|https://issues.apache.org/jira/browse/FLINK-17327], unavailability of the ElasticSearch cluster causes task cancellation to time out and the Task Manager to be killed.

The following exceptions can be found in the logs:
{code:java}
2020-06-15 19:52:03.664Z ERROR [ I/O dispatcher 229] .f.s.c.e.ElasticsearchSinkBase : Failed Elasticsearch bulk request: request retries exceeded max retry timeout [30000]
java.io.IOException: request retries exceeded max retry timeout [30000]
...
2020-06-15 19:55:03.861Z WARN [43df85ee0f907ae9d0).] o.a.f.r.taskmanager.Task : Task 'graph53 (1/1)' did not react to cancelling signal for 30 seconds, but is stuck in method:
org.elasticsearch.action.bulk.BulkProcessor.flush(BulkProcessor.java:356)
...
2020-06-15 19:55:04.120Z ERROR [663038f87ef09c4da6).] o.a.f.r.taskmanager.Task : Task did not exit gracefully within 180 + seconds.
2020-06-15 19:55:04.121Z ERROR [663038f87ef09c4da6).] o.a.f.r.t.TaskExecutor : Task did not exit gracefully within 180 + seconds.
2020-06-15 19:55:04.121Z ERROR [663038f87ef09c4da6).] o.a.f.r.t.TaskManagerRunner : Fatal error occurred while executing the TaskManager. Shutting it down...
{code}

Detailed logs are attached.
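
The log sequence in this report (a warning after 30 seconds, then a fatal error after 180 seconds) is consistent with a cancellation watchdog that keeps interrupting the task thread and escalates once a timeout passes. The following is a minimal Java sketch of that pattern, not Flink's actual implementation. The class name, method names and the "stuck flush" thread are hypothetical, the timings are scaled down from seconds to milliseconds, and it assumes the flush blocks in a way that ignores interrupts, which the report itself does not confirm.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not Flink's Task or TaskExecutor code.
public class CancellationWatchdogSketch {

    public enum Outcome { EXITED_GRACEFULLY, FATAL_TIMEOUT }

    /** Interrupts the task every intervalMs until it exits or timeoutMs has elapsed. */
    public static Outcome cancel(Thread task, long intervalMs, long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (task.isAlive()) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) {
                // The point at which a real process would treat the hang as fatal.
                return Outcome.FATAL_TIMEOUT;
            }
            task.interrupt();
            try {
                // join(0) would wait forever, so the wait is clamped to at least 1 ms.
                task.join(Math.max(1, Math.min(intervalMs, remainingMs)));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("watchdog itself was interrupted", e);
            }
        }
        return Outcome.EXITED_GRACEFULLY;
    }

    /** Assumption: models a flush that swallows interrupts while the backend is unreachable. */
    public static Thread stuckFlush() {
        CountDownLatch backendResponded = new CountDownLatch(1); // never counted down
        Thread t = new Thread(() -> {
            while (true) {
                try {
                    backendResponded.await();
                    return;
                } catch (InterruptedException ignored) {
                    // Interrupt swallowed; the thread keeps waiting.
                }
            }
        }, "stuck-flush");
        t.setDaemon(true); // lets the JVM exit even though this thread never finishes
        t.start();
        return t;
    }

    /** A well-behaved task that stops as soon as it is interrupted. */
    public static Thread interruptibleTask() {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // Returning here ends the thread promptly.
            }
        }, "interruptible-task");
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) {
        // 30 ms / 180 ms stand in for the 30 s / 180 s seen in the log.
        System.out.println("stuck flush: " + cancel(stuckFlush(), 30, 180));
        System.out.println("interruptible task: " + cancel(interruptibleTask(), 30, 180));
    }
}
```

In this sketch the interruptible task ends on the first interrupt, while the stuck flush survives every interrupt until the timeout is reached, which is the escalation path that ends in the Task Manager shutdown shown in the log.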