Alexander Fedulov created FLINK-18398:
-----------------------------------------

             Summary: ElasticSearch unavailability causes TM shutdown
                 Key: FLINK-18398
                 URL: https://issues.apache.org/jira/browse/FLINK-18398
             Project: Flink
          Issue Type: Bug
          Components: Connectors / ElasticSearch
    Affects Versions: 1.10.0
            Reporter: Alexander Fedulov
         Attachments: elastic_jm_log.txt, elastic_tm_log.txt

Similarly to [FLINK-17327|https://issues.apache.org/jira/browse/FLINK-17327],
unavailability of the ElasticSearch cluster causes task cancellation to time
out and the TaskManager to be killed. The following exceptions can be found in
the logs:

 
{code:java}
2020-06-15 19:52:03.664Z ERROR [  I/O dispatcher 229] .f.s.c.e.ElasticsearchSinkBase : Failed Elasticsearch bulk request: request retries exceeded max retry timeout [30000]
java.io.IOException: request retries exceeded max retry timeout [30000]
...
2020-06-15 19:55:03.861Z  WARN [43df85ee0f907ae9d0).] o.a.f.r.taskmanager.Task       : Task 'graph53 (1/1)' did not react to cancelling signal for 30 seconds, but is stuck in method:
 org.elasticsearch.action.bulk.BulkProcessor.flush(BulkProcessor.java:356)
...
2020-06-15 19:55:04.120Z ERROR [663038f87ef09c4da6).] o.a.f.r.taskmanager.Task       : Task did not exit gracefully within 180 + seconds.
2020-06-15 19:55:04.121Z ERROR [663038f87ef09c4da6).] o.a.f.r.t.TaskExecutor         : Task did not exit gracefully within 180 + seconds.
2020-06-15 19:55:04.121Z ERROR [663038f87ef09c4da6).] o.a.f.r.t.TaskManagerRunner    : Fatal error occurred while executing the TaskManager. Shutting it down...
{code}

Detailed logs are attached.
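
For reference, the 30 second and 180 second intervals in the log excerpt match
the defaults of Flink's task cancellation watchdog. The sketch below shows the
corresponding flink-conf.yaml entries. It assumes the defaults are in effect;
the actual configuration of the affected cluster is not part of this report.

{code}
# Assumed defaults, not taken from the affected cluster's configuration.
# Interval (ms) between repeated cancellation attempts on a task; the
# "did not react to cancelling signal for 30 seconds" warning reflects it.
task.cancellation.interval: 30000
# Time (ms) after which a task that is still being cancelled is treated as
# a fatal TaskManager error ("did not exit gracefully within 180 + seconds").
task.cancellation.timeout: 180000
{code}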

 



