Hi Peter and Till,

As I commented in the issue <https://issues.apache.org/jira/browse/FLINK-10868#>, we have deployed the FLINK-10868 <https://issues.apache.org/jira/browse/FLINK-10868> patch in our online environment (mainly for batch tasks). What do you think of the following two suggestions?
1) Make the time interval configurable through a parameter. At present the default interval of 1 min is used, which is too short for batch tasks.
2) Add a parameter that controls whether to call OnFatalError when the number of failed containers reaches MAXIMUM_WORKERS_FAILURE_RATE and the JM disconnects, so that batch tasks can exit as soon as possible.

Best regards,
Anyang