If a TaskManager fails, the intermediate data stored on it is lost and needs to be recomputed. So even with batch mode configured, more tasks than just the failed one might need to restart. To mitigate that, the Flink developers would need to implement support for external shuffle services.
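For reference, here is a minimal sketch of a DataSet job with batch execution mode enabled, as suggested below. It is only an illustration: the class name, input/output paths, and the map function are placeholders, not taken from this thread.

import org.apache.flink.api.common.ExecutionMode;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class BatchModeEtlSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // BATCH mode makes shuffles blocking, so intermediate results are
        // materialized and a failed region can be restarted from them instead
        // of restarting all connected subtasks.
        env.getConfig().setExecutionMode(ExecutionMode.BATCH);

        // Placeholder "embarrassingly parallel" ETL: read, transform, write.
        DataSet<String> input = env.readTextFile("s3://example-bucket/input");
        input.map(new MapFunction<String, String>() {
                 @Override
                 public String map(String value) {
                     return value.toUpperCase();
                 }
             })
             .writeAsText("s3://example-bucket/output");

        env.execute("batch-mode ETL sketch");
    }
}

Note that the materialized results live on the TaskManagers, which is exactly why a TaskManager failure still forces recomputation, as described above.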
On Wed, Dec 16, 2020 at 9:10 AM Robert Metzger <rmetz...@apache.org> wrote:

> With region failover strategy, all connected subtasks will fail.
>
> If you are using the DataSet API with
> env.getConfig().setExecutionMode(ExecutionMode.BATCH);, you should get
> the desired behavior.
>
> On Mon, Dec 14, 2020 at 5:24 PM Stanislav Borissov <sk.boris...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I'm running a simple, "embarrassingly parallel" ETL-type job. I noticed
>> that a failure in one subtask causes the entire job to restart. Even with
>> the region failover strategy, all subtasks of this task and connected
>> ones would fail. Is there any way to limit restarting to only the single
>> subtask that failed, so all other subtasks can stay alive and keep
>> working?
>>
>> For context, I use Flink 1.11 in AWS Kinesis Data Analytics, so some
>> configuration is not controlled by me
>> <https://docs.aws.amazon.com/kinesisanalytics/latest/java/reference-flink-settings.title.html>.
>>
>> Thanks