Hi,

When a program specifies a maximum parallelism (so it is supposed to use all 
available task slots), it can happen that one of the task managers is not 
registered, for various reasons.
In that case the job fails because there are not enough free slots to run it.

To me this looks like a limitation of the scheduler: it statically assigns 
tasks to the number of task slots the job is configured with.

Instead, I would like to be able to specify a minimum parallelism for a job, 
with the possibility to dynamically use more task slots if additional ones 
are available.
Another use case: if we lose a node (and therefore some task slots) during the 
execution of a job, and the minimum parallelism is still ensured, the job 
should recover and continue its execution instead of just failing.
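
The requested behaviour could be sketched roughly as follows. This is only an 
illustration of the decision rule, not Flink code: choose_parallelism and its 
parameters are hypothetical names, and the assumption is that the scheduler 
knows how many task slots are currently free.

```python
from typing import Optional


def choose_parallelism(free_slots: int,
                       min_parallelism: int,
                       max_parallelism: int) -> Optional[int]:
    """Pick the parallelism to run with, or None if the job cannot run.

    Hypothetical helper, not part of any Flink API.
    """
    if min_parallelism < 1 or max_parallelism < min_parallelism:
        raise ValueError("need 1 <= min_parallelism <= max_parallelism")
    # Below the minimum the job cannot start (or recover) at all.
    if free_slots < min_parallelism:
        return None
    # Otherwise use as many slots as are available, capped at the maximum.
    return min(free_slots, max_parallelism)


# All 8 slots registered: run at full parallelism.
print(choose_parallelism(free_slots=8, min_parallelism=4, max_parallelism=8))
# One task manager missing (6 slots left): run at reduced parallelism.
print(choose_parallelism(free_slots=6, min_parallelism=4, max_parallelism=8))
# Too few slots to meet the minimum: the job cannot run.
print(choose_parallelism(free_slots=3, min_parallelism=4, max_parallelism=8))
```

The same rule would be applied again whenever a task manager joins or is 
lost, so the job rescales instead of failing as long as the minimum holds.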

Is it possible to make such changes?

Best,
Ovidiu
