xintongsong edited a comment on pull request #13464: URL: https://github.com/apache/flink/pull/13464#issuecomment-698301073
I can see how the mapping simplifies things. My concern is whether this simplification hurts not only optimality but also correctness. I'm not entirely sure about this, so I'll try to explain my concern with an example.

* 2 requirement profiles: A & B
* 2 slot profiles: X & Y
* A can only be fulfilled by X
* B can be fulfilled by X or Y
* Resource-requirement mapping status
  * A: 1 -> X: 1
  * B: 2 -> X: 1, Y: 1
  * excess -> Y: 1

Now a slot of profile X is lost. Since neither A nor B has too many resources, either of them might be deducted.

If A is deducted, the excess Y cannot be used, and we would need to request a new resource for A.

* A: 1 -> X: 0
* B: 2 -> X: 1, Y: 1
* excess -> Y: 1

If B is deducted, then the excess Y can be used, and we do not need to allocate new resources.

* A: 1 -> X: 1
* B: 2 -> Y: 2
* excess -> none

Assume all tasks are in the running state. If a slot assigned to requirement A is lost and RM deducts B, then RM will not assign a new slot to the job, and JM cannot deploy the tasks from the lost slot to the excess slot Y. Either the tasks cannot recover, or JM will have to stop some tasks in a slot X and move them to the excess Y.

On the other hand, if a slot assigned to requirement B is lost and RM deducts A, then JM will have no problem recovering the failed tasks in slot Y, but RM still allocates and assigns a new slot to the job. Even if the job returns the unneeded slot, RM may keep trying to allocate a new slot for the job, because it sees that the acquired resources for this job do not match the required resources.
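To make the example easier to check, here is a minimal Python sketch of the two outcomes. It is only a toy model of the bookkeeping described above, not Flink's actual slot manager code; `CAN_FULFIL`, `lose_slot`, `rebalance` and `simulate` are hypothetical names introduced for illustration, and the assumption that excess slots are handed to the first requirement with a shortfall is mine.

```python
from collections import Counter

# Hypothetical model: which slot profiles can fulfil which requirement profile.
CAN_FULFIL = {"A": {"X"}, "B": {"X", "Y"}}


def initial_state():
    """Mapping status from the example: A:1 -> X:1, B:2 -> X:1,Y:1, excess -> Y:1."""
    required = {"A": 1, "B": 2}
    assigned = {"A": Counter({"X": 1}), "B": Counter({"X": 1, "Y": 1})}
    excess = Counter({"Y": 1})
    return required, assigned, excess


def lose_slot(assigned, lost_profile, deduct_from):
    """Record the loss of one slot by deducting it from the chosen requirement."""
    if assigned[deduct_from][lost_profile] < 1:
        raise ValueError(f"{deduct_from} holds no slot of profile {lost_profile}")
    assigned[deduct_from][lost_profile] -= 1


def rebalance(required, assigned, excess):
    """Fill shortfalls from compatible excess slots.

    Returns the number of slots per requirement that still must be requested.
    """
    missing = {}
    for req, needed in required.items():
        shortfall = needed - sum(assigned[req].values())
        for profile in sorted(CAN_FULFIL[req]):
            while shortfall > 0 and excess[profile] > 0:
                excess[profile] -= 1
                assigned[req][profile] += 1
                shortfall -= 1
        if shortfall > 0:
            missing[req] = shortfall
    return missing


def simulate(deduct_from):
    """Lose one slot of profile X, deduct it from `deduct_from`, then rebalance."""
    required, assigned, excess = initial_state()
    lose_slot(assigned, "X", deduct_from)
    missing = rebalance(required, assigned, excess)
    return missing, +excess  # unary plus drops zero counts


if __name__ == "__main__":
    for choice in ("A", "B"):
        missing, excess = simulate(choice)
        print(f"deduct {choice}: still to request={missing}, excess left={dict(excess)}")
```

In this model, `simulate("A")` leaves one slot to be requested for A while Y stays in excess, whereas `simulate("B")` requests nothing and consumes the excess Y. The outcome depends purely on which requirement the lost slot is deducted from, which is the ambiguity I am worried about.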