Hi all,
I have a (maybe clumsy) question about the number of executors that get
recovered in yarn-client mode. My situation is as follows:
We have a cluster of 1 resource manager + 3 node managers. An app is
running with the driver on the resource manager machine and 12 executors
spread across the node managers, 4 executors on each node manager machine.
For some reason, the 4 executors on one machine disassociated/failed, and
then only 2 executors recovered.
My question is: why were not all 4 executors recovered? Who decides the
number of recovered executors, and how?
Thanks