Fan Xinpu created FLINK-11149:
---------------------------------
Summary: Flink requests more containers than it actually needs
Key: FLINK-11149
URL: https://issues.apache.org/jira/browse/FLINK-11149
Project: Flink
Issue Type: Improvement
Components: YARN
Affects Versions: 1.7.0
Reporter: Fan Xinpu
As is known, Flink requests a new container when it is notified that an
allocated container has completed. Suppose one container fails: Flink intends
to request one container from the RM, but it actually requests n+1 containers,
where n is the number of containers it has requested since the cluster was
created. This is not graceful.
When requesting a container, Flink sends a ContainerRequest to the RM through
the AMRMClient. The AMRMClient keeps the ContainerRequest in its own
bookkeeping and expects the caller to remove it later, but Flink never removes
it, so the number of outstanding ContainerRequests keeps growing to an
unexpectedly large value.
In our environment, a cluster initially allocated 100 containers. Later, when
it requested a single container from the RM, the RM returned more than 2000
containers, because more than 2000 ContainerRequests had accumulated. Although
Flink returns the excess containers, this behavior wastes time and resources
on YARN.
So, perhaps Flink could remove the ContainerRequest after the request has been
sent to the RM; Flink would then receive exactly the number of containers it
explicitly requested.
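The accumulation described above can be illustrated with a toy model. This is a minimal sketch, not Flink's or Hadoop's code: `ToyAMRMClient`, `simulate`, and their methods are hypothetical names, and the model assumes that all requests share one resource/priority key, that the client reports the total outstanding count to the RM whenever that count changes, and that the RM grants everything reported on the next heartbeat. The `remove_on_allocation` flag approximates the proposed fix by dropping one outstanding request per granted container.

```python
from __future__ import annotations


class ToyAMRMClient:
    """Hypothetical, heavily simplified model of AMRMClient bookkeeping.

    Assumptions (not the real Hadoop API): one shared resource/priority key,
    the client re-sends the *total* outstanding count whenever it changed,
    and the RM grants the full reported amount on that heartbeat.
    """

    def __init__(self) -> None:
        self.outstanding = 0   # requests the client still considers open
        self._changed = False  # whether the count must be re-sent to the RM

    def add_container_request(self) -> None:
        self.outstanding += 1
        self._changed = True

    def remove_container_request(self) -> None:
        if self.outstanding > 0:
            self.outstanding -= 1
            self._changed = True

    def heartbeat(self) -> int:
        """Send the outstanding total if it changed; return containers granted."""
        if not self._changed:
            return 0
        self._changed = False
        return self.outstanding


def simulate(initial: int, replacements: int, remove_on_allocation: bool) -> list[int]:
    """Return how many containers the RM grants in each round.

    Round 0 requests `initial` containers; every later round replaces one
    completed (e.g. failed) container by requesting exactly one more.
    """
    client = ToyAMRMClient()
    grants = []
    for wanted in [initial] + [1] * replacements:
        for _ in range(wanted):
            client.add_container_request()
        granted = client.heartbeat()
        grants.append(granted)
        if remove_on_allocation:
            # Proposed behavior: drop one matching request per granted container.
            for _ in range(granted):
                client.remove_container_request()
    return grants


print(simulate(100, 3, remove_on_allocation=False))  # [100, 101, 102, 103]
print(simulate(100, 3, remove_on_allocation=True))   # [100, 1, 1, 1]
```

Without removal, each single-container replacement is answered with n+1 containers, where n is the number requested so far, which matches the behavior reported above; with removal, each replacement yields exactly one. The model does not simulate Flink returning the excess containers.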
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)