Weihua Hu created FLINK-32045:
---------------------------------
Summary: optimize task deployment performance for large-scale jobs
Key: FLINK-32045
URL: https://issues.apache.org/jira/browse/FLINK-32045
Project: Flink
Issue Type: Improvement
Components: Runtime / Coordination
Reporter: Weihua Hu
h1. Background
In FLINK-21110, we cache shuffle descriptors on the job manager side and
support using the blob server to offload these descriptors in order to reduce
the cost of task deployment.
I think there are further improvements we could make for large-scale jobs.
# The default minimum size to enable distribution via the blob server is 1MB.
But for a large WordCount job with 20000 parallelism, the size of the
serialized shuffle descriptors is only about 300KB. This means users need to
lower "blob.offload.minsize" manually, but the right value is hard for users to
decide.
# The task executor side still needs to load blob files and deserialize the
shuffle descriptors for each task. Since these operations run in the main
thread, they may block other RPCs from the job manager.
h1. Proposal
# Enable distributing shuffle descriptors via the blob server automatically.
This could be decided by the number of edges of the current shuffle descriptor:
blob offloading would be enabled once the edge count exceeds an internal
threshold.
# Introduce a cache of deserialized shuffle descriptors on the task executor
side. This could reduce the cost of reading from local blob files and of
deserialization. Of course, the cache should have a TTL to avoid occupying too
much memory, and it should use the same switch mechanism as the blob server
offloading.
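To make the two points concrete, here is a rough sketch. All names here (`shouldOffloadViaBlobServer`, `EDGE_COUNT_THRESHOLD`, `TtlCache`) are hypothetical and for illustration only; they are not existing Flink APIs, and the threshold value is an arbitrary placeholder, not a proposed default.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ShuffleDescriptorOffloadSketch {

    // Point 1: decide offloading by edge count instead of serialized size.
    // The threshold value is a placeholder for the internal threshold.
    static final int EDGE_COUNT_THRESHOLD = 4096;

    static boolean shouldOffloadViaBlobServer(int edgeCount) {
        return edgeCount > EDGE_COUNT_THRESHOLD;
    }

    // Point 2: a TTL cache for deserialized shuffle descriptors on the task
    // executor side. Expired entries are dropped lazily on access so the
    // cache does not occupy memory indefinitely.
    static final class TtlCache<K, V> {
        private final long ttlMillis;
        private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();

        TtlCache(long ttlMillis) {
            this.ttlMillis = ttlMillis;
        }

        void put(K key, V value) {
            entries.put(key, new Entry<>(value, System.currentTimeMillis()));
        }

        V get(K key) {
            Entry<V> e = entries.get(key);
            if (e == null) {
                return null;
            }
            if (System.currentTimeMillis() - e.createdAt > ttlMillis) {
                // Entry expired: remove it so the memory can be reclaimed.
                entries.remove(key);
                return null;
            }
            return e.value;
        }

        private static final class Entry<V> {
            final V value;
            final long createdAt;

            Entry(V value, long createdAt) {
                this.value = value;
                this.createdAt = createdAt;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // A 20000-parallelism all-to-all edge would exceed the threshold.
        System.out.println(shouldOffloadViaBlobServer(20000));
        System.out.println(shouldOffloadViaBlobServer(100));

        TtlCache<String, String> cache = new TtlCache<>(50);
        cache.put("jobVertex-1", "deserialized-descriptors");
        System.out.println(cache.get("jobVertex-1"));
        Thread.sleep(80); // wait past the TTL
        System.out.println(cache.get("jobVertex-1")); // expired -> null
    }
}
{code}

A production version would also need eviction independent of access (e.g. a periodic cleanup) and the same on/off switch as the blob offloading itself, as described above.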
--
This message was sent by Atlassian Jira
(v8.20.10#820010)