[ https://issues.apache.org/jira/browse/FLINK-21201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Piotr Nowojski updated FLINK-21201: ----------------------------------- Issue Type: Improvement (was: Bug) > Creating BoundedBlockingSubpartition blocks TaskManager’s main thread > --------------------------------------------------------------------- > > Key: FLINK-21201 > URL: https://issues.apache.org/jira/browse/FLINK-21201 > Project: Flink > Issue Type: Improvement > Components: Runtime / Network > Affects Versions: 1.12.1 > Reporter: Zhilong Hong > Priority: Major > Attachments: jobmanager.log.tar.gz, taskmanager.log.tar.gz > > > When we are trying to run batch jobs with 8k parallelism, it takes a long > time to deploy the vertices. After the investigation, we find that creating > BoundedBlockingSubpartition blocks TaskManager’s main thread during the > procedure of {{submitTask}}. > When JobMaster invokes {{submitTask}} and sends an RPC call to the > TaskManager, the TaskManager will receive the RPC call and execute the > {{submitTask}} method in its main thread. In the {{submitTask}} method, the > TaskExecutor will create a Task instance and try to start it. During the > creation, the TaskExecutor will create the ResultPartition and its > ResultSubpartitions. > For the batch job, the type of ResultSubpartitions is the > BoundedBlockingSubpartition with the FileChannelBoundedData. The > BoundedBlockingSubpartition will create a file on the local disk, which is an > IO operation and could take a long time. > In our test, it would take at most 30+ seconds to create 8k > BoundedBlockingSubpartitions. This procedure blocks the main thread of the > TaskManager, and would lead to heartbeat timeout and slow task deploying. In > my opinion, the IO operation should be executed with IOExecutor rather than > the main thread. > I add several log items to show what TaskExecutor is doing during > {{submitTask}}. > {code:java} > 2021-01-29 14:44:37,557 INFO > org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Start to > submit task #898 (c9aefd1d30c2b133ba04ad495cd894fd) > 2021-01-29 14:44:37,557 INFO > org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl [] - Activate > slot 38cafd5b456cc8ff873bbe18e4bf708a. > 2021-01-29 14:44:37,932 INFO > org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Start to > init Task #898 (c9aefd1d30c2b133ba04ad495cd894fd) instance. > 2021-01-29 14:44:37,932 INFO > org.apache.flink.runtime.io.network.NettyShuffleEnvironment [] - Start to > create 1 result partition(s). > 2021-01-29 14:44:37,932 INFO > org.apache.flink.runtime.io.network.partition.ResultPartitionFactory [] - > Initializing BoundedBlockingResultPartitions > 2021-01-29 14:44:37,932 INFO > org.apache.flink.runtime.io.network.partition.ResultPartitionFactory [] - > Start to create 8000 FILE BoundedBlockingSubpartitions. > 2021-01-29 14:44:37,932 INFO > org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartition [] > - FileChannel #0 created. > ... ... > 2021-01-29 14:45:06,052 INFO > org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartition [] > - FileChannel #7999 created. > 2021-01-29 14:45:06,052 INFO > org.apache.flink.runtime.io.network.partition.ResultPartitionFactory [] - > Finish creating 8000 FILE BoundedBlockingSubpartitions. > 2021-01-29 14:45:06,052 INFO > org.apache.flink.runtime.io.network.NettyShuffleEnvironment [] - Finish > creating 1 result partition(s). > 2021-01-29 14:45:06,052 INFO > org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Finish > initializing task #898 (c9aefd1d30c2b133ba04ad495cd894fd) instance. > 2021-01-29 14:45:06,052 INFO > org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Received > task Source: Source 0 (898/8000)#0 (c9aefd1d30c2b133ba04ad495cd894fd), deploy > into slot with allocation id 38cafd5b456cc8ff873bbe18e4bf708a. > 2021-01-29 14:45:06,053 INFO org.apache.flink.runtime.taskmanager.Task > [] - Source: Source 0 (898/8000)#0 > (c9aefd1d30c2b133ba04ad495cd894fd) switched from CREATED to DEPLOYING. > {code} > We can see that it takes nearly 29 seconds to create 8k > BoundedBlockingSubpartitions, and this would blocks the main thread in the > TaskExecutor. > The log of JobManager and TaskManager is attached below. The most typical > task is Source 0: #898. -- This message was sent by Atlassian Jira (v8.3.4#803005)