[ https://issues.apache.org/jira/browse/FLINK-33879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Weijie Guo reassigned FLINK-33879:
----------------------------------

    Assignee: Jiang Xin

> Hybrid Shuffle may hang during redistribution
> ---------------------------------------------
>
>                 Key: FLINK-33879
>                 URL: https://issues.apache.org/jira/browse/FLINK-33879
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>            Reporter: Jiang Xin
>            Assignee: Jiang Xin
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.19.0
>
>
> Currently, Hybrid Shuffle can use the memory tier and the disk tier
> together. However, in the following scenario the result partition stops
> working.
>
> Suppose we have a shuffle task with 2 sub-partitions. The LocalBufferPool has
> 15 buffers, so the memory tier can use at most 15-(2*(2+1)+1) = 8 buffers
> according to `TieredStorageMemoryManagerImpl#getMaxNonReclaimableBuffers`. If
> the memory tier uses up all 8 buffers and the input channel doesn't consume
> them because of some problem, the disk tier can still work with 1 reserved
> buffer. However, if a redistribution happens at this point and the pool size
> is decreased to less than 8, the BufferAccumulator cannot request buffers
> anymore, and thus the result partition stops working as well.
>
> The goal is to keep the result partition working with the disk tier and
> writing the shuffle data to disk, so that once the input channel is ready,
> the data on disk can be consumed immediately.
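
To make the arithmetic in the description easier to follow, here is a minimal sketch in Java. It is not Flink code: the class `HybridShuffleBudget`, its constants, and both methods are hypothetical names introduced for illustration. The formula is copied from the issue text (pool size minus `numSubpartitions*(2+1)+1`); what the individual terms stand for inside `TieredStorageMemoryManagerImpl` is not stated in the issue, so the comments only restate the numbers. The second method is a simplifying assumption that a pool cannot hand out a buffer when at least as many buffers are already outstanding as its current size allows.

```java
// Hypothetical model of the buffer arithmetic described in FLINK-33879; not Flink code.
final class HybridShuffleBudget {

    // The "(2+1)" term from the issue, applied once per sub-partition.
    static final int BUFFERS_PER_SUBPARTITION = 2 + 1;

    // The trailing "+1" term from the issue.
    static final int EXTRA_RESERVED_BUFFERS = 1;

    // Upper bound on buffers the memory tier may hold, following the issue's formula.
    // Clamping at zero is an assumption added here to keep the result non-negative.
    static int maxNonReclaimableBuffers(int poolSize, int numSubpartitions) {
        int reserved = numSubpartitions * BUFFERS_PER_SUBPARTITION + EXTRA_RESERVED_BUFFERS;
        return Math.max(0, poolSize - reserved);
    }

    // Simplified assumption: a pool can serve a new request only while fewer
    // buffers are outstanding than its current size.
    static boolean poolCanServeRequest(int poolSize, int outstandingBuffers) {
        return outstandingBuffers < poolSize;
    }

    public static void main(String[] args) {
        int numSubpartitions = 2;

        // Before redistribution: 15 - (2*(2+1)+1) = 8 buffers for the memory tier.
        int heldByMemoryTier = maxNonReclaimableBuffers(15, numSubpartitions);
        System.out.println("memory tier limit at pool size 15: " + heldByMemoryTier);
        System.out.println("request possible at pool size 15: "
                + poolCanServeRequest(15, heldByMemoryTier));

        // After redistribution shrinks the pool below 8 while 8 buffers are still held.
        System.out.println("request possible at pool size 7: "
                + poolCanServeRequest(7, heldByMemoryTier));
    }
}
```

Under these assumptions, the memory tier's 8 unconsumed buffers leave room for further requests while the pool has 15 buffers, but not once the pool has shrunk to 7. That is the state in which the description says the BufferAccumulator can no longer request buffers.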