[ https://issues.apache.org/jira/browse/FLINK-28925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xintong Song closed FLINK-28925.
--------------------------------
    Resolution: Fixed

master (1.16): 7ed817f2054a13c3e2754c37f7681d8fbdba4b41

> Fix the concurrency problem in hybrid shuffle
> ---------------------------------------------
>
>                 Key: FLINK-28925
>                 URL: https://issues.apache.org/jira/browse/FLINK-28925
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>    Affects Versions: 1.16.0
>            Reporter: Weijie Guo
>            Assignee: Weijie Guo
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.16.0
>
>
> Through TPC-DS testing and code analysis, I found some thread-safety problems
> in hybrid shuffle:
> # HsSubpartitionMemoryDataManager#consumeBuffer should return a
> readOnlySlice buffer to downstream instead of the original buffer. If the
> spilling thread is processing a buffer while the downstream task is consuming
> that same buffer, the amount of data written to disk will be smaller than the
> actual value. To solve this, the consuming thread and the spilling thread
> should share the same data but not the same index.
> # HsSubpartitionMemoryDataManager#releaseSubpartitionBuffers should ignore
> the release decision if the buffer has already been removed from
> bufferIndexToContexts, instead of throwing an exception. Note that although
> the actual release operation is synchronized, a double release can still
> happen, because non-global decisions do not need to be synchronized. In
> other words, the main task thread and the consumer thread may decide to
> release the same buffer at the same time.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
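
For illustration only, here is a minimal Java sketch of the two fixes described in the issue. It is not the Flink code from the commit above. The class HybridShuffleSketch and its methods add, consume and release are hypothetical names, and java.nio.ByteBuffer#asReadOnlyBuffer stands in for the readOnlySlice call mentioned in the description. It assumes only that a read-only view shares the underlying bytes while keeping its own position, and that releasing an index that is no longer in the map should be a no-op.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not Flink's HsSubpartitionMemoryDataManager.
class HybridShuffleSketch {
    // Stand-in for bufferIndexToContexts: buffer index -> buffered data.
    private final Map<Integer, ByteBuffer> bufferIndexToContexts = new ConcurrentHashMap<>();

    void add(int bufferIndex, ByteBuffer buffer) {
        bufferIndexToContexts.put(bufferIndex, buffer);
    }

    // Fix 1: hand the consumer a read-only view. The view shares the bytes
    // with the original but has its own position and limit, so reading it
    // does not move the index that the spilling side relies on.
    ByteBuffer consume(int bufferIndex) {
        ByteBuffer original = bufferIndexToContexts.get(bufferIndex);
        return original == null ? null : original.asReadOnlyBuffer();
    }

    // Fix 2: a release decision for an index that was already removed is
    // ignored instead of raising an exception. Returns true only for the
    // call that actually removed the buffer.
    boolean release(int bufferIndex) {
        return bufferIndexToContexts.remove(bufferIndex) != null;
    }

    public static void main(String[] args) {
        HybridShuffleSketch manager = new HybridShuffleSketch();
        ByteBuffer original = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
        manager.add(0, original);

        // The consumer drains its view; the original still reports all bytes.
        ByteBuffer view = manager.consume(0);
        view.get(new byte[view.remaining()]);
        System.out.println("view remaining: " + view.remaining());
        System.out.println("original remaining: " + original.remaining());

        // Two threads may both decide to release; the second call is a no-op.
        System.out.println("first release: " + manager.release(0));
        System.out.println("second release: " + manager.release(0));
    }
}
```

In this sketch the second release simply returns false, which mirrors the "ignore the release decision" behavior requested in the second point.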