Weijie Guo created FLINK-28925:
----------------------------------
Summary: Fix the concurrency problem in hybrid shuffle
Key: FLINK-28925
URL: https://issues.apache.org/jira/browse/FLINK-28925
Project: Flink
Issue Type: Bug
Components: Runtime / Network
Affects Versions: 1.16.0
Reporter: Weijie Guo
Fix For: 1.16.0
Through tpc-ds testing and code analysis, I found some thread unsafe problems
in hybrid shuffle:
# HsSubpartitionMemeoryDataManager#consumeBuffer should return a readOnlySlice
buffer to downstream instead of original buffer: If the spilling thread is
processing while downstream task is consuming the same buffer, the amount of
data written to the disk will be smaller than the actual value. To solve this,
we should let the consuming thread and the spilling thread share the same data
but not index.
# HsSubpartitionMemoryDataManager#releaseSubpartitionBuffers should ignore the
release decision if the buffer already removed from bufferIndexToContexts
instead of throw an exception.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)