[ 
https://issues.apache.org/jira/browse/FLINK-28925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-28925.
--------------------------------
    Resolution: Fixed

master (1.16): 7ed817f2054a13c3e2754c37f7681d8fbdba4b41

> Fix the concurrency problem in hybrid shuffle
> ---------------------------------------------
>
>                 Key: FLINK-28925
>                 URL: https://issues.apache.org/jira/browse/FLINK-28925
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>    Affects Versions: 1.16.0
>            Reporter: Weijie Guo
>            Assignee: Weijie Guo
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.16.0
>
>
> Through TPC-DS testing and code analysis, I found some thread-safety 
> problems in hybrid shuffle:
>  # HsSubpartitionMemoryDataManager#consumeBuffer should return a 
> readOnlySlice of the buffer to downstream instead of the original buffer. 
> If the spilling thread is processing a buffer while the downstream task is 
> consuming the same buffer, both threads move the same index, so the amount 
> of data written to disk will be smaller than the actual size. To solve 
> this, the consuming thread and the spilling thread should share the same 
> data but not the same index.
>  # HsSubpartitionMemoryDataManager#releaseSubpartitionBuffers should ignore 
> the release decision if the buffer has already been removed from 
> bufferIndexToContexts, instead of throwing an exception. Note that although 
> the actual release operation is synchronized, a double release can still 
> happen, because non-global decisions do not need to be synchronized. In 
> other words, the main task thread and the consumer thread may decide to 
> release the same buffer at the same time.
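
For readers of the archive, the two fixes described above can be sketched in isolation. This is only an illustration, not the code from the commit: it uses java.nio.ByteBuffer#asReadOnlyBuffer as a stand-in for Flink's readOnlySlice, and the class HybridShuffleSketch, its methods, and the byte[] values stored in the map are hypothetical names chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.nio.ByteBuffer;

public class HybridShuffleSketch {

    /**
     * Problem 1: returns how many bytes the spilling side can still read
     * after the consumer has fully read the buffer.
     */
    public static int bytesVisibleToSpiller(boolean consumerGetsOwnView) {
        ByteBuffer original = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
        // A read-only view shares the bytes but has its own position/limit.
        // Handing out the original means both sides move the same position.
        ByteBuffer consumerView =
                consumerGetsOwnView ? original.asReadOnlyBuffer() : original;
        while (consumerView.hasRemaining()) {
            consumerView.get();
        }
        return original.remaining();
    }

    // Problem 2: hypothetical stand-in for bufferIndexToContexts.
    private final Map<Integer, byte[]> bufferIndexToContexts = new HashMap<>();

    public synchronized void add(int bufferIndex, byte[] data) {
        bufferIndexToContexts.put(bufferIndex, data);
    }

    /**
     * Releases the buffer if it is still tracked. A second release decision
     * for the same index is ignored instead of raising an exception.
     *
     * @return true if this call released the buffer, false if it was already gone
     */
    public synchronized boolean release(int bufferIndex) {
        byte[] removed = bufferIndexToContexts.remove(bufferIndex);
        return removed != null;
    }

    public static void main(String[] args) {
        System.out.println(bytesVisibleToSpiller(false)); // shared index
        System.out.println(bytesVisibleToSpiller(true));  // independent index

        HybridShuffleSketch manager = new HybridShuffleSketch();
        manager.add(7, new byte[] {42});
        System.out.println(manager.release(7)); // first decision releases
        System.out.println(manager.release(7)); // second decision is a no-op
    }
}
```

In the sketch, synchronizing release() makes each removal atomic, but two threads can still each decide to release index 7 before either one runs, which is why the second call has to be tolerated rather than treated as an error.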



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
