[ https://issues.apache.org/jira/browse/FLINK-23466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Piotr Nowojski closed FLINK-23466.
----------------------------------
    Resolution: Fixed

Merged to master as 48a384dffc7
Merged to release-1.14 as 0067d35cc0f

> UnalignedCheckpointITCase hangs on Azure
> ----------------------------------------
>
>                 Key: FLINK-23466
>                 URL: https://issues.apache.org/jira/browse/FLINK-23466
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.14.0
>            Reporter: Dawid Wysakowicz
>            Assignee: Yingjie Cao
>            Priority: Blocker
>              Labels: pull-request-available, test-stability
>             Fix For: 1.14.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=20813&view=logs&j=a57e0635-3fad-5b08-57c7-a4142d7d6fa9&t=2ef0effc-1da1-50e5-c2bd-aab434b1c5b7&l=16016
> The problem is that the buffer listener is removed from the listener queue
> when it is notified, and is added to the listener queue again only if it
> needs more buffers. If some buffers are recycled in the meantime, the
> buffer listener is not notified of the available buffers. For example:
> 1. Thread 1 calls LocalBufferPool#recycle().
> 2. Thread 1 reaches LocalBufferPool#fireBufferAvailableNotification() and
> listener.notifyBufferAvailable() is invoked, but Thread 1 sleeps before
> acquiring the lock needed to call registeredListeners.add(listener).
> 3. Thread 2 is woken up as a result of the notifyBufferAvailable()
> call. It takes the buffer, but it needs more buffers.
> 4. Other threads return all buffers, including the one that has just been
> recycled. None are taken; all of them are in the LocalBufferPool.
> 5. Thread 1 wakes up and continues the fireBufferAvailableNotification()
> invocation.
> 6. Thread 1 re-adds the listener that is waiting for more buffers via
> registeredListeners.add(listener).
> 7. Thread 1 exits the loop inside LocalBufferPool#recycle(MemorySegment, int),
> as the original memory segment has been used.
> At the end we have a state where all buffers are in the LocalBufferPool, so
> no new recycle() calls will happen, but there is still one listener waiting
> for a buffer (despite buffers being available).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
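
The interleaving described in the issue can be replayed deterministically in a single thread. The following is a minimal sketch under stated assumptions, not Flink's code: `ToyPool`, `Listener`, and `replay()` are hypothetical names, the "sleep" of Thread 1 is modelled as a callback that runs between the notification and the re-registration, and the buffer already handed to the listener is not returned (a simplification of step 4).

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class LostNotificationSketch {

    /** Hypothetical stand-in for a buffer listener; not Flink's actual interface. */
    static final class Listener {
        int stillNeeded;

        Listener(int needed) {
            this.stillNeeded = needed;
        }

        /** Accepts one buffer and reports whether more buffers are still needed. */
        boolean notifyBufferAvailable() {
            stillNeeded--;
            return stillNeeded > 0;
        }
    }

    /** Toy pool: a count of idle buffers plus a queue of waiting listeners. */
    static final class ToyPool {
        int idleBuffers = 0;
        final Deque<Listener> registeredListeners = new ArrayDeque<>();

        /**
         * Recycles one buffer. The listener is removed before it is notified and
         * re-added afterwards without rechecking the pool. The callback simulates
         * what other threads do inside that window.
         */
        void recycle(Runnable betweenNotifyAndReAdd) {
            Listener listener = registeredListeners.poll();
            if (listener == null) {
                idleBuffers++; // nobody is waiting: the buffer stays in the pool
                return;
            }
            boolean needsMore = listener.notifyBufferAvailable(); // step 2
            betweenNotifyAndReAdd.run();                          // steps 3-4
            if (needsMore) {
                registeredListeners.add(listener);                // step 6
            }
        }
    }

    /** Replays the interleaving; returns {idle buffers, waiting listeners}. */
    static int[] replay() {
        ToyPool pool = new ToyPool();
        pool.registeredListeners.add(new Listener(2)); // listener needs two buffers

        // Thread 1 recycles a buffer. While it "sleeps", other threads recycle
        // two more buffers, see an empty listener queue, and leave them idle.
        pool.recycle(() -> {
            pool.recycle(() -> {});
            pool.recycle(() -> {});
        });
        return new int[] {pool.idleBuffers, pool.registeredListeners.size()};
    }

    public static void main(String[] args) {
        int[] state = replay();
        System.out.println("idle buffers: " + state[0] + ", waiting listeners: " + state[1]);
    }
}
```

In this sketch the replay ends with two idle buffers and one registered listener, which mirrors the stuck state from the issue: no further recycle() call arrives to hand the idle buffers to the waiting listener.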