Github user NicoK commented on a diff in the pull request:

https://github.com/apache/flink/pull/4559#discussion_r157706995

    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/PipelinedSubpartition.java ---
    @@ -52,6 +54,10 @@
     	/** Flag indicating whether the subpartition has been released. */
     	private volatile boolean isReleased;
     
    +	/** The number of non-event buffers currently in this subpartition */
    +	@GuardedBy("buffers")
    +	private volatile int buffersInBacklog;
    --- End diff --

You're absolutely right about not counting events. Therefore, we cannot use the queue's size as I suggested. Yes, `BufferAndAvailability` would need to be extended as well.

This integration/split of the spillable/spilled subpartitions and subpartition views, with both of them working on the same structures and requiring the same synchronisation pattern, is imho really not nice and highly fragile. @pnowojski and I are currently re-designing the synchronisation in these parts of the code and are a bit sensitive to it now, so let's drag him into this discussion as well.

I would consider `PipelinedSubpartition` the hot path where we need to optimise most - spillable subpartitions are used in batch mode and have higher tolerances, especially when spilling to disk. However, if you returned the new backlog counter from `SpillableSubpartition#decreaseBuffersInBacklog()` (retrieved under the `synchronized (buffers)` section), then you would not need the `volatile` either, since you are already under the lock.
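
To illustrate the last point, here is a minimal sketch of the pattern, not the actual Flink code. The class `BacklogSketch`, its `Boolean` queue entries (true for a data buffer, false for an event) and the method `pollAndGetBacklog()` are hypothetical names made up for this example. The counter is only ever read and written inside `synchronized (buffers)`, and the updated value is returned from within that block, so a plain `int` is enough.

```java
import java.util.ArrayDeque;

/** Hypothetical, simplified stand-in for a subpartition; not Flink's actual class. */
class BacklogSketch {

    /** Queue entries: true marks a data buffer, false marks an event (assumption). */
    private final ArrayDeque<Boolean> buffers = new ArrayDeque<>();

    /** Only accessed while holding the "buffers" lock, hence not volatile. */
    private int buffersInBacklog;

    /** Adds an entry; only data buffers count towards the backlog. */
    void add(boolean isBuffer) {
        synchronized (buffers) {
            buffers.add(isBuffer);
            if (isBuffer) {
                buffersInBacklog++;
            }
        }
    }

    /**
     * Polls the next entry and returns the backlog as observed under the lock,
     * or -1 if the queue was empty.
     */
    int pollAndGetBacklog() {
        synchronized (buffers) {
            Boolean isBuffer = buffers.poll();
            if (isBuffer == null) {
                return -1;
            }
            if (isBuffer) {
                buffersInBacklog--;
            }
            // The caller gets the value computed under the lock and never
            // has to read the field without synchronisation.
            return buffersInBacklog;
        }
    }

    public static void main(String[] args) {
        BacklogSketch sketch = new BacklogSketch();
        sketch.add(true);   // data buffer
        sketch.add(false);  // event, not counted
        sketch.add(true);   // data buffer
        System.out.println(sketch.pollAndGetBacklog());
        System.out.println(sketch.pollAndGetBacklog());
        System.out.println(sketch.pollAndGetBacklog());
    }
}
```

The trade-off is that every consumer has to obtain the backlog through a synchronized method instead of reading a field directly, which is why this seems more acceptable for the spillable case than for the `PipelinedSubpartition` hot path.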
---