[ 
https://issues.apache.org/jira/browse/FLINK-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377099#comment-16377099
 ] 

ASF GitHub Bot commented on FLINK-8755:
---------------------------------------

GitHub user NicoK opened a pull request:

    https://github.com/apache/flink/pull/5581

    [FLINK-8755][FLINK-8786][network] fix two bugs in spilled and spillable 
subpartition views

    ## What is the purpose of the change
    
    1) `SpilledSubpartitionView#getNextBuffer()` relies on the backlog to 
signal further data availability. However, if there are only events left in the 
buffer queue, their buffers are not included in the backlog counting and 
therefore, `isMoreAvailable` will be wrongly false here.
    2) When processing the last in-memory buffer in 
`SpillableSubpartitionView#getNextBuffer`, we always set the `isMoreAvailable` 
flag of the returned `BufferAndBacklog` to false irrespective of what may be in 
the spill writer.
    
    This PR fixes both issues and heavily extends the unit tests in this 
regard, hence the two were combined in a single PR. Please also note that this 
PR is built upon #5549, #5550, #5551, and #5572 to reduce possible merge 
conflicts - everything starting after FLINK-8694 is new.
    
    ## Brief change log
    
    - rename `RecordWriter#closeBufferConsumer()` to `closeBufferBuilder()` 
(internal method, we switched to buffer builders a while ago)
    - make `AwaitableBufferAvailablityListener` (used by tests only) thread-safe
    - fix `SpilledSubpartitionView#getNextBuffer()` to not only rely on the 
backlog
    - fix `SpillableSubpartitionView#getNextBuffer()` returning wrong 
`isMoreAvailable` when processing the last in-memory buffer
    - extended overall subpartition tests to also verify several other flags 
that were added in the past but not covered appropriately, e.g. 
`BufferAndBacklog#isMoreAvailable()` or `ResultSubpartitionView#isAvailable()`
    - some minor code and documentation improvements
    
    (more details in the individual commits)
    
    ## Verifying this change
    
    This change added tests and can be verified as follows:
    - added several checks to `PipelinedSubpartitionTest` and 
`SpillableSubpartitionTest` via helper methods in `SubpartitionTestBase`
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): **no**
      - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: **no**
      - The serializers: **no**
      - The runtime per-record code paths (performance sensitive): **no**
      - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: **no**
      - The S3 file system connector: **no**
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (yes / no)
      - If yes, how is the feature documented? (not applicable / docs / 
JavaDocs / not documented)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/NicoK/flink flink-8786

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5581.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5581
    
----
commit 40e18c85563e1ef45ce89709fa3aa7613439e12d
Author: Nico Kruber <nico@...>
Date:   2018-02-20T17:04:12Z

    [FLINK-8733][network] fix 
SpillableSubpartition#spillFinishedBufferConsumers() not counting spilled bytes

commit 2721cabce0dd2be4bb4da4097ff4e6c7749498c1
Author: Nico Kruber <nico@...>
Date:   2018-02-20T17:05:54Z

    [FLINK-8734][network] fix partition bytes counting and re-enable in tests

commit 375de6118a5d84d21b40b7c23438d09204ad664b
Author: Nico Kruber <nico@...>
Date:   2018-02-20T17:06:41Z

    [hotfix][network] remove PowerMockRunner from RecordWriterTest

commit d0d3c7b026d5af3a57c47892501ab0e74e7172b2
Author: Nico Kruber <nico@...>
Date:   2018-02-20T17:07:02Z

    [hotfix][network] various minor improvements

commit dbcfe73c41618c70e16884f2c723fc9a6a9dca4f
Author: Nico Kruber <nico@...>
Date:   2018-02-21T15:30:53Z

    [hotfix][network] initialize SingleInputGate#enqueuedInputChannelsWithData 
with the right size

commit 13f2e09240b9efb8163bb93dad52486fc2af65ac
Author: Nico Kruber <nico@...>
Date:   2018-02-21T16:09:31Z

    [FLINK-8736][network] fix memory segment offsets for slices of slices being 
wrong

commit f8363154ef2e03b99a471f627028ad50fc1271ab
Author: Nico Kruber <nico@...>
Date:   2018-02-23T17:07:31Z

    fixup! [hotfix][network] various minor improvements

commit 8cf861f06ddc9c79fc61407ebe426213d1740ef7
Author: Piotr Nowojski <piotr.nowojski@...>
Date:   2018-02-23T10:20:21Z

    [FLINK-8760][runtime] Correctly propagate moreAvailable flag through 
SingleInputGate
    
    Previously if we SingleInputGate was re-eqnqueuing an input channel, 
isMoreAvailable
    might incorrectly return false. This might caused some dead locks.

commit c7cda5463e7bba1d2f3f62006f6e4a71246efccb
Author: Piotr Nowojski <piotr.nowojski@...>
Date:   2018-02-23T10:37:37Z

    [hotfix][tests] Deduplicate code in SingleInputGateTest

commit 954c019a70cf881de7410762977e033e2768c11e
Author: Piotr Nowojski <piotr.nowojski@...>
Date:   2018-02-23T11:11:14Z

    [hotfix][runtime] Remove duplicated check

commit 55832a9ac1355e48fc9665967074ebc262da2f65
Author: Piotr Nowojski <piotr.nowojski@...>
Date:   2018-02-23T10:27:54Z

    [hotfixu][tests] Do not hide original exception in 
SuccessAfterNetworkBuffersFailureITCase

commit b6d98e99dd5cbf7cc0554cd83b81f3d2621e0057
Author: Piotr Nowojski <piotr.nowojski@...>
Date:   2018-02-23T10:28:20Z

    [FLINK-8694][runtime] Fix notifyDataAvailable race condition
    
    Before there was a race condition that might resulted in igonoring some 
notifyDataAvailable calls.
    This fixes the problem by moving buffersAvailable handling to Supartitions 
and adds stress test
    for flushAlways (without this fix this test is dead locking).

commit d65ffe94414e8bc4a6457955c8ca89bf68f537cb
Author: Nico Kruber <nico@...>
Date:   2018-02-26T14:07:49Z

    fixup! [FLINK-8694][runtime] Fix notifyDataAvailable race condition

commit 26508abe76d7436c13b8415cd0411560cb31f4d4
Author: Nico Kruber <nico@...>
Date:   2018-02-26T15:39:01Z

    fixup! [FLINK-8694][runtime] Fix notifyDataAvailable race condition

commit 00e9b252d218932dda3daf28f95cc8fd2e34cac8
Author: Nico Kruber <nico@...>
Date:   2018-02-22T13:11:13Z

    [hotfix][network] rename RecordWriter#closeBufferConsumer() to 
closeBufferBuilder()

commit 5ecd6f7215e164e6971cd305f16974639fccc7a3
Author: Nico Kruber <nico@...>
Date:   2018-02-22T13:17:06Z

    [hotfix][network] various minor improvements

commit e3cff13a7785fe61236a735bb37a0fd4345fe13b
Author: Nico Kruber <nico@...>
Date:   2018-02-23T09:35:41Z

    [hotfix][network][tests] make AwaitableBufferAvailablityListener thread-safe
    
    This is called asynchronously by the spill writer and thus may need
    synchronization on incrementing the counter but definately had visibility
    issues with the counter. Using an AtomicLong fixes that.

commit 8a8470a6225bbdf8e0c962ba0d9bbbd75354d309
Author: Nico Kruber <nico@...>
Date:   2018-02-23T09:19:58Z

    [FLINK-8755][network] fix SpilledSubpartitionView relying on the backlog 
for determining whether more data is available
    
    Fix SpilledSubpartitionView#getNextBuffer() to not only rely on the backlog:
    instead it is sufficient to also return true if the next buffer is an event
    since either there is a real buffer enqueued (reflected by the backlog) or 
at
    least one event.

commit 96dd3ffe5cbb6581c2bc80c86853c7249973c15e
Author: Nico Kruber <nico@...>
Date:   2018-02-23T11:13:20Z

    [FLINK-8755][FLINK-8786][network] add and improve subpartition tests
    
    + also improve the subpartition tests in general to reduce some duplication

commit 603b1562a6ad257376057a1a3fc507604d83ffcd
Author: Nico Kruber <nico@...>
Date:   2018-02-26T15:27:44Z

    [FLINK-8786][network] fix SpillableSubpartitionView#getNextBuffer returning 
wrong isMoreAvailable when processing last in-memory buffer
    
    When processing the last in-memory buffer in
    SpillableSubpartitionView#getNextBuffer while the rest of the buffers are
    spilled, need to rely on the spilled view's isAvailable instead of always
    setting the isMoreAvailable flag of the returned BufferAndBacklog to false.

----


> SpilledSubpartitionView wrongly relys on the backlog for determining whether 
> more data is available
> ---------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-8755
>                 URL: https://issues.apache.org/jira/browse/FLINK-8755
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Network
>            Reporter: Nico Kruber
>            Assignee: Nico Kruber
>            Priority: Blocker
>             Fix For: 1.5.0
>
>
> {code}
> public BufferAndBacklog getNextBuffer() throws IOException, 
> InterruptedException {
> //...
>         int newBacklog = parent.decreaseBuffersInBacklog(current);
>         return new BufferAndBacklog(current, newBacklog > 0, newBacklog, 
> nextBufferIsEvent);
> {code}
> relies on the backlog to signal further data availability. However, if there 
> are only events left in the buffer queue, their buffers are not included in 
> the backlog counting and therefore, {{isMoreAvailable}} will be wrongly 
> {{false}} here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to