GitHub user StephanEwen opened a pull request:

    https://github.com/apache/flink/pull/938

    [FLINK-2406] [FLINK-2402] Abstract the BarrierBuffers and add an 
"at-least-once" version (BarrierTracker)

    Currently, the checkpointing mechanism provides "exactly once" guarantees. 
Part of that is the step that temporarily "aligns" the data streams, performed 
by the `BarrierBuffer`. This step increases the tuple latency temporarily.
    
    By offering a version that does not provide "exactly-once", but only 
"at-least-once", we can avoid the latency increase. This is relevant for 
super-low-latency applications (< 10 ms) that tolerate duplicates.
    
    This pull request makes the following changes:
    
      - Add an interface for the functionality of the barrier buffer, to allow 
adding different implementations
      - Add the `BarrierTracker` which only tracks checkpoint barriers, without 
aligning the streams.
      - Add broader tests for the BarrierBuffer, including trailing data and 
barrier races.
      - Checkpoint barriers are handled by the buffer directly, rather than 
being returned and re-injected.
      - Simplify logic in the BarrierBuffer and fix certain corner cases.
      - Give access to spill directories properly via I/O manager, rather than 
via GlobalConfiguration singleton.
      - Rename the "BarrierBufferIOTest" to "BarrierBufferMassiveRandomTest"
      - A lot of code style/robustness fixes (properly define constants, 
visibility, exception signatures)
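
To illustrate the "track, do not align" idea behind the `BarrierTracker`, here is a minimal sketch. It is not the code from this pull request: the class `BarrierTrackerSketch` and its methods are hypothetical, and it assumes each input channel delivers exactly one barrier per checkpoint. It only counts barriers per checkpoint ID and reports when all channels have delivered theirs; no channel is ever blocked, and no records are buffered or spilled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the BarrierTracker from this pull request.
public class BarrierTrackerSketch {

    private final int numChannels;
    // Number of barriers seen so far, per pending checkpoint ID.
    private final Map<Long, Integer> barrierCounts = new HashMap<>();
    // Checkpoint IDs for which every channel has delivered a barrier.
    private final List<Long> completed = new ArrayList<>();

    public BarrierTrackerSketch(int numChannels) {
        this.numChannels = numChannels;
    }

    /**
     * Records one barrier for the given checkpoint. Returns true once
     * barriers from all channels have arrived. Records that follow a
     * barrier on a channel keep flowing, which is why only
     * at-least-once guarantees are possible with this approach.
     */
    public boolean onBarrier(long checkpointId) {
        int seen = barrierCounts.merge(checkpointId, 1, Integer::sum);
        if (seen < numChannels) {
            return false;
        }
        barrierCounts.remove(checkpointId);
        completed.add(checkpointId);
        return true;
    }

    public List<Long> completedCheckpoints() {
        return completed;
    }

    public static void main(String[] args) {
        BarrierTrackerSketch tracker = new BarrierTrackerSketch(2);
        System.out.println(tracker.onBarrier(1L)); // first channel: not complete yet
        System.out.println(tracker.onBarrier(1L)); // second channel: complete
        System.out.println(tracker.completedCheckpoints());
    }
}
```

An aligning implementation would instead hold back records from channels whose barrier has already arrived until the remaining barriers show up, which is the source of the temporary latency increase described above.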
    
    @gyfora and @uce: The reworked variant of the BarrierBuffer eventually 
returns `null` when all buffers and events have been consumed. The previous 
implementation repeatedly returned the `EndOfPartitionEvent`. That seemed 
inconsistent with the implementation of the "vanilla" InputGate, which 
eventually returns `null` when all partitions have finished.
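
The practical effect of the `null` contract on a consumer can be sketched as follows. The names `DrainUntilNullSketch`, `ElementSource`, and `getNext` are hypothetical stand-ins rather than Flink API; the point is only that a reader loop can terminate on `null` instead of having to recognize a repeated end-of-partition marker.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch of a consumer loop that relies on a null terminator.
public class DrainUntilNullSketch {

    /** Stand-in for an input source; returns null once fully consumed. */
    public interface ElementSource {
        String getNext();
    }

    /** Reads elements until the source signals exhaustion with null. */
    public static List<String> drain(ElementSource source) {
        List<String> out = new ArrayList<>();
        String next;
        while ((next = source.getNext()) != null) {
            out.add(next);
        }
        return out;
    }

    public static void main(String[] args) {
        Queue<String> pending =
                new ArrayDeque<>(Arrays.asList("record-1", "record-2"));
        // Queue.poll() returns null when the queue is empty.
        ElementSource source = pending::poll;
        System.out.println(drain(source));
    }
}
```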

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/StephanEwen/incubator-flink latency

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/938.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #938
    
----
commit 9d9e34e85c812eedb7b0f6365952c853fc531627
Author: Stephan Ewen <se...@apache.org>
Date:   2015-07-12T17:33:38Z

    [tests] Add a manual test to evaluate impact of checkpointing on latency

commit 4cafad08a1b64bcaf0c95fe0b211eb740f6774b2
Author: Stephan Ewen <se...@apache.org>
Date:   2015-07-26T16:58:37Z

    [FLINK-2406] [streaming] Abstract and improve stream alignment via the 
BarrierBuffer
    
     - Add an interface for the functionality of the barrier buffer (for later 
addition of other implementations)
     - Add broader tests for the BarrierBuffer, including trailing data and 
barrier races.
     - Checkpoint barriers are handled by the buffer directly, rather than 
being returned and re-injected.
     - Simplify logic in the BarrierBuffer and fix certain corner cases.
     - Give access to spill directories properly via I/O manager, rather than 
via GlobalConfiguration singleton.
     - Rename the "BarrierBufferIOTest" to "BarrierBufferMassiveRandomTest"
     - A lot of code style/robustness fixes (properly define constants, 
visibility, exception signatures)

commit ba9e5a3cc1aacb5078fdc50fee12f60ccefd2563
Author: Stephan Ewen <se...@apache.org>
Date:   2015-07-26T17:05:30Z

    [FLINK-2402] [streaming] Add a stream checkpoint barrier tracker.
    
    The BarrierTracker is non-blocking and only counts barriers.
    That way, it does not increase latency of records in the stream, but can 
only be used to obtain "at least once" processing guarantees.

----


