GitHub user StephanEwen opened a pull request: https://github.com/apache/flink/pull/938
[FLINK-2406] [FLINK-2402] Abstract the BarrierBuffers and add a "at least once" version (BarrierTracker) Currently, the checkpointing mechanism provides "exactly once" guarantees. Part of that is the step that temporarily "aligns" the data streams, performed by the `BarrierBuffer`. This step increases the tuple latency temporarily. By offering a version that does not provide "exactly-once", but only "at-least-once", we can avoid the latency increase. This is relevant for super-low-latency applications (< 10 ms) that tolerate duplicates. This pull request does: - Add an interface for the functionality of the barrier buffer, to allow adding different implementations - Add the `BarrierTracker` which only tracks checkpoint barriers, without aligning the streams. - Add broader tests for the BarrierBuffer, including trailing data and barrier races. - Checkpoint barriers are handled by the buffer directly, rather than being returned and re-injected. - Simplify logic in the BarrierBuffer and fix certain corner cases. - Give access to spill directories properly via I/O manager, rather than via GlobalConfiguration singleton. - Rename the "BarrierBufferIOTest" to "BarrierBufferMassiveRandomTest" - A lot of code style/robustness fixes (proplery define constants, visibility, exception signatures) @gyfora and @uce : The reworked variant of the BarrierBuffer eventually returns `null` when all buffers and events have been consumed. The previous impementation repeatedly returned the `EndOfPartitionEvent`. That seemed inconsistent with the implementation of the "vanilla" InputGate, which eventually returns `null` when all partitions have finished. You can merge this pull request into a Git repository by running: $ git pull https://github.com/StephanEwen/incubator-flink latency Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/938.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #938 ---- commit 9d9e34e85c812eedb7b0f6365952c853fc531627 Author: Stephan Ewen <se...@apache.org> Date: 2015-07-12T17:33:38Z [tests] Add a manual test to evaluate impact of checkpointing on latency commit 4cafad08a1b64bcaf0c95fe0b211eb740f6774b2 Author: Stephan Ewen <se...@apache.org> Date: 2015-07-26T16:58:37Z [FLINK-2406] [streaming] Abstract and improve stream alignment via the BarrierBuffer - Add an interface for the functionaliy of the barrier buffer (for later addition of other implementatiions) - Add broader tests for the BarrierBuffer, inluding trailing data and barrier races. - Checkpoint barriers are handled by the buffer directly, rather than being returned and re-injected. - Simplify logic in the BarrierBuffer and fix certain corner cases. - Give access to spill directories properly via I/O manager, rather than via GlobalConfiguration singleton. - Rename the "BarrierBufferIOTest" to "BarrierBufferMassiveRandomTest" - A lot of code style/robustness fixes (proplery define constants, visibility, exception signatures) commit ba9e5a3cc1aacb5078fdc50fee12f60ccefd2563 Author: Stephan Ewen <se...@apache.org> Date: 2015-07-26T17:05:30Z [FLINK-2402] [streaming] Add a stream checkpoint barrier tracker. The BarrierTracker is non-blocking and only counts barriers. That way, it does not increase latency of records in the stream, but can only be used to obtain "at least once" processing guarantees. ---- --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---