GitHub user gyfora opened a pull request:

    https://github.com/apache/flink/pull/395

    StreamWindow abstraction + modular window computations

    This PR reworks the current windowing semantics through major API and 
runtime improvements.
    
    The core new abstraction is the StreamWindow, which encapsulates the 
contents of a window in the data stream. A WindowedDataStream now represents 
transformations on a DataStream of StreamWindows. This allows users to 
properly collect and work with the results of window transformations.
    
    API changes:
    
    Previously, the call `ds.window(...).groupBy(...).reduce(...)` returned 
a stream of reduced values by key in the given window. There was no way to 
properly obtain all the (key, value) pairs from the window reduce calls, only 
the flattened windows. This also made it impossible to apply more than one 
transformation to the same window without re-discretising it.
    
    The new API introduces several changes here:
    
    `ds.window(...).groupBy(...).reduceWindow(...)` returns a 
WindowedDataStream, which can then be further transformed by applying 
transformations to the resulting window. The user can also call `.flatten()` or 
`.getDiscretizedStream()` to obtain the DataStream of the flattened values or 
the DataStream of StreamWindows, respectively.
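
To illustrate why keeping the window structure matters, here is a minimal sketch in plain Java. It is a toy model under stated assumptions, not the API from this PR: nested lists stand in for a stream of windows, and `reduceWindowByKey`, `maxPerWindow`, `flatten` and `entry` are hypothetical helper names. The per-key reduce keeps one result list per window, so a second transformation can run on the same windows before anything is flattened.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Toy model only: nested lists stand in for a stream of windows.
// None of these names are Flink classes or methods.
class ReduceWindowSketch {

    /** Hypothetical helper that builds a (key, value) pair. */
    static Map.Entry<String, Integer> entry(String key, int value) {
        return new SimpleEntry<String, Integer>(key, value);
    }

    /** Reduces each window per key; the result still has one list per window. */
    static List<List<Map.Entry<String, Integer>>> reduceWindowByKey(
            List<List<Map.Entry<String, Integer>>> windows,
            BinaryOperator<Integer> reducer) {
        List<List<Map.Entry<String, Integer>>> reduced = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> window : windows) {
            // LinkedHashMap keeps keys in first-seen order within the window.
            Map<String, Integer> perKey = new LinkedHashMap<>();
            for (Map.Entry<String, Integer> e : window) {
                perKey.merge(e.getKey(), e.getValue(), reducer);
            }
            List<Map.Entry<String, Integer>> out = new ArrayList<>();
            for (Map.Entry<String, Integer> e : perKey.entrySet()) {
                out.add(entry(e.getKey(), e.getValue()));
            }
            reduced.add(out);
        }
        return reduced;
    }

    /** A second transformation on the reduced windows (assumes non-empty windows). */
    static List<Integer> maxPerWindow(List<List<Map.Entry<String, Integer>>> reduced) {
        List<Integer> result = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> window : reduced) {
            int max = Integer.MIN_VALUE;
            for (Map.Entry<String, Integer> e : window) {
                max = Math.max(max, e.getValue());
            }
            result.add(max);
        }
        return result;
    }

    /** Drops the window boundaries and returns the plain (key, value) pairs. */
    static List<Map.Entry<String, Integer>> flatten(
            List<List<Map.Entry<String, Integer>>> windows) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> window : windows) {
            out.addAll(window);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, Integer>>> windows = Arrays.asList(
                Arrays.asList(entry("a", 1), entry("b", 2), entry("a", 3)),
                Arrays.asList(entry("b", 4), entry("b", 5)));

        List<List<Map.Entry<String, Integer>>> reduced =
                reduceWindowByKey(windows, Integer::sum);
        System.out.println("reduced windows: " + reduced);
        System.out.println("max per window:  " + maxPerWindow(reduced));
        System.out.println("flattened:       " + flatten(reduced));
    }
}
```

In this model the old behaviour corresponds to having only the output of `flatten`, where the window boundaries needed by `maxPerWindow` are already gone.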
    
    The reduce and reduceGroup functions have been renamed to reduceWindow and 
mapWindow respectively to emphasise the functionality.
    
    Runtime changes:
    
    The windowing runtime has been completely reworked to make the components 
modular for future improvements:
    
    DataStream -> StreamDiscretizer -> (PreAggregator) -> WindowPartitioner -> 
Transformation (reduce/map) -> WindowMerge 
    
    This makes it easier to special-case specific window types for maximal 
performance, for example by applying specialised pre-aggregators or by 
parallelising stream discretisation for global (time) windows.
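
The stage chain above can be pictured with a small plain-Java sketch. This is an assumption-laden toy, not the runtime added in this PR: the `Part` class and the `discretize`, `partition`, `sumEach` and `merge` methods are hypothetical names, the optional pre-aggregator stage is left out, and round-robin splitting is just one possible partitioning choice. It only shows how a window can be cut into pieces, transformed piece by piece, and reassembled by window id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: stage names echo the pipeline in the description, but the
// classes and method signatures here are assumptions, not Flink's runtime.
class WindowPipelineSketch {

    /** A whole window or a piece of one: the window id plus some elements. */
    static final class Part {
        final int windowId;
        final List<Integer> elements;

        Part(int windowId, List<Integer> elements) {
            this.windowId = windowId;
            this.elements = new ArrayList<>(elements);
        }
    }

    /** Discretizer stage: cut the stream into tumbling count windows. */
    static List<Part> discretize(List<Integer> stream, int size) {
        List<Part> windows = new ArrayList<>();
        for (int start = 0, id = 0; start < stream.size(); start += size, id++) {
            int end = Math.min(start + size, stream.size());
            windows.add(new Part(id, stream.subList(start, end)));
        }
        return windows;
    }

    /** Partitioner stage: split each window round-robin into at most `parts` pieces. */
    static List<Part> partition(List<Part> windows, int parts) {
        List<Part> out = new ArrayList<>();
        for (Part w : windows) {
            List<List<Integer>> buckets = new ArrayList<>();
            for (int i = 0; i < parts; i++) {
                buckets.add(new ArrayList<>());
            }
            for (int i = 0; i < w.elements.size(); i++) {
                buckets.get(i % parts).add(w.elements.get(i));
            }
            for (List<Integer> bucket : buckets) {
                if (!bucket.isEmpty()) {
                    out.add(new Part(w.windowId, bucket));
                }
            }
        }
        return out;
    }

    /** Transformation stage: reduce every piece to its sum; pieces are independent. */
    static List<Part> sumEach(List<Part> pieces) {
        List<Part> out = new ArrayList<>();
        for (Part p : pieces) {
            int sum = 0;
            for (int v : p.elements) {
                sum += v;
            }
            out.add(new Part(p.windowId, Arrays.asList(sum)));
        }
        return out;
    }

    /** Merge stage: collect the results of all pieces that share a window id. */
    static Map<Integer, List<Integer>> merge(List<Part> pieces) {
        Map<Integer, List<Integer>> merged = new TreeMap<>();
        for (Part p : pieces) {
            merged.computeIfAbsent(p.windowId, k -> new ArrayList<>()).addAll(p.elements);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Integer> stream = Arrays.asList(1, 2, 3, 4, 5, 6);
        Map<Integer, List<Integer>> merged =
                merge(sumEach(partition(discretize(stream, 4), 2)));
        System.out.println(merged);
    }
}
```

Because each stage only consumes the previous stage's output, a special-purpose discretizer or partitioner could be swapped in without touching the other stages, which is the kind of modularity the description is aiming for.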
    
    Further tasks after merge:
    - Add pre-aggregators for time windows
    - Add pre-aggregator for sliding count windows
    - Improve test coverage for all pre-implemented policies
    - Further optimise partitioning-reduce parallelism
    
    I will add one last commit which will update the javadocs and the streaming 
guide.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mbalassi/flink window

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/395.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #395
    
----
commit f41a237ee05dd13caf6ddb805a917fc63d20c1a8
Author: Gyula Fora <gyf...@apache.org>
Date:   2015-02-04T16:14:51Z

    [streaming] StreamWindow abstraction added with typeinfo and tests

commit 6b7243bda432e7cd7d0d3dff37871a67f7c3fbc0
Author: Gyula Fora <gyf...@apache.org>
Date:   2015-02-04T16:26:16Z

    [FLINK-1176] [streaming] Added invokables for modular windowing 
tranformations

commit 3df8b613801c2dd7da12a7467acabd21cac14c8e
Author: Gyula Fora <gyf...@apache.org>
Date:   2015-02-04T16:27:34Z

    [FLINK-1176] [streaming] WindowedDataStream rework for new windowing runtime

commit df54364fe3bfa38a4eaafd5a7e774087a9996aaf
Author: Gyula Fora <gyf...@apache.org>
Date:   2015-02-08T12:09:34Z

    [streaming] StreamDiscretizer rework to support only 1 eviction and trigger 
for robustness + test cleanup

commit f14936ea62ebc999c765f536d79ba91f3709256b
Author: Gyula Fora <gyf...@apache.org>
Date:   2015-02-08T12:10:00Z

    [streaming] Integration test added for windowed operations

commit fca43c132f6988440d179f5922f3c150dc2dc9c9
Author: Gyula Fora <gyf...@apache.org>
Date:   2015-02-08T12:46:47Z

    [streaming] Streaming scala api fix for window changes + example cleanup

commit 8109c85a987fcb312d2575962abcf4b6f971b122
Author: Gyula Fora <gyf...@apache.org>
Date:   2015-02-08T21:59:22Z

    [streaming] GroupedTimeDiscretizer added for lighter time policy thread 
management

commit 8df13b8d9f117633ac4c7537b3c03d8d2f4bbf40
Author: Gyula Fora <gyf...@apache.org>
Date:   2015-02-10T11:21:09Z

    [streaming] WindowBuffer interface added for preaggregator logic + simple 
tumbling prereducer

commit bae76b7b310d313d2296d91e910cbf5d0d13c344
Author: Gyula Fora <gyf...@apache.org>
Date:   2015-02-11T11:57:06Z

    [streaming] Reimplemented StreamWindow.split to avoid dependency issues

commit 2a88e35cd59a9f8cdc2e460f6ec89c03e430308d
Author: Gyula Fora <gyf...@apache.org>
Date:   2015-02-12T19:45:34Z

    [streaming] TumblingGroupedPreReducer added + new tests for windowing

commit 0dba2c5cfc1bd9a8dd51a0ac630fe0a0ac3f9f5d
Author: Gyula Fora <gyf...@apache.org>
Date:   2015-02-13T10:03:08Z

    [streaming] WindowMapFunction added + Streaming package structure cleanup

commit 592a6ec90ade022ef4758ef5cc678d17b33a7afd
Author: Gyula Fora <gyf...@apache.org>
Date:   2015-02-13T18:57:55Z

    [FLINK-1539] [streaming] Remove calls to uninitalized runtimecontexts

----


