Stephan Ewen created FLINK-5529: ----------------------------------- Summary: Improve / extends windowing documentation Key: FLINK-5529 URL: https://issues.apache.org/jira/browse/FLINK-5529 Project: Flink Issue Type: Sub-task Components: Documentation Reporter: Stephan Ewen Assignee: Kostas Kloudas Fix For: 1.2.0, 1.3.0
Suggested Outline: {code} Windows (0) Outline: The anatomy of a window operation stream [.keyBy(...)] <- keyed versus non-keyed windows .window(...) <- required: "assigner" [.trigger(...)] <- optional: "trigger" (else default trigger) [.evictor(...)] <- optional: "evictor" (else no evictor) [.allowedLateness()] <- optional, else zero .reduce/fold/apply() <- required: "function" (1) Types of windows - tumble - slide - session - global (2) Pre-defined windows timeWindow() (tumble, slide) countWindow() (tumble, slide) - mention that count windows are inherently resource leaky unless limited key space (3) Window Functions - apply: most basic, iterates over elements in window - aggregating: reduce and fold, can be used with "apply()" which will get one element - forward reference to state size section (4) Advanced Windows - assigner - simple - merging - trigger - registering timers (processing time, event time) - state in triggers - life cycle of a window - create - state - cleanup - when is window contents purged - when is state dropped - when is metadata (like merging set) dropped (5) Late data - picture - fire vs fire_and_purge: late accumulates vs late resurrects (cf discarding mode) (6) Evictors - TDB (7) State size: How large will the state be? Basic rule: Each element has one copy per window it is assigned to --> num windows * num elements in window --> example: tumbline is one copy, sliding(n,m) is n/m copies --> per key Pre-aggregation: - if reduce or fold is set -> one element per window (rather than num elements in window) - evictor voids pre-aggregation from the perspective of state Special rules: - fold cannot pre-aggregate on session windows (and other merging windows) (8) Non-keyed windows - all elements through the same windows - currently not parallel - possible parallel in the future when having pre-aggregation functions - inherently (by definition) produce a result stream with parallelism one - state similar to one key of keyed windows {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)