[ https://issues.apache.org/jira/browse/FLINK-6243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16561216#comment-16561216 ]
Elias Levy commented on FLINK-6243: ----------------------------------- Rereading my initial description of the issue, I see that I make no mention to our specific upsert requirements, so I think you are right that FLINK-8478 does satisfy this issue as described and that it may be best if I open a new issue for the upsert and (a)/(a,b) join semantics I'd like. > Continuous Joins: True Sliding Window Joins > -------------------------------------------- > > Key: FLINK-6243 > URL: https://issues.apache.org/jira/browse/FLINK-6243 > Project: Flink > Issue Type: New Feature > Components: DataStream API > Affects Versions: 1.1.4 > Reporter: Elias Levy > Priority: Major > > Flink defines sliding window joins as the join of elements of two streams > that share a window of time, where the windows are defined by advancing them > forward some amount of time that is less than the window time span. More > generally, such windows are just overlapping hopping windows. > Other systems, such as Kafka Streams, support a different notion of sliding > window joins. In these systems, two elements of a stream are joined if the > absolute time difference between the them is less or equal the time window > length. > This alternate notion of sliding window joins has some advantages in some > applications over the current implementation. > Elements to be joined may both fall within multiple overlapping sliding > windows, leading them to be joined multiple times, when we only wish them to > be joined once. > The implementation need not instantiate window objects to keep track of > stream elements, which becomes problematic in the current implementation if > the window size is very large and the slide is very small. > It allows for asymmetric time joins. E.g. join if elements from stream A are > no more than X time behind and Y time head of an element from stream B. > It is currently possible to implement a join with these semantics using > {{CoProcessFunction}}, but the capability should be a first class feature, > such as it is in Kafka Streams. > To perform the join, elements of each stream must be buffered for at least > the window time length. To allow for large window sizes and high volume of > elements, the state, possibly optionally, should be buffered such as it can > spill to disk (e.g. by using RocksDB). > The same stream may be joined multiple times in a complex topology. As an > optimization, it may be wise to reuse any element buffer among colocated join > operators. Otherwise, there may write amplification and increased state that > must be snapshotted. -- This message was sent by Atlassian JIRA (v7.6.3#76005)