> Seems like this would be a standard join operation

Not sure if I would consider this a "standard" join... Your windows have different sizes and thus "move" at different speeds.
Kafka Streams joins provide sliding-window join semantics, similar to a SQL query like this (conceptually):

    SELECT * FROM stream1, stream2
    WHERE stream1.key = stream2.key
      AND stream1.ts - windowSize <= stream2.ts
      AND stream2.ts <= stream1.ts + windowSize

I.e., it joins all records that are close to each other in time, which is a very natural way to express a stream join (and what I would consider a "standard" join). (Btw: this applies _one_ window over both streams.)

The hopping-window semantics you describe are also possible with a little extra work though: you would first use an aggregation that "collects" your windows (i.e., your aggregation function builds up a list of input records as the aggregation result). You can apply this to both of your streams with the corresponding TimeWindows.of().advanceBy() settings. For the join itself, you would merge both streams via `KStreamBuilder#merge` and apply a stateful (Value)Transformer downstream. In your Transformer#process you can write custom logic to compare your windows.

Does this help?

-Matthias

On 5/3/17 10:51 AM, Jon Yeargers wrote:
> I want to collect data in two windowed groups - 4 hours with a one-hour
> overlap and a 5 minute / 1 minute. I want to compare the values in the
> _oldest_ window for each group.
>
> Seems like this would be a standard join operation, but I'm not clear on
> how to limit which window the join operates on. I could keep a timestamp
> in each aggregate and if it isn't what I want (i.e., < 4 hours old) then
> ignore the join, but this seems very inefficient.
>
> Likely I'm missing the big picture here again w/re KStreams. I keep
> running into situations where it seems like Kafka Streams would be a
> great tool but it just doesn't quite fit. Kind of like having a drawer
> with mixed metric/standard wrenches.
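To make the two semantics discussed above concrete, here is a small self-contained sketch in plain Java (no Kafka dependency; the class and method names are my own, not Kafka Streams API): (1) the sliding-join condition that KStream-KStream joins apply, and (2) the hopping-window assignment that `TimeWindows.of(size).advanceBy(advance)` performs, where one record falls into several overlapping windows.

```java
import java.util.ArrayList;
import java.util.List;

public class WindowSemantics {

    // (1) Sliding join condition: two records join if their timestamps
    // are within windowSize of each other (in either direction).
    static boolean joins(long ts1, long ts2, long windowSize) {
        return ts1 - windowSize <= ts2 && ts2 <= ts1 + windowSize;
    }

    // (2) Hopping windows: return the start times of all windows
    // [start, start + size) that contain the given timestamp, for
    // windows of length `size` advancing by `advance`.
    static List<Long> windowStartsFor(long ts, long size, long advance) {
        List<Long> starts = new ArrayList<>();
        long latest = ts - (ts % advance);  // latest window start <= ts
        for (long s = latest; s > ts - size; s -= advance) {
            if (s >= 0) {
                starts.add(s);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // Records 3 time units apart join when windowSize >= 3:
        System.out.println(joins(10, 13, 3));  // true
        System.out.println(joins(10, 14, 3));  // false

        // 4h windows advancing by 1h (units: hours): a record at t=5
        // falls into the windows starting at 5, 4, 3, and 2.
        System.out.println(windowStartsFor(5, 4, 1));  // [5, 4, 3, 2]
    }
}
```

This also shows why the two hopping-window groups in the question "move" at different speeds: a record lands in size/advance windows (4 for the 4h/1h group, 5 for the 5min/1min group), which is what the collect-then-compare approach above has to account for.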