I am not sure exactly what semantics you want to have. Note that Kafka Streams provides a sliding window join between two streams. Thus, I am not sure what you mean by

>> Track when matches are found so subsequent matches (which within the join
>> window would be considered duplicates) aren't reemitted.

Also note that result records are timestamped with the current "stream time". If you have late-arriving records, the timestamp would not fall in the actual time window, and thus subsequent filtering would not work, as you cannot know the correct time window a result record originates from. Thus, I think it would be best to implement the join as a custom operator using a stateful transform.

-Matthias

On 7/22/17 1:00 AM, Leif Wickland wrote:
> Howdy,
>
> I'm trying to make a Kafka application which will join two topics with
> different semantics than the KStreams join has natively.
>
> Specifically I'd like to:
> - Immediately emit matches to a topic.
> - Track when matches are found so subsequent matches (which within the join
>   window would be considered duplicates) aren't reemitted.
> - Emit unmatched items to a different topic when their time is a
>   configurable amount beyond the low water mark.
> - Only emit a given key once within a window, either to the matched or
>   unmatched topic.
>
> I'm still working through how (or if) this can be accomplished. Has someone
> done something similar to this in the past?
>
> One approach I considered was doing an outer join followed by a stateful
> transformer which would implement the logic above, using punctuate() to
> trigger flushes of unmatched items at the appropriate time past the
> watermark. The biggest challenge I see to that approach is that the
> StateStore interface doesn't provide a way to iterate over all keys in a
> time window. The underlying RocksDB store has that capacity, but bridging
> that gap looks painful.
>
> The other approach I was considering was trying to implement all that logic
> in a single processor, in order to avoid the overhead of storing the data
> first in the join implementation's store and then again in the transform's
> store.
> I don't yet understand what challenges I'd have around reading
> transactionally from two topics. I suspect that may be significantly
> difficult.
>
> I'd be grateful for any insight or suggestions you could offer.
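The per-key logic Leif describes (emit the first match immediately, suppress duplicate matches within the window, and flush still-unmatched keys once stream time passes the window) can be sketched independently of the Kafka Streams API. Below is a minimal, self-contained Java model; it is only an illustration of the bookkeeping, not the Kafka API. All class and method names here are invented for the example, and the plain HashMap is a stand-in for what would be a persistent KeyValueStore inside a Transformer, with flushExpired() driven from punctuate().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Illustrative sketch: windowed match dedup + delayed unmatched flush.
class WindowedJoinTracker {
    static final class Entry {
        final long firstSeenMs;   // start of this key's dedup window
        boolean matched;          // has a match already been emitted?
        Entry(long t) { firstSeenMs = t; }
    }

    private final long windowMs;  // how long to wait/dedupe per key
    private final Map<String, Entry> state = new HashMap<>();
    final List<String> matchedOut = new ArrayList<>();   // stand-in for the matched topic
    final List<String> unmatchedOut = new ArrayList<>(); // stand-in for the unmatched topic

    WindowedJoinTracker(long windowMs) { this.windowMs = windowMs; }

    // Called when both sides of the join are present for `key`.
    void onMatch(String key, long eventTimeMs) {
        Entry e = state.computeIfAbsent(key, k -> new Entry(eventTimeMs));
        if (!e.matched) {            // emit only the first match per window;
            e.matched = true;        // later matches are duplicates and are dropped
            matchedOut.add(key);
        }
    }

    // Called when one side arrives without a partner yet.
    void onUnmatched(String key, long eventTimeMs) {
        state.putIfAbsent(key, new Entry(eventTimeMs));
    }

    // In a real Transformer this would run from punctuate(): emit keys whose
    // window has closed without a match, then drop their state so the key can
    // be emitted again in a later window.
    void flushExpired(long streamTimeMs) {
        Iterator<Map.Entry<String, Entry>> it = state.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Entry> me = it.next();
            if (streamTimeMs - me.getValue().firstSeenMs > windowMs) {
                if (!me.getValue().matched) {
                    unmatchedOut.add(me.getKey());
                }
                it.remove();
            }
        }
    }
}
```

Note that this keeps exactly one entry per key, which also sidesteps the "iterate all keys in a time window" problem: a full scan over the store during punctuate() is enough, at the cost of touching every live key on each punctuation.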