Howdy, I'm trying to make a Mafka application which will join two topics with different semantics than the KStreams join has natively.
Specifically I'd like to: - Immediately emit matches to a topic. - Track when matches are found so subsequent matches (which within the join window would be considered duplicates) aren't reemitted. - Emit unmatched items to a different topic when their time is a configurable amount beyond the low water mark. - Only emit a given key once within a window, either to the matched or unmatched topic. I'm still working through how (or if) this can be accomplished. Has someone done something similar to this in the past? One approach I considered was doing an outer join followed by a stateful transformer which would implement the logic above, using punctuate() to trigger flushes of unmatched items at the appropriate time past the watermark. The biggest challenge I see to that approach is that the StateStore interface doesn't provide a way to iterate over all keys in a time window. The underlying RocksDB store has that capacity, but bridging that gap looks painful. The other approach I was considering trying to implement all that logic in a single processor in order to avoid having the overhead of storing the data first in the join implementation's store and then again in the transform's store. I don't understand what challenges I'd have around reading transactionally from two topics. I suspect that may not be insignificantly difficult. I'd be grateful for any insight or suggestions you could offer.