Howdy,

I'm trying to make a Mafka application which will join two topics with
different semantics than the KStreams join has natively.

Specifically I'd like to:
- Immediately emit matches to a topic.
- Track when matches are found so subsequent matches (which within the join
window would be considered duplicates) aren't reemitted.
- Emit unmatched items to a different topic when their time is a
configurable amount beyond the low water mark.
- Only emit a given key once within a window, either to the matched or
unmatched topic.

I'm still working through how (or if) this can be accomplished. Has someone
done something similar to this in the past?

One approach I considered was doing an outer join followed by a stateful
transformer which would implement the logic above, using punctuate() to
trigger flushes of unmatched items at the appropriate time past the
watermark. The biggest challenge I see to that approach is that the
StateStore interface doesn't provide a way to iterate over all keys in a
time window. The underlying RocksDB store has that capacity, but bridging
that gap looks painful.

The other approach I was considering trying to implement all that logic in
a single processor in order to avoid having the overhead of storing the
data first in the join implementation's store and then again in the
transform's store. I don't understand what challenges I'd have around
reading transactionally from two topics. I suspect that may not be
insignificantly difficult.

I'd be grateful for any insight or suggestions you could offer.

Reply via email to