Hey,

are you sure, you want to join everything? This will result in a huge
memory footprint of your application. You are right, that you cannot use
KTable, however, windowed KStream joins would work -- you only need to
specify a huge window (ie, use Long.MAX_VALUE; this will effectively be
"infinitely large") thus that all data falls into a single window.

The issue will be, that all data will be buffered in memory, thus, if
your application run very long, it will eventually fail (I would
assume). Thus, again my initial question: are you sure, you want to join
everything? (It's stream processing, not batch processing...)

If the answer is still yes, and you hit a memory issue, you will need to
fall back to use Processor API instead of DSL to spill data to disk if
it does not fit into memory and more (ie, you will need to implement
your own version of an symmetric-hash-join that spills to disk). Of
course, the disk usage will also be huge. Eventually, your disc might
also become too small...

Can you clarify, why you want to join everything? This does not sound
like a good idea. Very large windows are handleable, but "infinite"
windows are very problematic in stream processing.


-Matthias


On 09/05/2016 06:25 PM, Guillermo Lammers Corral wrote:
> Hi,
> 
> I've been thinking how to solve with Kafka Streams one of my business
> process without success for the moment. Hope someone can help me.
> 
> I am reading from two topics events like that (I'll simplify the problem at
> this point):
> 
> ObjectX
> Key: String
> Value: String
> 
> ObjectY
> Key: String
> Value: String
> 
> I want to do some kind of "join" for all events without windowing but also
> without being KTables...
> 
> Example:
> 
> ==============================
> 
> ObjectX("0001", "a") -> TopicA
> 
> Expected output TopicResult:
> 
> nothing
> 
> ==============================
> 
> ObjectX("0001", "b") -> Topic A
> 
> Expected output TopicResult:
> 
> nothing
> 
> ==============================
> 
> ObjectY("0001", "d") -> Topic B:
> 
> Expected output TopicResult:
> 
> ObjectZ("0001", ("a", "d"))
> ObjectZ("0001", ("b", "d"))
> 
> ==============================
> 
> ==============================
> 
> ObjectY("0001", "e") -> Topic B:
> 
> Expected output TopicResult:
> 
> ObjectZ("0001", ("a", "e"))
> ObjectZ("0001", ("b", "e"))
> 
> ==============================
> 
> TopicResult at the end:
> 
> ObjectZ("0001", ("a", "d"))
> ObjectZ("0001", ("b", "d"))
> ObjectZ("0001", ("a", "e"))
> ObjectZ("0001", ("b", "e"))
> 
> ==============================
> 
> I think I can't use KTable-KTable join because I want to match all the
> events from the beginning of time. Hence, I can't use KStream-KStream join
> because force me to use windowing. Same for KStream-KTable join...
> 
> Any expert using Kafka Streams could help me with some tips?
> 
> Thanks in advance.
> 

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to