Hey Navneeth, thank you for your interest in trying out Kafka Streams! Normally we will redirect new folks to the stream FAQ <https://docs.confluent.io/current/streams/faq.html> first in case you haven't checked it out. For details to your question:
1. Joining 2 topics using processor API (we call PAPI), you have to create 2 stream nodes, each of which needs to read in one join topic and materialize data into local state store. At the same time, they should cross reference other's state store to perform the joining in #process() upon receiving new records. You could check out KStreamKStreamJoin and KStreamImpl#join for more details. 2. If you are concerned about delaying due to one partition, I would recommend you try out wall clock time which advances based on real-world time. Check out here <https://docs.confluent.io/current/streams/faq.html#why-is-punctuate-not-called> 3. Not sure what specific read pattern you are looking for, but if you only need to read single key-value, we have interactive query support <https://kafka.apache.org/10/documentation/streams/developer-guide/interactive-queries.html> built in (which potentially solves your remote referral question). If you want to access the entire state store for debugging, you could opt in to dump the result like here <https://docs.confluent.io/current/streams/faq.html#inspecting-streams-and-tables>. And there is no state store across multiple machines, each individual state store belongs to one stream task, and they are isolated even on local. Hope this helps! Boyang On Sat, Aug 3, 2019 at 11:31 PM Navneeth Krishnan <reachnavnee...@gmail.com> wrote: > Hi All, > > I'm new to kafka streams and I'm trying to model my use case that is > currently written on flink to kafka streams. I have couple of questions. > > - How can I join two kafka topics using the processor API? I have data > stream and change stream which needs to be joined based on a key. > > - I read about the "STREAM_TIME" punctuation and it seems unless all > partitions have new records there will be not be an advance in time. Is > there a way to overcome this? > > - Is there a way to read the state store data through an API externally? If > so then how does it know where the state resides and if the state is across > multiple machines how does that work? > > Thanks for all your help. >