Hi, I am developing Kafka stream-application using low level processor API.
I am consuming records from multiple topics, applying some joining logics, and then sending them off another topics. 1) How can I make sure my application consume records at consistent rate across different topics? If I were to distribute data across multiple topics, would that ensure that data will be consumed consistently? For example, let's say topicA, topicB,topicC has ~1000 records in 5 minute window. Can I assume these records will be consumed in the stream app at the same time? - meaning all 3000 records will land in the streaming application consistently. 2) Is there a way to consume some records faster than others in streaming application? Let's say topicA has 12X-15X times records being produced at a rate topicC produces records. Can I consume topicA somehow faster than I do for topicC? I am trying to overcome this topic volume imbalance by breaking down topicA into 12 topics ( so each sub-topicA has similar volume with topicC). 3) My join logic depends on the same event-time window of records from multiple topics I consume from. How can I make sure the same event-time window gets processed for each topic? 4) Is there a way to "hold" or "pause" on consuming records in Processor temporarily? I thought that if somehow one Processor consuming certain records takes too long, or "joinable" records arrive late, I was hoping I could delay another Processor that is "waiting" for it's joinable records. -- I am achieving this by hold the records in StateStore but this wait seems arbitrary. I am wondering if there is better solution than this. 5) Let's say I have 40 partitions on 10 topics I consume from and I have 40 instances of stream-application consuming from these. In this case how does setting "num.stream.threads" work? I see that is still uses one thread for "fetching" which doesn't really help the cause on "fetching" stuff but some of these threads are being used for "consuming" from monitoring JMX. Since each application handles 10 partitions ( one partition for each of 10 topics), I was setting num.stream.threads=10 but I don't have a very clear understanding of this value. Would you recommend increasing num of threads here? I'd very much appreciate any help on any of these questions. Thank you