Thanks Michael and Eno for your help - I had always thought the unit of parallelism was the combination of topic and partition, rather than just the partition.

Out of interest, though: had I subscribed to both topics in one subscriber, I would have expected records for both topics to arrive interleaved. Why, when running this as two separate processes, do I not observe the same? I am just trying to form a mental model of how this all works - I will try to look through some code over the weekend.
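For reference, the single-subscriber version I have in mind would look roughly like this - a minimal sketch against the 0.10.0.1 API, where the topic names and the getProps() helper are placeholders:

    KStreamBuilder builder = new KStreamBuilder();

    // One application ID consuming both single-partition topics in a single
    // topology: both partitions get assigned to this one instance and their
    // records interleave (ordering is only guaranteed within each partition).
    KStream<Bytes, Bytes> merged = builder.stream("topic-a", "topic-b");
    merged.to("output-topic");

    KafkaStreams streams = new KafkaStreams(builder, getProps());
    streams.start();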
If I fix this by changing the application ID for each streaming process, does that mean I lose the ability to share state stores between the applications? Unfortunately, the data on the input topics is provided by a third-party component which sends these keyless messages on a single partition per topic, so I have little ability to fix this at source :-(
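If I do split the application IDs, I assume the only change needed is in configuration, along these lines (a sketch - the IDs and broker address are made up):

    Properties props = new Properties();
    // A distinct application.id per process puts each verifier in its own
    // consumer group, so the two no longer compete in one rebalance.
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "verifier-topic-a"); // "verifier-topic-b" in the second process
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 200);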
Thanks,
Henry

--
Henry Thacker

On 28 April 2017 at 17:32:28, Michael Noll (mich...@confluent.io) wrote:
> To add to what Eno said:
>
> You can of course use the Kafka Streams API to build an application that
> consumes from multiple Kafka topics. But, going back to your original
> question, the scalability of Kafka and the Kafka Streams API is based on
> partitions, not on topics.
>
> -Michael
>
> On Fri, Apr 28, 2017 at 6:28 PM, Eno Thereska <eno.there...@gmail.com> wrote:
> > Hi Henry,
> >
> > Kafka Streams scales differently and does not support having the same
> > application ID subscribe to different topics for scale-out. The way we
> > support scaling out, if you want to use the same application ID, is
> > through partitions, i.e., Kafka Streams automatically assigns partitions
> > to your multiple instances. If you want to scale out using topics, you'll
> > need to use different application IDs.
> >
> > So in a nutshell this pattern is not supported. Was there a reason you
> > needed to do it like that?
> >
> > Thanks,
> > Eno
> >
> > On 28 Apr 2017, at 11:41, Henry Thacker <he...@henrythacker.com> wrote:
> > > Should also add - there are definitely live incoming messages on both
> > > input topics when my streams are running. The auto offset reset config
> > > is set to "earliest", and because the input data streams are quite
> > > large (several million records each), I set a relatively small max poll
> > > records (200) so we don't run into heartbeating issues if we restart
> > > intraday.
> > >
> > > Thanks,
> > > Henry
> > >
> > > --
> > > Henry Thacker
> > >
> > > On 28 April 2017 at 11:37:53, Henry Thacker (he...@henrythacker.com) wrote:
> > > > Hi Eno,
> > > >
> > > > Thanks for your reply - the code that builds the topology is
> > > > something like this (I don't have email and the code access on the
> > > > same machine, unfortunately, so it might not be 100% accurate /
> > > > terribly formatted!).
> > > >
> > > > The stream application is a simple verifier which stores a tiny bit
> > > > of state in a state store. The processor is custom and only has logic
> > > > in init() to store the context and retrieve the store, and in
> > > > process(...) to validate the incoming messages and forward them on
> > > > when appropriate.
> > > >
> > > > There is no joining, aggregation or windowing.
> > > >
> > > > In public static void main:
> > > >
> > > > String topic = args[0];
> > > > String outputTopic = args[1];
> > > >
> > > > KStreamBuilder builder = new KStreamBuilder();
> > > >
> > > > // Persistent local store used by the custom verifier processor
> > > > StateStoreSupplier stateStore = Stores.create("mystore")
> > > >     .withStringKeys()
> > > >     .withByteArrayValues()
> > > >     .persistent()
> > > >     .build();
> > > >
> > > > KStream<Bytes, Bytes> stream = builder.stream(topic);
> > > >
> > > > builder.addStateStore(stateStore);
> > > >
> > > > // buildStreamProcessor returns the custom Processor; it is attached
> > > > // to "mystore" so process() can read and update the store
> > > > stream.process(this::buildStreamProcessor, "mystore");
> > > >
> > > > stream.to(outputTopic);
> > > >
> > > > KafkaStreams streams = new KafkaStreams(builder, getProps());
> > > > streams.setUncaughtExceptionHandler(...);
> > > > streams.start();
> > > >
> > > > Thanks,
> > > > Henry
> > > >
> > > > On 28 April 2017 at 11:26:07, Eno Thereska (eno.there...@gmail.com) wrote:
> > > > > Hi Henry,
> > > > >
> > > > > Could you share the code that builds your topology so we can see
> > > > > how the topics are passed in? Also, this would depend on what the
> > > > > streaming logic is doing with the topics, e.g., if you're joining
> > > > > them then both partitions need to be consumed by the same instance.
> > > > >
> > > > > Eno
> > > > >
> > > > > On 28 Apr 2017, at 11:01, Henry Thacker <he...@henrythacker.com> wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I'm using Kafka 0.10.0.1 and Kafka Streams. I have two different
> > > > > > processes, Consumer 1 and Consumer 2. They share the same
> > > > > > application ID but subscribe to different single-partition
> > > > > > topics, and only one stream consumer receives messages.
> > > > > >
> > > > > > The non-working stream consumer just sits there logging:
> > > > > >
> > > > > > Starting stream thread [StreamThread-1]
> > > > > > Discovered coordinator <Host> (Id: ...) for group my-streamer
> > > > > > Revoking previously assigned partitions [] for group my-streamer
> > > > > > (Re-)joining group my-streamer
> > > > > > Successfully joined group my-streamer with generation 3
> > > > > > Setting newly assigned partitions [] for group my-streamer
> > > > > > (Re-)joining group my-streamer
> > > > > > Successfully joined group my-streamer with generation 4
> > > > > >
> > > > > > If I were trying to subscribe to the same topic and partition I
> > > > > > could understand this behaviour, but given that the subscriptions
> > > > > > are for different input topics, I would have thought this should
> > > > > > work?
> > > > > >
> > > > > > Thanks,
> > > > > > Henry
> > > > > >
> > > > > > --
> > > > > > Henry Thacker