Hi Johan,

Your observation is correct. The root cause is that your two instances are being upgraded sequentially. Say your old topology is tp1, and your new topology with the new stream / topic is tp2. While you are upgrading, say, instance1, instance1 already knows about tp2 while the other instance, instance2, still thinks the topology is tp1. If instance1 contains the leader of the consumer group, then it will do the task assignment based on tp2 and send it via the join-group responses, but instance2, upon receiving the tasks based on tp2, will not be able to "interpret" them since it only knows tp1.
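
To make the mismatch concrete, here is a rough sketch in plain Java. It does not use the Kafka Streams API at all: the class, the topic names, and the "sub-topology id -> source topic" map are hypothetical stand-ins, and it ignores how tasks are spread across instances. It only assumes the "<subTopology>_<partition>" task naming you mentioned (e.g. 3_3) and 10 partitions per topic. In a real application adding a stream may also renumber existing sub-topologies, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Kafka Streams code.
class TopologyMismatchSketch {

    // Old topology (tp1): two sub-topologies, one per subscribed topic (names are made up).
    static Map<Integer, String> tp1() {
        Map<Integer, String> topology = new LinkedHashMap<>();
        topology.put(0, "topic-a");
        topology.put(1, "topic-b");
        return topology;
    }

    // New topology (tp2): tp1 plus one extra stream on a new topic.
    static Map<Integer, String> tp2() {
        Map<Integer, String> topology = tp1();
        topology.put(2, "topic-c");
        return topology;
    }

    // Leader side: create one task id per (sub-topology, partition) pair.
    static List<String> assignTasks(Map<Integer, String> topology, int partitions) {
        List<String> tasks = new ArrayList<>();
        for (int subTopology : topology.keySet()) {
            for (int partition = 0; partition < partitions; partition++) {
                tasks.add(subTopology + "_" + partition);
            }
        }
        return tasks;
    }

    // Member side: list the assigned tasks whose sub-topology is unknown locally.
    static List<String> unresolvableTasks(Map<Integer, String> localTopology,
                                          List<String> assignedTasks) {
        List<String> unknown = new ArrayList<>();
        for (String task : assignedTasks) {
            int subTopology = Integer.parseInt(task.substring(0, task.indexOf('_')));
            if (!localTopology.containsKey(subTopology)) {
                unknown.add(task);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        // The upgraded instance1 (leader) assigns tasks based on tp2.
        List<String> assigned = assignTasks(tp2(), 10);
        // instance2 still runs tp1 and cannot map the 2_x tasks to anything.
        System.out.println("instance2 (tp1) cannot resolve: " + unresolvableTasks(tp1(), assigned));
        // Once every instance runs tp2, nothing is left unresolved.
        System.out.println("instance2 (tp2) cannot resolve: " + unresolvableTasks(tp2(), assigned));
    }
}
```

With the old topology on instance2, the ten 2_x tasks have nothing to map to, which is the situation you hit during the rolling deploy. With tp2 on every instance the unresolved list is empty.
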
For such cases, if your change only adds a new topic / stream, you probably do not need to reset your application, but you still need to bring down your app (all instances), swap in the new code on all your instances, and then restart them.

Guozhang

On Mon, Jan 21, 2019 at 4:22 PM Matthias J. Sax <matth...@confluent.io> wrote:

> That is expected... It's not possible to change the subscription during
> a rolling restart. You need to stop all instances and afterwards start
> new instances with the new subscription.
>
> I did not look into the details of your change, but you might also need
> to reset your application before starting new instances, because
> changing the subscription might be a "breaking" change:
>
> https://docs.confluent.io/current/streams/developer-guide/app-reset-tool.html
>
>
> -Matthias
>
>
> On 1/21/19 2:49 PM, Johan Horvius wrote:
> > Hello!
> >
> > I'm having trouble when deploying a new version of a service during the
> > re-balancing step where the topology doesn't match what KafkaStreams
> > library assumes and there's a NPE while creating tasks.
> >
> > Background info:
> > I'm running a Spring Boot service which utilizes KafkaStreams, currently
> > subscribed to two topics that has 10 partitions each. The service is
> > running in 2 instances for increased reliability and load balancing.
> > In the next version of the service I've added another stream listening
> > to a different topic. The service is deployed with a rolling strategy
> > where first 2 instances of the new version is added and then the old
> > versions 2 instances are shut down.
> >
> > When trying to deploy my new version the partitions are withdrawn and
> > re-assigned and during the task creation the NPE happens and
> > KafkaStreams goes into a failed state.
> >
> > Kafka is backed by 3 brokers in a cluster.
> >
> > I've tried to re-create the scenario in a simpler setting but been
> > unable to do so. The re-balancing works fine when I try to run it
> > locally with dummy test topics.
> >
> > I'm attaching the log from the service.
> >
> > While trying to figure out what was wrong the only conclusion I could
> > come up with was that KafkaStreams got confused due to building an
> > original topology and then during re-balance got tasks in another order
> > and then it did not re-build the internal topology before trying to
> > create tasks, thus a mismatch between KafkaStreams node groups
> > associated with a task key such as 3_3 would not match up with the
> > expected consumer/producer-combo.
> >
> > Hopefully you can shed some lights on what could be wrong.
> >
> > Regards
> > Johan Horvius
> >
>

--
-- Guozhang