Hello, I am seeing an issue with a single Kafka Streams app instance (so a consumer group of one) that is subscribed to about 10 topics. If the streams app gets killed and restarted, many of the consumer group's offsets are reset to 0 and a lot of data is unintentionally reprocessed. Which offsets get reset seems random, but it usually only affects a few partitions of the affected topics.
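For reference, the committed offsets can also be inspected programmatically, not just with the CLI. Below is a minimal sketch using the Kafka AdminClient; the group id (for a streams app this is its application-id) and the bootstrap servers are placeholder values, not our actual configuration.

```java
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class DumpCommittedOffsets {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder bootstrap servers; adjust for the environment under test.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // "my-streams-app" is a placeholder for the Streams application-id,
            // which doubles as the consumer group id.
            Map<TopicPartition, OffsetAndMetadata> committed = admin
                    .listConsumerGroupOffsets("my-streams-app")
                    .partitionsToOffsetAndMetadata()
                    .get();

            // Print the committed offset for every partition the group has committed to.
            committed.forEach((tp, meta) ->
                    System.out.printf("%s-%d -> %d%n", tp.topic(), tp.partition(), meta.offset()));
        }
    }
}
```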
The problem does not seem to occur if I keep at least one instance of the streams app running. For example, with a consumer group of two, taking the instances down and updating them one at a time does not trigger the issue. Is there an obvious reason I am missing that could cause this? It appears that the app is cleanly shutting down, but if it is not, could that explain what I am seeing? (A simplified sketch of what I mean by a clean shutdown is in the P.S. at the end of this post.)

Context:
- The streams application runs in Docker.
- When a new version is deployed (the application-id stays the same), the running container is shut down and a new container is started, so there is a window during which no consumer instance is active.
- The container logs make it look like the app shuts down cleanly.

Steps I go through to reproduce this issue:
1. Disallow writes to Kafka to ensure that no writes occur during the test (dev environment).
2. Use the kafka-consumer-groups.sh script to verify that lag is zero on all partitions of all topics.
3. Deploy a new version of the application (the code is updated but the application-id stays the same), which causes the streams app to die and be restarted.
4. Use the kafka-consumer-groups.sh script to check the lag again; it now shows high lag on many topics and partitions.

Any help is greatly appreciated. Thanks!

Jordon
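P.S. To make "cleanly shutting down" concrete, here is a minimal sketch of the standard KafkaStreams shutdown-hook pattern. The application-id, bootstrap servers, serdes, and topology are placeholders rather than our actual code; I am assuming a clean shutdown means roughly this: close() is called from a JVM shutdown hook when the container is stopped.

```java
import java.time.Duration;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsShutdownSketch {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        // Placeholders: the application-id (i.e. the consumer group id) stays the
        // same across deployments.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Placeholder topology: the real app subscribes to roughly 10 topics.
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        CountDownLatch latch = new CountDownLatch(1);

        // Clean shutdown: when the container receives SIGTERM, the JVM runs this
        // hook, close() stops the stream threads and commits the current offsets,
        // and then main() is allowed to return.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            streams.close(Duration.ofSeconds(30));
            latch.countDown();
        }));

        streams.start();
        latch.await();
    }
}
```

One detail worth noting about this pattern: the Duration passed to close() bounds how long the hook will block, and Docker only waits its stop timeout (10 seconds by default) after SIGTERM before sending SIGKILL, so if close() takes longer than that the shutdown ends up unclean even though the logs start out looking clean.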