Hello,

I am seeing an issue with a single Kafka Streams app instance (so a
consumer group of one) that is subscribed to about 10 topics. If the
streams app is killed and restarted, many of the consumer group's
offsets are reset to 0 and a lot of data is unintentionally
reprocessed. The offsets that get reset appear to be random, but
usually only a few partitions of the affected topics are hit.

I don't see this problem if I keep at least one instance of the streams
app running. For example, with a consumer group of two, if I take the
instances down one at a time and update them, the issue does not occur.

Is there an obvious cause I am missing? The app appears to shut down
cleanly, but if it is not actually doing so, could an unclean shutdown
explain what I am seeing?

Context:

- The streams application runs in Docker.
- When a new version is deployed (the application-id stays the same),
the currently running container is shut down and a new container is
started, so there is a window during which no consumer instance is
active.
- The container logs suggest the app shuts down cleanly (the shutdown
wiring is sketched below).
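
For reference, the relevant wiring looks roughly like this (simplified
sketch; the application id, bootstrap address, and topic names are
placeholders, and the real topology subscribes to the ~10 topics):

    import java.util.Properties;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;

    public class App {
        public static void main(String[] args) {
            Properties props = new Properties();
            // application.id doubles as the consumer group id; it is
            // kept identical across deploys
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");

            StreamsBuilder builder = new StreamsBuilder();
            // placeholder topology; the real app subscribes to ~10 topics
            builder.stream("input-topic").to("output-topic");

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();

            // close() runs on SIGTERM (what "docker stop" sends), so the
            // app commits offsets and leaves the group before the JVM exits
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }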

Steps I go through to reproduce this issue:

1. Disallow writes to Kafka to ensure that no writes occur during the
test (dev environment).
2. Use the kafka-consumer-groups.sh script to verify there is zero lag
on all partitions of all topics (exact command shown after this list).
3. Deploy a new version of the application (again, the code changes but
the application-id stays the same), which causes the streams app to die
and then be restarted.
4. Use the kafka-consumer-groups.sh script to check the lag again, which
now shows high lag on many topics and partitions.
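
The lag check in steps 2 and 4 is essentially the following (the
bootstrap address and group name are placeholders; the group id matches
the application-id):

    kafka-consumer-groups.sh --bootstrap-server kafka:9092 \
      --describe --group my-streams-app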


Any help is greatly appreciated. Thanks!
Jordon
