All right, looks like I was able to reproduce this after all. I believe I had to restart the producer after changing the offset retention setting in the broker. Its quite surprising to me that this was the default behavior before Kafka 2.1.0 -- I imagine quite a few people have been bitten hard by this.
Thanks for your help Matthias. Regards, Raman On Fri, Feb 22, 2019 at 11:13 AM Raman Gupta <rocketra...@gmail.com> wrote: > Hmm, it turns out I was mistaken -- we are running Kafka 2.0.0, not 2.1.0. > We are on 2.1.0 on the client side, not the server. > > Reading through KAFKA-4682, am I right in understanding that prior to > 2.1.0, the consumer group offsets could be deleted if the consumers are > running, but idle for longer than the offset.retention.minutes setting (and > yes, our consumers were in fact idle)? I tried this with a local Kafka > running `2.0.0-cp1` with offset.retention.minutes set to "5", and my > consumer offsets did not reset as expected. > > Regards, > Raman > > > On Fri, Feb 22, 2019 at 1:26 AM Matthias J. Sax <matth...@confluent.io> > wrote: > >> You can read the `__consumer_offsets` topics directly to see if the >> offset are there or not: >> >> https://stackoverflow.com/questions/33925866/kafka-how-to-read-from-consumer-offsets-topic >> >> Also, if your brokers are on version 2.1, offsets should not be deleted >> as long as the consumer group is online. The `offset.retention.minutes` >> config starts to tick when the consumer groups goes offline (cf >> https://issues.apache.org/jira/browse/KAFKA-4682) >> >> >> -Matthias >> >> On 2/21/19 3:08 PM, Raman Gupta wrote: >> > I am unable to reproduce it. >> > >> > I did note also that all the consumer offsets reset in this application, >> > not just the streams, so it appears that whatever happened is not >> > streams-specific. The only reason I can think of for all the consumers >> to >> > do this, is that the committed offsets information was "lost" somehow, >> and >> > so when the service started back up, it reverted to "earliest" as per >> the >> > configuration of "auto.offset.reset". In a test I ran, the logging >> output >> > and behavior of the consumers matches exactly this scenario. >> > >> > Now what I'm trying to understand is under what conditions the committed >> > offsets would be "lost"? The only ones I can think of are: >> > >> > a) The consumers were idle for longer than "offsets.retention.minutes" >> > (default of 7 days in our env, and no this was not the case for us) >> > b) Somebody mistakenly blew away the data in the topic where Kafka >> stores >> > the consumer offsets (as far as I know, this didn't happen, but we don't >> > have ACLs implemented -- what logs can I check for?) >> > >> > What other possibilities are there? >> > >> > Also, are there any other situations other than the committed offsets >> not >> > being present, in which the Java consumer Fetcher may print the log >> message >> > "Resetting offset for partition {} to offset {}."? >> > >> > Regards, >> > Raman >> > >> > >> > On Thu, Feb 21, 2019 at 2:00 AM Matthias J. Sax <matth...@confluent.io> >> > wrote: >> > >> >> Thanks for reporting the issue! >> >> >> >> Are you able to reproduce it? If yes, can you maybe provide broker and >> >> client logs in DEBUG level? >> >> >> >> -Matthias >> >> >> >> On 2/20/19 7:07 PM, Raman Gupta wrote: >> >>> I have an exactly-once stream that reads a topic, transforms it, and >> >> writes >> >>> new messages into the same topic as well as other topics. I am using >> >> Kafka >> >>> 2.1.0. The stream applications run in Kubernetes. >> >>> >> >>> I did a k8s deployment of the application with minor changes to the >> code >> >> -- >> >>> absolutely no changes to anything related to business logic or >> streams, >> >> and >> >>> no changes to the brokers at all. For some reason when the streams >> >>> application restarted, it reset a bunch of offsets to very old values, >> >> and >> >>> started re-processing old messages again (definitely violating the >> >> exactly >> >>> once principle!). >> >>> >> >>> There is no explanation in the logs at all as to why this offset reset >> >>> happened, including in the broker logs, and I am at a loss to >> understand >> >>> what is going on. >> >>> >> >>> Some example logs: >> >>> >> >>> February 20th 2019, 19:53:23.630 cis-69b4bc6fb7-8xnhc 2019-02-21 >> >>> 00:53:23,630 INFO --- [b0f59-StreamThread-1] >> >>> org.apa.kaf.cli.con.int.Fetcher : [Consumer >> >>> >> >> >> clientId=prod-cisSegmenter-abeba614-3bda-44bc-bc48-a278de9b0f59-StreamThread-1-consumer, >> >>> groupId=prod-cisSegmenter] Resetting offset for partition >> >>> prod-file-events-5 to offset 224. >> >>> February 20th 2019, 19:53:23.630 cis-69b4bc6fb7-8xnhc 2019-02-21 >> >>> 00:53:23,630 INFO --- [b0f59-StreamThread-1] >> >>> org.apa.kaf.cli.con.int.Fetcher : [Consumer >> >>> >> >> >> clientId=prod-cisSegmenter-abeba614-3bda-44bc-bc48-a278de9b0f59-StreamThread-1-consumer, >> >>> groupId=prod-cisSegmenter] Resetting offset for partition >> >>> prod-file-events-2 to offset 146. >> >>> February 20th 2019, 19:53:23.623 cis-69b4bc6fb7-8xnhc 2019-02-21 >> >>> 00:53:23,623 INFO --- [b0f59-StreamThread-1] >> >> org.apa.kaf.str.KafkaStreams >> >>> : stream-client >> >>> [prod-cisSegmenter-abeba614-3bda-44bc-bc48-a278de9b0f59] State >> transition >> >>> from REBALANCING to RUNNING >> >>> February 20th 2019, 19:53:23.622 cis-69b4bc6fb7-8xnhc 2019-02-21 >> >>> 00:53:23,622 INFO --- [b0f59-StreamThread-1] >> >>> org.apa.kaf.cli.con.KafkaConsumer : [Consumer >> >>> >> >> >> clientId=prod-cisSegmenter-abeba614-3bda-44bc-bc48-a278de9b0f59-StreamThread-1-restore-consumer, >> >>> groupId=] Unsubscribed all topics or patterns and assigned partitions >> >>> February 20th 2019, 19:53:23.622 cis-69b4bc6fb7-8xnhc 2019-02-21 >> >>> 00:53:23,622 INFO --- [b0f59-StreamThread-1] >> >>> org.apa.kaf.cli.con.KafkaConsumer : [Consumer >> >>> >> >> >> clientId=prod-cisSegmenter-abeba614-3bda-44bc-bc48-a278de9b0f59-StreamThread-1-restore-consumer, >> >>> groupId=] Unsubscribed all topics or patterns and assigned partitions >> >>> February 20th 2019, 19:53:23.622 cis-69b4bc6fb7-8xnhc 2019-02-21 >> >>> 00:53:23,622 INFO --- [b0f59-StreamThread-1] >> >>> org.apa.kaf.str.pro.int.StreamThread : stream-thread >> >>> >> [prod-cisSegmenter-abeba614-3bda-44bc-bc48-a278de9b0f59-StreamThread-1] >> >>> State transition from PARTITIONS_ASSIGNED to RUNNING >> >>> >> >>> Unless I'm totally misunderstanding something about how consumer >> groups >> >>> offsets are supposed to work, this behaviour is very very wrong. >> >>> >> >>> Regards, >> >>> Raman >> >>> >> >> >> >> >> > >> >>