All right, looks like I was able to reproduce this after all. I believe I
had to restart the producer after changing the offset retention setting in
the broker. It's quite surprising to me that this was the default behavior
before Kafka 2.1.0 -- I imagine quite a few people have been bitten hard by
this.
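
To be specific, the reproduction boiled down to lowering the offset
retention on the broker, restarting the broker and the producer, and then
leaving the consumers idle past the retention window. The broker settings
I used locally were along these lines (the check interval is lowered too,
just so the expiry kicks in quickly):

  # server.properties
  offsets.retention.minutes=5
  offsets.retention.check.interval.ms=60000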

Thanks for your help, Matthias.

Regards,
Raman


On Fri, Feb 22, 2019 at 11:13 AM Raman Gupta <rocketra...@gmail.com> wrote:

> Hmm, it turns out I was mistaken -- we are running Kafka 2.0.0, not 2.1.0.
> We are on 2.1.0 on the client side, not the server.
>
> Reading through KAFKA-4682, am I right in understanding that prior to
> 2.1.0, the consumer group offsets could be deleted even if the consumers
> were still running, as long as they were idle for longer than the
> offsets.retention.minutes setting (and yes, our consumers were in fact
> idle)? I tried this with a local Kafka running `2.0.0-cp1` with
> offsets.retention.minutes set to "5", and my consumer offsets did not
> reset as I expected them to.
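>
> (I was watching the committed offsets during the test with something like
> the following, against my local broker -- the group name here is just a
> placeholder:
>
>   bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
>     --describe --group my-test-group
>
> and the offsets stayed put.)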
>
> Regards,
> Raman
>
>
> On Fri, Feb 22, 2019 at 1:26 AM Matthias J. Sax <matth...@confluent.io>
> wrote:
>
>> You can read the `__consumer_offsets` topic directly to see if the
>> offsets are there or not:
>>
>> https://stackoverflow.com/questions/33925866/kafka-how-to-read-from-consumer-offsets-topic
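>>
>> Something along these lines should work with the 2.x tooling (broker
>> address is just an example):
>>
>>   bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
>>     --topic __consumer_offsets --from-beginning \
>>     --formatter 'kafka.coordinator.group.GroupMetadataManager$OffsetsMessageFormatter'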
>>
>> Also, if your brokers are on version 2.1, offsets should not be deleted
>> as long as the consumer group is online. The `offsets.retention.minutes`
>> clock only starts ticking once the consumer group goes offline (cf.
>> https://issues.apache.org/jira/browse/KAFKA-4682).
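>>
>> You can also check whether the broker currently considers the group
>> empty -- which is what starts the retention clock on 2.1 -- with
>> something like:
>>
>>   bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
>>     --describe --group prod-cisSegmenter --state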
>>
>>
>> -Matthias
>>
>> On 2/21/19 3:08 PM, Raman Gupta wrote:
>> > I am unable to reproduce it.
>> >
>> > I did note also that all the consumer offsets reset in this application,
>> > not just the streams consumers, so it appears that whatever happened is
>> > not streams-specific. The only reason I can think of for all the
>> > consumers to do this is that the committed offsets information was
>> > "lost" somehow, and so when the service started back up, it reverted to
>> > "earliest" as per the "auto.offset.reset" configuration. In a test I
>> > ran, the logging output and behavior of the consumers matched exactly
>> > this scenario.
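>> >
>> > For reference, the consumers are configured roughly like this (only the
>> > relevant property shown, value illustrative):
>> >
>> >   # fall back to the earliest offset when no committed offset is found
>> >   auto.offset.reset=earliest
>> >   # ("none" would instead throw NoOffsetForPartitionException, which
>> >   # would have made the missing offsets much more obvious)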
>> >
>> > Now what I'm trying to understand is: under what conditions would the
>> > committed offsets be "lost"? The only ones I can think of are:
>> >
>> > a) The consumers were idle for longer than "offsets.retention.minutes"
>> > (default of 7 days in our env, and no, this was not the case for us)
>> > b) Somebody mistakenly blew away the data in the topic where Kafka
>> > stores the consumer offsets (as far as I know, this didn't happen, but
>> > we don't have ACLs implemented -- what logs can I check for?)
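>> >
>> > (For my own digging I have been grepping the broker logs along these
>> > lines, though I'm not sure these are the right places to look -- the
>> > paths are just where our logs happen to live:
>> >
>> >   grep -i "expired offsets" /var/log/kafka/server.log
>> >   grep -i "__consumer_offsets" /var/log/kafka/state-change.log
>> >
>> > The first should catch the coordinator's periodic "Removed N expired
>> > offsets" messages, and the second any state changes on the offsets
>> > topic itself.)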
>> >
>> > What other possibilities are there?
>> >
>> > Also, are there any other situations, other than the committed offsets
>> > not being present, in which the Java consumer Fetcher may print the log
>> > message "Resetting offset for partition {} to offset {}."?
>> >
>> > Regards,
>> > Raman
>> >
>> >
>> > On Thu, Feb 21, 2019 at 2:00 AM Matthias J. Sax <matth...@confluent.io>
>> > wrote:
>> >
>> >> Thanks for reporting the issue!
>> >>
>> >> Are you able to reproduce it? If yes, can you maybe provide broker and
>> >> client logs in DEBUG level?
>> >>
>> >> -Matthias
>> >>
>> >> On 2/20/19 7:07 PM, Raman Gupta wrote:
>> >>> I have an exactly-once stream that reads a topic, transforms it, and
>> >>> writes new messages into the same topic as well as other topics. I am
>> >>> using Kafka 2.1.0. The stream applications run in Kubernetes.
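>> >>>
>> >>> For context, the streams are configured for exactly-once processing
>> >>> essentially as below (bootstrap servers are a placeholder, not our
>> >>> real ones):
>> >>>
>> >>>   import java.util.Properties;
>> >>>   import org.apache.kafka.streams.StreamsConfig;
>> >>>
>> >>>   Properties props = new Properties();
>> >>>   props.put(StreamsConfig.APPLICATION_ID_CONFIG, "prod-cisSegmenter");
>> >>>   props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
>> >>>   // "exactly_once" enables transactional producers and
>> >>>   // read_committed consumers under the hood
>> >>>   props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG,
>> >>>       StreamsConfig.EXACTLY_ONCE);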
>> >>>
>> >>> I did a k8s deployment of the application with minor changes to the
>> >>> code -- absolutely no changes to anything related to business logic or
>> >>> streams, and no changes to the brokers at all. For some reason, when
>> >>> the streams application restarted, it reset a bunch of offsets to very
>> >>> old values, and started re-processing old messages again (definitely
>> >>> violating the exactly-once principle!).
>> >>>
>> >>> There is no explanation in the logs at all as to why this offset reset
>> >>> happened, including in the broker logs, and I am at a loss to
>> >>> understand what is going on.
>> >>>
>> >>> Some example logs:
>> >>>
>> >>> February 20th 2019, 19:53:23.630 cis-69b4bc6fb7-8xnhc 2019-02-21
>> >>> 00:53:23,630 INFO  --- [b0f59-StreamThread-1]
>> >>> org.apa.kaf.cli.con.int.Fetcher : [Consumer
>> >>> clientId=prod-cisSegmenter-abeba614-3bda-44bc-bc48-a278de9b0f59-StreamThread-1-consumer,
>> >>> groupId=prod-cisSegmenter] Resetting offset for partition
>> >>> prod-file-events-5 to offset 224.
>> >>> February 20th 2019, 19:53:23.630 cis-69b4bc6fb7-8xnhc 2019-02-21
>> >>> 00:53:23,630 INFO  --- [b0f59-StreamThread-1]
>> >>> org.apa.kaf.cli.con.int.Fetcher : [Consumer
>> >>> clientId=prod-cisSegmenter-abeba614-3bda-44bc-bc48-a278de9b0f59-StreamThread-1-consumer,
>> >>> groupId=prod-cisSegmenter] Resetting offset for partition
>> >>> prod-file-events-2 to offset 146.
>> >>> February 20th 2019, 19:53:23.623 cis-69b4bc6fb7-8xnhc 2019-02-21
>> >>> 00:53:23,623 INFO  --- [b0f59-StreamThread-1]
>> >>> org.apa.kaf.str.KafkaStreams : stream-client
>> >>> [prod-cisSegmenter-abeba614-3bda-44bc-bc48-a278de9b0f59] State
>> >>> transition from REBALANCING to RUNNING
>> >>> February 20th 2019, 19:53:23.622 cis-69b4bc6fb7-8xnhc 2019-02-21
>> >>> 00:53:23,622 INFO  --- [b0f59-StreamThread-1]
>> >>> org.apa.kaf.cli.con.KafkaConsumer : [Consumer
>> >>> clientId=prod-cisSegmenter-abeba614-3bda-44bc-bc48-a278de9b0f59-StreamThread-1-restore-consumer,
>> >>> groupId=] Unsubscribed all topics or patterns and assigned partitions
>> >>> February 20th 2019, 19:53:23.622 cis-69b4bc6fb7-8xnhc 2019-02-21
>> >>> 00:53:23,622 INFO  --- [b0f59-StreamThread-1]
>> >>> org.apa.kaf.cli.con.KafkaConsumer : [Consumer
>> >>> clientId=prod-cisSegmenter-abeba614-3bda-44bc-bc48-a278de9b0f59-StreamThread-1-restore-consumer,
>> >>> groupId=] Unsubscribed all topics or patterns and assigned partitions
>> >>> February 20th 2019, 19:53:23.622 cis-69b4bc6fb7-8xnhc 2019-02-21
>> >>> 00:53:23,622 INFO  --- [b0f59-StreamThread-1]
>> >>> org.apa.kaf.str.pro.int.StreamThread : stream-thread
>> >>> [prod-cisSegmenter-abeba614-3bda-44bc-bc48-a278de9b0f59-StreamThread-1]
>> >>> State transition from PARTITIONS_ASSIGNED to RUNNING
>> >>>
>> >>> Unless I'm totally misunderstanding something about how consumer group
>> >>> offsets are supposed to work, this behaviour is very, very wrong.
>> >>>
>> >>> Regards,
>> >>> Raman
>> >>>
>> >>
>> >>
>> >
>>
>>
