[ 
https://issues.apache.org/jira/browse/KAFKA-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17047711#comment-17047711
 ] 

Rafał Boniecki edited comment on KAFKA-9543 at 2/28/20 3:11 PM:
----------------------------------------------------------------

I cannot reproduce it in my development environment. A couple of facts to add to 
what Brian wrote:
 * This indeed does not happen on every segment rollover, but when it happens 
it is always on a segment rollover.
 * We have no compacted topics in our production cluster, so topic type doesn't 
matter.
 * No topic in our production environment starts at offset 0, so this doesn't 
matter either.
 * The topic where we have definitely seen this happen carries about 5 MB/s of 
traffic (so not that much).
 * "Fetch offset ... is out of range for partition" is always about an offset 
"from the future" (the top, not the bottom, of the log). I assume the Kafka 
broker thinks it does not have this offset in the log (at least according to 
the data we gather from JMX). This suggests that offsets are being cached 
incorrectly, or that the cache update has a race condition. Also note that 
before the update the client had 0 lag (you can see this in my attached 
screenshot), so this is probably crucial to reproducing the bug: you have to 
be reading the top of the log all or most of the time to hit it.
 * We tested this in our development environment, where we generated about 
5 MB/s of traffic (using kafka-producer-perf-test.sh) and read it back (using 
a consumer configured identically to the production one) while it was being 
written, and we cannot reproduce it. The test ran for 3 days non-stop; we 
looked for offset resets and there were none.
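One way to make these resets fail loudly instead of silently while debugging (a diagnostic suggestion, not a fix; broker address and group id below are placeholders): run a canary consumer with auto.offset.reset=none, so that an out-of-range fetch surfaces as an OffsetOutOfRangeException in the application instead of a silent reset.

{code}
# Hypothetical canary-consumer properties for diagnosing this bug
bootstrap.servers=broker:9092      # placeholder broker address
group.id=offset-reset-canary       # placeholder group id
auto.offset.reset=none             # throw OffsetOutOfRangeException instead of resetting
{code}

With "none", the exact fetch offset and partition are available at the point of failure, which should make it easier to correlate with the broker-side segment roll.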
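The cache/race hypothesis above can be sketched as a toy model (pure illustration of the suspected mechanism, not actual broker code): if the broker validates a fetch against a log-end offset cached from just before the segment roll completed, a lag-0 consumer fetching at the real head of the log appears to request an offset "from the future".

```python
# Toy model of the hypothesis: the broker-side range check rejects a fetch
# when the requested offset falls outside [log_start, log_end]. The offsets
# below are taken from the broker/consumer logs in the issue description.
def fetch_is_out_of_range(fetch_offset, cached_log_start, cached_log_end):
    """Mimics the broker-side range check against *cached* bounds."""
    return not (cached_log_start <= fetch_offset <= cached_log_end)

real_log_end = 1632750575      # offset the consumer fetched (consumer log)
stale_cached_end = 1632750565  # offset the new segment was rolled at (broker log)

# A lag-0 consumer fetches at the real head of the log, but a stale cached
# end offset makes that head look like "the future" -> out of range:
assert fetch_is_out_of_range(real_log_end, 0, stale_cached_end)

# Once the cached end offset catches up, the very same fetch is accepted:
assert not fetch_is_out_of_range(real_log_end, 0, real_log_end)
```

This would also explain why the resets only happen on rollover and only for consumers reading the top of the log: any fetch below the stale cached end offset still passes the check.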



> Consumer offset reset after new segment rolling
> -----------------------------------------------
>
>                 Key: KAFKA-9543
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9543
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Rafał Boniecki
>            Priority: Major
>         Attachments: Untitled.png
>
>
> After an upgrade from Kafka 2.1.1 to 2.4.0, I'm experiencing unexpected 
> consumer offset resets.
> Consumer:
> {code:java}
> 2020-02-12T11:12:58.402+01:00 hostname 4a2a39a35a02 
> [2020-02-12T11:12:58,402][INFO 
> ][org.apache.kafka.clients.consumer.internals.Fetcher] [Consumer 
> clientId=logstash-1, groupId=logstash] Fetch offset 1632750575 is out of 
> range for partition stats-5, resetting offset
> {code}
> Broker:
> {code:java}
> 2020-02-12 11:12:58:400 CET INFO  
> [data-plane-kafka-request-handler-1][kafka.log.Log] [Log partition=stats-5, 
> dir=/kafka4/data] Rolled new log segment at offset 1632750565 in 2 ms.{code}
> All resets are perfectly correlated with new segments being rolled on the 
> broker: the segment is rolled first, then, a couple of milliseconds later, 
> the reset occurs on the consumer. Attached is a Grafana graph of consumer 
> lag per partition. All sudden spikes in lag are offset resets due to this 
> bug.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
