[ https://issues.apache.org/jira/browse/KAFKA-6189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrey updated KAFKA-6189:
--------------------------

    Description: 

Steps to reproduce:
* Setup test:
** producer sends messages constantly; if the cluster is not available, it retries
** consumer is polling
** topic has 3 partitions and replication factor 3
** min.insync.replicas=2
** producer has "acks=all"
** consumer has the default "auto.offset.reset=latest"
** consumer manually commits offsets via commitSync after handling messages (sketches of both clients follow below)
** kafka cluster has 3 brokers
* Kill broker 0
* In the consumer's logs:

{code}
2017-11-08 11:36:33,967 INFO org.apache.kafka.clients.consumer.internals.Fetcher - Fetch offset 10706682 is out of range for partition mytopic-2, resetting offset [kafka-consumer]
2017-11-08 11:36:33,968 INFO org.apache.kafka.clients.consumer.internals.Fetcher - Fetch offset 8024431 is out of range for partition mytopic-1, resetting offset [kafka-consumer]
2017-11-08 11:36:34,045 INFO org.apache.kafka.clients.consumer.internals.Fetcher - Fetch offset 8029505 is out of range for partition mytopic-0, resetting offset [kafka-consumer]
{code}

After that, the consumer lost several messages on each partition.

Expected:
* the broker should return the upper bound of the valid offset range
* the consumer should resume from that offset instead of falling back to "auto.offset.reset"

Workaround:
* set "auto.offset.reset=earliest"
* accept a lot of duplicate messages instead of losing them (see the reset-handling sketch below)

Looks like this is what happens during the recovery from the broker failure:
!kafkaLossingMessages.png|thumbnail!
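For concreteness, a minimal sketch of the producing side described above, assuming the Java client of the affected 0.11.0.0 line. The broker addresses, topic key scheme, and payloads are hypothetical; "retries" is raised explicitly because the 0.11 default of 0 would not retry while the cluster is unavailable.

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ConstantProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical addresses for the 3-broker cluster from the setup.
        props.put("bootstrap.servers", "broker0:9092,broker1:9092,broker2:9092");
        props.put("acks", "all");                // wait for the in-sync replicas (min.insync.replicas=2)
        props.put("retries", Integer.MAX_VALUE); // keep retrying while the cluster is unavailable
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (long i = 0; ; i++) {
                // Placeholder payloads; the report only says "sends messages constantly".
                producer.send(new ProducerRecord<>("mytopic", Long.toString(i), "payload-" + i));
            }
        }
    }
}
{code}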
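A matching sketch of the consuming side: auto commit disabled, the default "auto.offset.reset=latest", and a manual commitSync after handling each batch, as in the setup. The group id and handler are hypothetical, and poll(long) is the 0.11-era signature.

{code:java}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ManualCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker0:9092,broker1:9092,broker2:9092"); // hypothetical
        props.put("group.id", "mygroup");         // hypothetical group id
        props.put("enable.auto.commit", "false"); // offsets are committed manually below
        props.put("auto.offset.reset", "latest"); // the default that skips messages in this report
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("mytopic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100L); // 0.11-era poll(long)
                for (ConsumerRecord<String, String> record : records) {
                    handle(record);
                }
                consumer.commitSync(); // commit only after handling, as in the report
            }
        }
    }

    // Hypothetical handler; the report does not say what processing is done.
    private static void handle(ConsumerRecord<String, String> record) {
        System.out.printf("partition=%d offset=%d value=%s%n",
                record.partition(), record.offset(), record.value());
    }
}
{code}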
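Until the expected behavior exists, a stricter variant of the workaround (again a sketch against the public consumer API, not a fix in the clients) is to set "auto.offset.reset=none" so the out-of-range condition surfaces as an OffsetOutOfRangeException that the application handles explicitly, for example by logging the gap and rewinding to the earliest available offset, trading duplicates for silent loss:

{code:java}
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetOutOfRangeException;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ExplicitResetConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker0:9092,broker1:9092,broker2:9092"); // hypothetical
        props.put("group.id", "mygroup");         // hypothetical group id
        props.put("enable.auto.commit", "false");
        // "none" turns a silent reset into an exception. Note: a brand-new group
        // with no committed offsets then fails its first poll with
        // NoOffsetForPartitionException and must also seek somewhere explicitly.
        props.put("auto.offset.reset", "none");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("mytopic"));
            while (true) {
                try {
                    ConsumerRecords<String, String> records = consumer.poll(100L);
                    for (ConsumerRecord<String, String> record : records) {
                        // handle the record, then commit, as in the report
                    }
                    consumer.commitSync();
                } catch (OffsetOutOfRangeException e) {
                    Map<TopicPartition, Long> bad = e.offsetOutOfRangePartitions();
                    Map<TopicPartition, Long> earliest = consumer.beginningOffsets(bad.keySet());
                    for (TopicPartition tp : bad.keySet()) {
                        // Make the gap visible, then rewind: duplicates instead of loss.
                        System.err.printf("Out of range on %s: fetch=%d, earliest=%d%n",
                                tp, bad.get(tp), earliest.get(tp));
                        consumer.seek(tp, earliest.get(tp));
                    }
                }
            }
        }
    }
}
{code}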
> Losing messages while getting OFFSET_OUT_OF_RANGE error in consumer
> -------------------------------------------------------------------
>
>                 Key: KAFKA-6189
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6189
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 0.11.0.0
>            Reporter: Andrey
>         Attachments: kafkaLossingMessages.png