[ 
https://issues.apache.org/jira/browse/KAFKA-6189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey updated KAFKA-6189:
--------------------------
    Description: 
Steps to reproduce:
* Setup test (a configuration sketch follows the log excerpt below):
** producer sends messages constantly; if the cluster is unavailable, it retries
** consumer polls in a loop
** topic has 3 partitions and replication factor 3
** min.insync.replicas=2
** producer has "acks=all"
** consumer has the default "auto.offset.reset=latest"
** consumer manually calls commitSync after handling messages
** Kafka cluster has 3 brokers
* Kill broker 0
* In consumer's logs:
{code}
2017-11-08 11:36:33,967 INFO  org.apache.kafka.clients.consumer.internals.Fetcher           - Fetch offset 10706682 is out of range for partition mytopic-2, resetting offset [kafka-consumer]
2017-11-08 11:36:33,968 INFO  org.apache.kafka.clients.consumer.internals.Fetcher           - Fetch offset 8024431 is out of range for partition mytopic-1, resetting offset [kafka-consumer]
2017-11-08 11:36:34,045 INFO  org.apache.kafka.clients.consumer.internals.Fetcher           - Fetch offset 8029505 is out of range for partition mytopic-0, resetting offset [kafka-consumer]
{code}
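
For reference, a minimal sketch of the setup above (Java client; a hedged illustration, not the reporter's actual code: broker addresses, topic name, group id, and serializers are assumptions, and the topic is assumed to already exist with 3 partitions, replication factor 3, and min.insync.replicas=2):

{code:java}
// Sketch of the reproduction setup, assuming Kafka clients 0.11.x.
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ReproSketch {
    public static void main(String[] args) {
        Properties p = new Properties();
        p.put("bootstrap.servers", "broker0:9092,broker1:9092,broker2:9092"); // assumed addresses
        p.put("acks", "all");                              // as in the report
        p.put("retries", Integer.MAX_VALUE);               // keep retrying while the cluster is unavailable
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Properties c = new Properties();
        c.put("bootstrap.servers", "broker0:9092,broker1:9092,broker2:9092"); // assumed addresses
        c.put("group.id", "kafka-consumer");               // assumed from the log suffix
        c.put("enable.auto.commit", "false");              // offsets are committed manually below
        c.put("auto.offset.reset", "latest");              // the default, as in the report
        c.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        c.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(p);
             KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
            consumer.subscribe(Collections.singletonList("mytopic"));
            while (true) {
                producer.send(new ProducerRecord<>("mytopic", "key", "value"));
                ConsumerRecords<String, String> records = consumer.poll(1000); // long-millis overload in 0.11
                for (ConsumerRecord<String, String> record : records) {
                    // handle message ...
                }
                consumer.commitSync(); // manual commit after handling, as in the report
            }
        }
    }
}
{code}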

After that, the consumer lost several messages on each partition.

Expected:
* the broker's out-of-range response should include the upper bound of the valid offset range
* the consumer should resume from that offset instead of applying "auto.offset.reset"

Workaround:
* set "auto.offset.reset=earliest"
* accept a large number of duplicate messages instead of lost ones
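
An application-level alternative (not from the report; a hedged sketch assuming "auto.offset.reset=none", so poll() raises the out-of-range condition instead of silently resetting): catch OffsetOutOfRangeException and seek each affected partition explicitly, e.g. to its beginning offset, trading lost messages for duplicates that downstream deduplication can filter.

{code:java}
// Sketch only: with auto.offset.reset=none, the consumer throws
// OffsetOutOfRangeException rather than resetting on its own, so the
// application decides where to resume.
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetOutOfRangeException;
import org.apache.kafka.common.TopicPartition;

public final class SeekOnOutOfRange {
    public static ConsumerRecords<String, String> pollSafely(KafkaConsumer<String, String> consumer) {
        try {
            return consumer.poll(1000);
        } catch (OffsetOutOfRangeException e) {
            // Resume each affected partition from its current beginning offset:
            // duplicates are possible, but no messages are skipped.
            Map<TopicPartition, Long> beginnings = consumer.beginningOffsets(e.partitions());
            for (Map.Entry<TopicPartition, Long> entry : beginnings.entrySet()) {
                consumer.seek(entry.getKey(), entry.getValue());
            }
            return consumer.poll(1000);
        }
    }
}
{code}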

This looks like what happens during recovery from the broker failure (see attachment).

> Losing messages while getting OFFSET_OUT_OF_RANGE error in consumer
> -------------------------------------------------------------------
>
>                 Key: KAFKA-6189
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6189
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 0.11.0.0
>            Reporter: Andrey
>         Attachments: kafkaLossingMessages.png
>
>


