Apurva Mehta created KAFKA-5396:
-----------------------------------

             Summary: Consumer reading from beginning of log can read the same 
message multiple times.
                 Key: KAFKA-5396
                 URL: https://issues.apache.org/jira/browse/KAFKA-5396
             Project: Kafka
          Issue Type: Bug
            Reporter: Apurva Mehta


I noticed this when running the transactions system test with hard broker 
bounces. We have a consumer in READ_COMMITTED mode reading from the tail of the 
log as the writes are appended.

This test has failed once because the concurrent consumer returned duplicate 
data. The actual log has no duplicates, so the problem is in the consumer. 

One of the duplicate values is '0', and is at offset 250 in output-topic-1. The 
first time it is read, we see the following.

{noformat}
[2017-06-07 05:50:34,601] TRACE Returning fetched records at offset 0 for 
assigned partition output-topic-0 and update position to 250 
(org.apache.kafka.clients.consumer.internals.Fetcher)
[2017-06-07 05:50:34,602] TRACE Preparing to read 2967 bytes of data for 
partition output-topic-1 with offset 250 
(org.apache.kafka.clients.consumer.internals.Fetcher)
[2017-06-07 05:50:34,602] TRACE Updating high watermark for partition 
output-topic-1 to 502 (org.apache.kafka.clients.consumer.internals.Fetcher)
[2017-06-07 05:50:34,613] TRACE Returning fetched records at offset 250 for 
assigned partition output-topic-1 and update position to 500 
(org.apache.kafka.clients.consumer.internals.Fetcher)
{noformat}

The next time it is read, we see this:
{noformat}
[2017-06-07 05:51:36,386] TRACE Preparing to read 169858 bytes of data for 
partition output-topic-1 with offset 0 
(org.apache.kafka.clients.consumer.internals.Fetcher)
[2017-06-07 05:51:36,389] TRACE Updating high watermark for partition 
output-topic-1 to 13053 (org.apache.kafka.clients.consumer.internals.Fetcher)
[2017-06-07 05:51:36,391] TRACE Returning fetched records at offset 0 for 
assigned partition output-topic-1 and update position to 500 
(org.apache.kafka.clients.consumer.internals.Fetcher)
{noformat}

For some reason, the fetcher re-sent the data from offset 0, and reset the 
position to 500.

This is a plain consumer calling 'poll' in a loop until it is killed, so this 
position reset is puzzling.
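
To find other occurrences in the client TRACE log, a small script can flag 
every fetch that starts below a position already returned for the same 
partition. The following is only a sketch: the helper name find_rewinds is 
hypothetical, it is not part of Kafka, and it assumes the Fetcher message 
format is exactly the one quoted in the excerpts above.

```python
import re

# Matches the Fetcher TRACE message quoted above. The format is assumed to be
# stable; adjust the pattern if the client version logs it differently.
RETURNED = re.compile(
    r"Returning fetched records at offset (\d+) for assigned partition (\S+) "
    r"and update position to (\d+)"
)


def find_rewinds(log_text):
    """Return (partition, fetch_offset, previous_position) for every fetch
    that starts below a position already returned for that partition."""
    # Collapse all whitespace so messages wrapped across lines match again.
    flat = " ".join(log_text.split())
    last_position = {}
    rewinds = []
    for match in RETURNED.finditer(flat):
        offset = int(match.group(1))
        partition = match.group(2)
        position = int(match.group(3))
        previous = last_position.get(partition)
        if previous is not None and offset < previous:
            rewinds.append((partition, offset, previous))
        # Remember the highest position handed to the application so far.
        last_position[partition] = max(position, previous or 0)
    return rewinds
```

Run against the two excerpts above, this should report a single rewind on 
output-topic-1: a fetch returned from offset 0 after the position had already 
reached 500. It only detects the symptom and says nothing about why the 
fetcher went back to offset 0.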




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
