[ https://issues.apache.org/jira/browse/KAFKA-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ismael Juma updated KAFKA-3569:
-------------------------------
    Fix Version/s:     (was: 0.10.1.0)

> commitAsync() sometimes fails with errors
> -----------------------------------------
>
>                 Key: KAFKA-3569
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3569
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 0.9.0.1
>         Environment: MacOS Docker
>            Reporter: Greg Zoller
>              Labels: clients
>             Fix For: 0.10.0.0
>
>
> I have a KafkaConsumer instance I've wrapped in a thread, which communicates
> with the outside (multi-threaded) world via a blocking queue. Code is here:
> https://gist.github.com/gzoller/93fe2392fd3606bcb3b879e4ab2f8f6e
> I'm not worried about batch commits at this point and want to understand
> single-message commit behavior first. If I commitSync() a single message it
> is "slow" but is consistent--doesn't drop commits.
> If I use commitAsync() it's "fast" but I get flaky results--it drops commits,
> even for small numbers.
> I pre-loaded a 4-partition topic with 12 messages--3 per partition. Then I
> use this code across 2 consumers (each with their own instance of this class,
> hence their own thread). One consumer winds up listening on 2 partitions and
> the other on the remaining 2.
> Read logs confirm the poll() behavior/content is working as expected for the
> 2 consumers, meaning each of the 2 consumers is successfully seeing (and only
> seeing) messages from their respectively assigned partitions.
> Some of the 12 messages committed fine, while others report errors like this
> one in the callback:
> ERROR [{lowercaseStrings-2=OffsetAndMetadata{offset=1, metadata=''}}]:
> org.apache.kafka.clients.consumer.internals.SendFailedExceptionERROR
> My final offsets after my test run of 12:
> GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
> group1, lowercaseStrings, 0, 2, 3, 1, consumer-1_/192.168.99.1
> group1, lowercaseStrings, 1, unknown, 3, unknown, consumer-1_/192.168.99.1
> group1, lowercaseStrings, 2, unknown, 3, unknown, consumer-2_/192.168.99.1
> group1, lowercaseStrings, 3, 2, 3, 1, consumer-2_/192.168.99.1
> The "missing" offsets correspond to the ones that produced errors, so all
> messages are accounted for, either by success or by error.
> At high volumes the behavior is the same. Over 1 million messages I'll drop
> 30K-60K of them due to these same kinds of errors, while the others commit
> successfully. The speed difference is profound, though! commitSync() takes
> several minutes for 1m, but drops none. commitAsync() takes maybe 5 seconds
> with losses.
> I noted there has been some work done in this area in 0.10.1.0 (for example
> SendFailedException doesn't seem to be in the code anymore) and was eager to
> see if the problem persists, but I'm having KafkaProducer problems in
> 0.10.1.0 and haven't been able to see if this behavior remains or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
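
The report describes async commits that fail in the callback and leave some partitions without a committed offset. One common way to keep the speed of commitAsync() while limiting such gaps is to record each commit's outcome and retry only the unconfirmed partitions synchronously (for example on shutdown). The following is a minimal, stdlib-only sketch of that bookkeeping, not the reporter's code and not part of the Kafka client: the class CommitTracker and its methods are hypothetical names, and partitions are modeled as plain ints rather than TopicPartition.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for illustration; not part of the Kafka client. */
final class CommitTracker {
    // Highest offset per partition for which a commit was requested.
    private final Map<Integer, Long> requested = new HashMap<>();
    // Highest offset per partition whose commit a callback confirmed.
    private final Map<Integer, Long> confirmed = new HashMap<>();

    /** Call when issuing an async commit for this partition. */
    synchronized void onRequested(int partition, long offset) {
        requested.merge(partition, offset, Long::max);
    }

    /** Call from the commit callback; error == null means success. */
    synchronized void onCompleted(int partition, long offset, Exception error) {
        if (error == null) {
            confirmed.merge(partition, offset, Long::max);
        }
    }

    /** Partitions whose latest requested offset is not yet confirmed. */
    synchronized Map<Integer, Long> pending() {
        Map<Integer, Long> out = new TreeMap<>();
        for (Map.Entry<Integer, Long> e : requested.entrySet()) {
            Long done = confirmed.get(e.getKey());
            if (done == null || done < e.getValue()) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}
```

In a real consumer, one would presumably call onRequested() next to consumer.commitAsync(offsets, callback), call onCompleted() from OffsetCommitCallback.onComplete(), and pass whatever pending() returns to commitSync() before closing. Because a committed offset is a position, a later successful commit for a partition supersedes an earlier failed one, so only the latest offset per partition needs retrying.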