Duplicate messages might be due to network issues, but it is worthwhile to dig deeper.
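One way to dig deeper, as a rough sketch only: have each consumer log the partition and offset of every message it hands to your service, plus some application-level id, and then compare the logs. If the same (partition, offset) shows up more than once, the message was delivered twice on the consumer side. If the same id shows up at different offsets, it was written to the log twice. The tuple layout and the message_id field below are my own assumptions for illustration; Kafka does not provide a message id, so the producer would have to embed one in the payload.

```python
from collections import defaultdict


def classify_duplicates(records):
    """Split duplicates into consumer-side and producer-side cases.

    records: iterable of (consumer_id, partition, offset, message_id)
    tuples collected from the consumers' own logs (hypothetical format).
    """
    by_position = defaultdict(list)  # (partition, offset) -> consumers that got it
    by_message = defaultdict(set)    # message_id -> positions it was stored at

    for consumer_id, partition, offset, message_id in records:
        by_position[(partition, offset)].append(consumer_id)
        by_message[message_id].add((partition, offset))

    # Same log position delivered more than once: redelivery or two
    # consumers reading the same partition.
    redelivered = {pos: sorted(consumers)
                   for pos, consumers in by_position.items()
                   if len(consumers) > 1}

    # Same application id stored at several positions: written twice.
    produced_twice = {mid: sorted(positions)
                      for mid, positions in by_message.items()
                      if len(positions) > 1}

    return redelivered, produced_twice


if __name__ == "__main__":
    sample = [
        ("c1", 0, 10, "m-a"),
        ("c2", 0, 10, "m-a"),  # same position seen by two consumers
        ("c1", 1, 5, "m-b"),
        ("c3", 2, 7, "m-b"),   # same id at two different positions
        ("c2", 1, 6, "m-c"),
    ]
    redelivered, produced_twice = classify_duplicates(sample)
    print("delivered more than once:", redelivered)
    print("stored more than once:", produced_twice)
```

The first dictionary points at the consumer side (ownership of a partition changing hands, or an offset not committed before a restart), the second at the producer side (a retry after a lost ack).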
It sounds like the problem happens when you have 3 partitions and 3 consumers. Based on my understanding (still learning), each consumer should have its own partition to consume. Can you verify this while your test is running with kafka-run-class.sh kafka.tools.ConsumerOffsetChecker? Also, are the duplicate messages within a partition or across partitions?

On Fri, Jun 19, 2015 at 9:23 AM, Adam Shannon <adam.shan...@banno.com> wrote:

> Basically it boils down to the fact that distributed computers and their
> networking are not reliable. [0] So, in order to ensure that messages do
> in fact get across there are cases where duplicates have to be sent.
>
> Take for example this simple experiment, given three servers A, B, and C. A
> sends a message to C, but C processes the message and then dies before it
> can send an ack to A that it got and processed the message. (Or even that
> the network between A and C died, so the ack was lost.) So, A knows only
> that it sent a message to C, but never heard a response.
>
> In order to guarantee that the message was delivered A must try and send
> the message again.
>
> [0]: https://aphyr.com/posts/288-the-network-is-reliable
>
> On Thu, Jun 18, 2015 at 10:20 PM, Kris K <squareksc...@gmail.com> wrote:
>
> > Thanks Adam for your response.
> > I will have a mechanism to handle duplicates on the service consuming the
> > messages.
> > Just curious, if there is a way to identify the cause for receiving
> > duplicates.
> > I mean any log file that could help with this?
> >
> > Regards,
> > Kris
> >
> > On Wed, Jun 17, 2015 at 8:24 AM, Adam Shannon <adam.shan...@banno.com>
> > wrote:
> >
> > > This is actually an expected consequence of using distributed systems.
> > > The kafka FAQ has a good answer
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowdoIgetexactly-oncemessagingfromKafka
> > > ?
> > >
> > > On Tue, Jun 16, 2015 at 11:06 PM, Kris K <squareksc...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > While testing message delivery using kafka, I realized that a few
> > > > duplicate messages got delivered by the consumers in the same consumer
> > > > group (two consumers got the same message with a few milliseconds
> > > > difference). However, I do not see any redundancy at the producer or
> > > > broker. One more observation is that this is not happening when I use
> > > > only one consumer thread.
> > > >
> > > > I am running 3 brokers (0.8.2.1) with 3 Zookeeper nodes. There are 3
> > > > partitions in the topic and replication-factor is 3. For producing, I am
> > > > using the New Producer with compression.type=none.
> > > >
> > > > On the consumer end, I have 3 High level consumers in the same consumer
> > > > group running with one consumer thread each, on three different hosts.
> > > > Auto commit is set to true for consumer.
> > > >
> > > > Size of each message would range anywhere between 0.7 KB and 2 MB. The
> > > > max volume for this test is 100 messages/hr.
> > > >
> > > > I looked at controller log for any possibility of consumer rebalance
> > > > during this time, but did not find any. In the server log of all the
> > > > brokers the error - java.io.IOException: Connection reset by peer is
> > > > almost being written continuously.
> > > >
> > > > So, is it possible to achieve exactly-once delivery with the current
> > > > high level consumer without needing an extra layer to remove redundancy?
> > > >
> > > > Could you please point me to any settings or logs that would help me
> > > > tune the configuration?
> > > >
> > > > *PS: I tried searching for similar discussions, but could not find any.
> > > > If it's already been answered, please provide the link.
> > > >
> > > > Thanks,
> > > > Kris
> > >
> > >
> > > --
> > > Adam Shannon | Software Engineer | Banno | Jack Henry
> > > 206 6th Ave Suite 1020 | Des Moines, IA 50309 | Cell: 515.867.8337
>
>
> --
> Adam Shannon | Software Engineer | Banno | Jack Henry
> 206 6th Ave Suite 1020 | Des Moines, IA 50309 | Cell: 515.867.8337