Duplicate messages might be due to network issues, but it is worthwhile to
dig deeper.

It sounds like the problem happens when you have 3 partitions and 3
consumers. Based on my understanding (still learning), each consumer should
have its own partition to consume. Can you verify this while your test is
running with kafka-run-class.sh kafka.tools.ConsumerOffsetChecker?

Also, are the duplicate messages within a partition or across
partitions?
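
One way to answer that is to have each consumer log the partition and offset of every message it receives, then compare the logs. Below is a minimal sketch in Python. It assumes a hypothetical record format of (message_id, partition, offset, consumer) that your consumers would have to log themselves; the function names are made up for illustration and are not part of Kafka.

```python
from collections import defaultdict


def classify_duplicates(records):
    """Group deliveries by message id and label each duplicated id.

    records: iterable of (message_id, partition, offset, consumer) tuples.
    This tuple layout is an assumption, not a Kafka format.
    """
    seen = defaultdict(list)
    for message_id, partition, offset, consumer in records:
        seen[message_id].append((partition, offset, consumer))

    report = {}
    for message_id, deliveries in seen.items():
        if len(deliveries) < 2:
            continue  # delivered once, nothing to report
        positions = {(p, o) for p, o, _ in deliveries}
        partitions = {p for p, _, _ in deliveries}
        if len(positions) == 1:
            # One log entry read more than once: look at the consumer side.
            kind = "same offset redelivered"
        elif len(partitions) == 1:
            # Stored more than once in one partition: look at the producer.
            kind = "same partition, different offsets"
        else:
            kind = "across partitions"
        report[message_id] = kind
    return report


def partition_owners(records):
    """Return {partition: set of consumers that read from it}."""
    owners = defaultdict(set)
    for _, partition, _, consumer in records:
        owners[partition].add(consumer)
    return dict(owners)


sample = [
    ("m1", 0, 10, "c1"),
    ("m1", 0, 10, "c2"),  # same partition and offset, two consumers
    ("m2", 1, 5, "c2"),
    ("m3", 2, 7, "c3"),
    ("m3", 2, 8, "c3"),   # same partition, two offsets
    ("m4", 0, 11, "c1"),
    ("m4", 1, 6, "c2"),   # two partitions
]
print(classify_duplicates(sample))
print(partition_owners(sample))
```

If a duplicate shows up at the same partition and offset, the broker stored it once and it was read twice, which points at the consumers (for example two consumers reading one partition). If it shows up at different offsets, it was written more than once, which points at the producer.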

On Fri, Jun 19, 2015 at 9:23 AM, Adam Shannon <adam.shan...@banno.com>
wrote:

> Basically it boils down to the fact that distributed computers and their
> networking are not reliable. [0] So, in order to ensure that messages do
> in fact get across, there are cases where duplicates have to be sent.
>
> Take for example this simple experiment, given three servers A, B, and C. A
> sends a message to C, but C processes the message and then dies before it
> can send an ack to A that it got and processed the message. (Or even that
> the network between A and C died, so the ack was lost.) So, A knows only
> that it sent a message to C, but never heard a response.
>
> In order to guarantee that the message was delivered, A must try to send
> the message again.
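
The A-to-C example quoted above can be sketched as a small simulation. This is only an illustration in plain Python with no Kafka involved: the class and function names are hypothetical, and the lost ack is modelled by a simple counter rather than a real network failure.

```python
class Receiver:
    """Plays the role of server C."""

    def __init__(self, drop_acks=0):
        self.received = []          # every delivery that reached C
        self.drop_acks = drop_acks  # number of acks that will be lost

    def deliver(self, message_id, payload):
        """Record the message, then return True if the ack reaches A."""
        self.received.append(message_id)
        if self.drop_acks > 0:
            self.drop_acks -= 1
            return False  # C got the message, but A never hears back
        return True


class IdempotentReceiver(Receiver):
    """Same as Receiver, but applies each message id only once."""

    def __init__(self, drop_acks=0):
        super().__init__(drop_acks)
        self.seen = set()
        self.applied = []  # messages actually acted on

    def deliver(self, message_id, payload):
        if message_id not in self.seen:
            self.seen.add(message_id)
            self.applied.append(message_id)
        return super().deliver(message_id, payload)


def send_with_retry(receiver, message_id, payload, max_attempts=5):
    """Plays the role of server A: resend until an ack arrives."""
    for attempt in range(1, max_attempts + 1):
        if receiver.deliver(message_id, payload):
            return attempt
    raise RuntimeError("no ack after %d attempts" % max_attempts)


plain = Receiver(drop_acks=1)
attempts = send_with_retry(plain, "m1", "hello")
print(attempts, plain.received)

dedup = IdempotentReceiver(drop_acks=1)
send_with_retry(dedup, "m1", "hello")
print(dedup.received, dedup.applied)
```

With one lost ack, the plain receiver ends up with the message twice, while the idempotent receiver sees both deliveries but acts on only one. That is the kind of duplicate handling the consuming service would need.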
>
> [0]: https://aphyr.com/posts/288-the-network-is-reliable
>
> On Thu, Jun 18, 2015 at 10:20 PM, Kris K <squareksc...@gmail.com> wrote:
>
> > Thanks Adam for your response.
> > I will have a mechanism to handle duplicates on the service consuming the
> > messages.
> > Just curious whether there is a way to identify the cause of the
> > duplicates. Is there any log file that could help with this?
> >
> > Regards,
> > Kris
> >
> > On Wed, Jun 17, 2015 at 8:24 AM, Adam Shannon <adam.shan...@banno.com>
> > wrote:
> >
> > > This is actually an expected consequence of using distributed systems.
> > > The Kafka FAQ has a good answer:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowdoIgetexactly-oncemessagingfromKafka?
> > >
> > > On Tue, Jun 16, 2015 at 11:06 PM, Kris K <squareksc...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > While testing message delivery using Kafka, I noticed that a few
> > > > duplicate messages were delivered to consumers in the same consumer
> > > > group (two consumers got the same message a few milliseconds apart).
> > > > However, I do not see any redundancy at the producer or broker. One
> > > > more observation: this does not happen when I use only one consumer
> > > > thread.
> > > >
> > > > I am running 3 brokers (0.8.2.1) with 3 ZooKeeper nodes. There are 3
> > > > partitions in the topic and the replication factor is 3. For
> > > > producing, I am using the new producer with compression.type=none.
> > > >
> > > > On the consumer end, I have 3 high-level consumers in the same
> > > > consumer group, each running one consumer thread, on three different
> > > > hosts. Auto commit is set to true for the consumers.
> > > >
> > > > The size of each message ranges between 0.7 KB and 2 MB. The
> > > > maximum volume for this test is 100 messages/hr.
> > > >
> > > > I looked at the controller log for any sign of a consumer rebalance
> > > > during this time, but did not find any. In the server log of all the
> > > > brokers, the error java.io.IOException: Connection reset by peer is
> > > > being written almost continuously.
> > > >
> > > > So, is it possible to achieve exactly-once delivery with the current
> > > > high-level consumer without needing an extra layer to remove
> > > > duplicates?
> > > >
> > > > Could you please point me to any settings or logs that would help me
> > > > tune the configuration?
> > > >
> > > > PS: I tried searching for similar discussions, but could not find
> > > > any. If it's already been answered, please provide the link.
> > > >
> > > > Thanks,
> > > > Kris
> > > >
> > >
> > >
> > >
> > > --
> > > Adam Shannon | Software Engineer | Banno | Jack Henry
> > > 206 6th Ave Suite 1020 | Des Moines, IA 50309 | Cell: 515.867.8337
> > >
> >
>