Experimenting with Kafka myself, I found that timeouts/batch expiry (under
both valid and invalid configurations) and exhausted retries can also drop
messages unless you handle and log the failures gracefully. There are also
a number of exceptions in the org.apache.kafka.common.KafkaException
hierarchy, some of which are thrown for valid reasons (message size,
buffer size, etc.) but still result in dropped messages.
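As a minimal sketch of what "handle and log them gracefully" can look like
(plain Python with a stand-in send function, not the real Java client API;
all names here are illustrative): retry a bounded number of times, and on
exhaustion log the record and park it rather than dropping it silently.

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("producer")

def send_with_retries(send, record, max_retries=3, dead_letter=None):
    """Try to send `record`, retrying on failure; when retries are
    exhausted, log the record and park it in `dead_letter` instead of
    silently dropping it."""
    for attempt in range(1, max_retries + 1):
        try:
            return send(record)
        except Exception as exc:  # stands in for KafkaException subclasses
            log.warning("send failed (attempt %d/%d): %s",
                        attempt, max_retries, exc)
    # Retries exhausted: without this, the record would just disappear.
    log.error("parking record after %d failed attempts: %r",
              max_retries, record)
    if dead_letter is not None:
        dead_letter.append(record)
    return None
```

The dead-letter list is the key point: a record that fails every retry is
still visible somewhere for replay or investigation, instead of vanishing.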
On Sun, Aug 28, 2016 at 1:55 AM, Jayesh Thakrar <j_thak...@yahoo.com.invalid
> wrote:

> I am looking at ways how one might have data loss and duplication in a
> Kafka cluster and need some help/pointers/discussions.
> So far, here's what I have come up with:
> Loss at producer-side
> Since the data send() call actually adds data to a cache/buffer, a crash
> of the producer can potentially result in data loss. Another scenario for
> data loss is a producer exiting without closing the producer connection.
> Loss at broker-side
> I think there are several situations here - all of which are triggered by
> a broker or controller crash, or by network issues with ZooKeeper (which
> effectively simulate broker crashes).
> If I understand correctly, KAFKA-1211
> (https://issues.apache.org/jira/browse/KAFKA-1211) implies that when acks
> is set to 0/1 and the leader crashes, there is a probability of data loss.
> Hopefully the implementation of leader generation will help avoid this
> (https://issues.apache.org/jira/browse/KAFKA-1211?focusedCommentId=15402622&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15402622)
> And a unique situation as described in KAFKA-3410
> (https://issues.apache.org/jira/browse/KAFKA-3410) can cause a broker or
> cluster shutdown leading to data loss, as described in KAFKA-3924
> (resolved in 0.10.0.1).
> And data duplication can be attributed primarily to consumer offset
> management, which is done at batch/periodic intervals.
> Can anyone think or know of any other scenarios?
> Thanks,
> Jayesh
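The producer-side scenario above can be illustrated with a toy model (plain
Python, not the real client API): send() only appends to an in-memory
buffer, so anything not yet flushed is lost if the process dies or exits
without close().

```python
class ToyProducer:
    """Toy model of a buffering producer: send() appends to a local
    buffer, and records only reach the 'broker' on flush()/close()."""
    def __init__(self, broker):
        self.broker = broker    # list standing in for the broker's log
        self.buffer = []

    def send(self, record):
        self.buffer.append(record)  # returns before the record is durable

    def flush(self):
        self.broker.extend(self.buffer)
        self.buffer.clear()

    def close(self):
        self.flush()

broker = []
p = ToyProducer(broker)
p.send("r1")
p.send("r2")
# If the process crashed here, r1/r2 exist only in the buffer: lost.
p.close()  # close() flushes, delivering the buffered records
```

This is why exiting without close() (or crashing before flush) loses data
even though every send() call appeared to succeed.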

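The acks=0/1 leader-crash case can be sketched the same way (a toy model
only; the real protocol involves ISR tracking and the leader-generation
work referenced in KAFKA-1211): the leader acks before the follower has
replicated, so a fail-over can silently lose acked records.

```python
class ToyCluster:
    """Toy model of acks=1: the leader appends and acks immediately,
    while replication to the follower happens asynchronously."""
    def __init__(self):
        self.leader = []
        self.follower = []

    def produce_acks1(self, record, replicated=True):
        self.leader.append(record)  # leader appends and acks right away
        if replicated:              # async replication may not have run yet
            self.follower.append(record)
        return "ack"                # producer sees success either way

    def fail_over(self):
        # Leader crashes; the follower becomes leader with whatever it has.
        self.leader = self.follower
```

A record acked by the old leader but not yet replicated simply does not
exist on the new leader after fail-over.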

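And the duplication scenario from periodic offset commits, as a toy sketch
(illustrative names, not the real consumer API): a consumer that commits
only every N records and crashes between processing and the next commit
will, on restart, resume from the last committed offset and reprocess
records it already handled.

```python
def run_consumer(log, start_offset, commit_interval, crash_after=None):
    """Toy at-least-once consumer: processes records from `log`,
    committing its offset only every `commit_interval` records.
    Returns (processed_records, last_committed_offset)."""
    processed = []
    committed = start_offset
    for i, record in enumerate(log[start_offset:], start=start_offset):
        processed.append(record)
        if crash_after is not None and len(processed) == crash_after:
            return processed, committed  # crash before the next commit
        if (i + 1) % commit_interval == 0:
            committed = i + 1            # periodic offset commit
    committed = len(log)
    return processed, committed
```

With a 5-record log, a commit interval of 2, and a crash after the third
record, the restart resumes from offset 2 and the third record is
processed twice - exactly the batch/periodic-commit duplication described
above.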
-- 
Radha Krishna, Proddaturi
253-234-5657
