[jira] [Commented] (KAFKA-1555) provide strong consistency with reasonable availability

Joel Koshy (JIRA) Fri, 24 Oct 2014 12:08:13 -0700

    [ https://issues.apache.org/jira/browse/KAFKA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183304#comment-14183304 ]


Joel Koshy commented on KAFKA-1555:
-----------------------------------

Looks good - I just have a few minor edits to what you wrote.
----
configuration.html

_Minor edits. It would also be good to mention 
NotEnoughReplicasAfterAppend and document it._

  min.insync.replicas: The minimum number of replicas that are required to
  declare a message as committed. If the number of in-sync replicas drops
  below this threshold, then writing messages with request.required.acks set
  to -1 will return a NotEnoughReplicas or NotEnoughReplicasAfterAppend
  error code. This is used to provide enhanced durability guarantees - i.e.,
  all in-sync replicas need to acknowledge the message AND there needs to be
  at least this many replicas in the set of in-sync replicas.
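
As a rough illustration of the wording above, here is a toy model of the check in Python. It is not Kafka's broker code: the helper name check_append and its arguments are hypothetical, and only the setting names and the two error names come from the text. It assumes NotEnoughReplicas is returned when the ISR is already too small when the write arrives, and NotEnoughReplicasAfterAppend when the ISR is found to be too small after the leader has already appended the message.

```python
def check_append(acks, isr_before, isr_after, min_insync_replicas):
    """Toy model of the acks=-1 write path (hypothetical helper, not Kafka code).

    isr_before: in-sync replica count when the write arrives at the leader.
    isr_after:  in-sync replica count when the leader decides whether to ack.
    """
    if acks != -1:
        # The threshold only applies when request.required.acks is -1.
        return "OK"
    if isr_before < min_insync_replicas:
        # Rejected up front; nothing is appended to the log.
        return "NotEnoughReplicas"
    if isr_after < min_insync_replicas:
        # Already appended on the leader, but the ISR shrank before the ack.
        return "NotEnoughReplicasAfterAppend"
    return "OK"


print(check_append(-1, 3, 3, 2))  # OK
print(check_append(-1, 1, 1, 2))  # NotEnoughReplicas
print(check_append(-1, 2, 1, 2))  # NotEnoughReplicasAfterAppend
print(check_append(1, 1, 1, 2))   # OK
```

The distinction probably deserves a sentence in the docs: in the second error case the message may already be in the leader's log, so a producer retry can produce a duplicate.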

----
ops.html

log.cleanup.interval.mins=30 -> log.retention.check.interval.ms=300000

----
design.html

Couple of comments:

* all (or -1) brokers - maybe make it clear up front that this means all 
current in-sync replicas, and later clarify that consistency can be preferred 
over availability via the min.isr (min.insync.replicas) property
* bq. a message that was acked will not be lost as long as at least one in sync 
replica remains
** The above should probably be clarified a bit, i.e., the availability of a 
replica affects whether a message is lost only while that message has yet to 
be replicated to all assigned replicas.
* It would be useful to describe how min.isr helps facilitate trading off 
consistency vs availability
* There are a couple of typos in various places
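
To make the trade-off point concrete, something like the following arithmetic could back a short table in design.html. This is only a sketch: the helper name tradeoff is hypothetical, and it assumes acks=-1 writes keep succeeding exactly as long as at least min.isr replicas remain in sync.

```python
def tradeoff(replication_factor):
    """Return (min_isr, copies_guaranteed, replicas_that_may_fall_out_of_sync) rows."""
    rows = []
    for min_isr in range(1, replication_factor + 1):
        # With acks=-1, an acked message is on at least min_isr replicas.
        copies_guaranteed = min_isr
        # Writes keep succeeding while at least min_isr replicas stay in sync.
        tolerated = replication_factor - min_isr
        rows.append((min_isr, copies_guaranteed, tolerated))
    return rows


for min_isr, copies, tolerated in tradeoff(3):
    print(f"min.isr={min_isr}: acked message on >= {copies} replica(s), "
          f"writes survive {tolerated} replica(s) out of sync")
```

For the 3-replica topic in the issue description, min.isr=2 looks like the setting that matches the stated requirements: writes continue with one broker down, and with two down they are rejected instead of being acked on a single replica.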

> provide strong consistency with reasonable availability
> -------------------------------------------------------
>
>                 Key: KAFKA-1555
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1555
>             Project: Kafka
>          Issue Type: Improvement
>          Components: controller
>    Affects Versions: 0.8.1.1
>            Reporter: Jiang Wu
>            Assignee: Gwen Shapira
>             Fix For: 0.8.2
>
>         Attachments: KAFKA-1555-DOCS.0.patch, KAFKA-1555-DOCS.1.patch, 
> KAFKA-1555.0.patch, KAFKA-1555.1.patch, KAFKA-1555.2.patch, 
> KAFKA-1555.3.patch, KAFKA-1555.4.patch, KAFKA-1555.5.patch, 
> KAFKA-1555.5.patch, KAFKA-1555.6.patch, KAFKA-1555.8.patch, KAFKA-1555.9.patch
>
>
> In a mission critical application, we expect a kafka cluster with 3 brokers 
> can satisfy two requirements:
> 1. When 1 broker is down, no message loss or service blocking happens.
> 2. In worse cases, such as when two brokers are down, service can be 
> blocked, but no message loss happens.
> We found that the current kafka version (0.8.1.1) cannot achieve the 
> requirements due to three of its behaviors:
> 1. when choosing a new leader from 2 followers in ISR, the one with fewer 
> messages may be chosen as the leader.
> 2. even when replica.lag.max.messages=0, a follower can stay in ISR when it 
> has fewer messages than the leader.
> 3. ISR can contain only 1 broker, therefore acknowledged messages may be 
> stored in only 1 broker.
> The following is an analytical proof. 
> We consider a cluster with 3 brokers and a topic with 3 replicas, and assume 
> that at the beginning, all 3 replicas, leader A, followers B and C, are in 
> sync, i.e., they have the same messages and are all in ISR.
> According to the value of request.required.acks (acks for short), there are 
> the following cases.
> 1. acks=0, 1, 3. Obviously these settings do not satisfy the requirement.
> 2. acks=2. Producer sends a message m. It's acknowledged by A and B. At this 
> time, although C hasn't received m, C is still in ISR. If A is killed, C can 
> be elected as the new leader, and consumers will miss m.
> 3. acks=-1. B and C restart and are removed from ISR. Producer sends a 
> message m to A, and receives an acknowledgement. Disk failure happens in A 
> before B and C replicate m. Message m is lost.
> In summary, no existing configuration can satisfy the requirements.
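
For reference, case 3 of the quoted proof can be replayed with a small toy model in Python. Everything here is a sketch under stated assumptions: the function name replay_case_3 and its min_isr parameter are hypothetical, with min_isr standing in for the min.insync.replicas setting discussed in the comment above. With min_isr=1 the model reproduces the loss described in the issue; with min_isr=2 the write is rejected instead of acked.

```python
def replay_case_3(min_isr):
    """Replay case 3 (acks=-1) for replicas A (leader), B and C. Toy model only."""
    logs = {"A": [], "B": [], "C": []}
    isr = {"A", "B", "C"}

    # B and C restart and are removed from the ISR.
    isr -= {"B", "C"}

    # Producer sends m with acks=-1: every replica still in the ISR must
    # have m, and the ISR must hold at least min_isr replicas.
    if len(isr) < min_isr:
        return "rejected"  # the producer learns that m was not committed
    for replica in isr:
        logs[replica].append("m")

    # Disk failure on A before B and C replicate m.
    del logs["A"]
    survivors_with_m = [r for r, log in logs.items() if "m" in log]
    return "acked and kept" if survivors_with_m else "acked and lost"


print(replay_case_3(min_isr=1))  # acked and lost
print(replay_case_3(min_isr=2))  # rejected
```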



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
