[ https://issues.apache.org/jira/browse/KAFKA-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762070#comment-13762070 ]

Jay Kreps commented on KAFKA-1050:
----------------------------------

Hey Justin, yeah the two things I wanted to clarify:

1. Consider two setups: Kafka with replication factor 2 and 1 failed node, 
zookeeper with replication factor 3 and 1 failed node. Isn't taking writes on 
either of these equally dangerous in the sense that one more failure will leave 
you in an unrecoverable situation? You refer to "majority vote semantics" but I 
think the semantics (disregarding unsafe leader election) are the same, no? Can 
you clarify what you are looking for?
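To make the comparison concrete, here is a toy model of the failure margin in the two setups (illustrative only; these function names are not Kafka or ZooKeeper APIs):

```python
# Illustrative model: how many additional node failures can each setup
# absorb before it can no longer safely accept or recover writes?

def failures_to_outage_kafka(replication_factor, already_failed):
    # A write is durable only while at least one in-sync replica
    # survives, so the margin is the number of live replicas minus one.
    live = replication_factor - already_failed
    return live - 1

def failures_to_outage_majority(ensemble_size, already_failed):
    # A majority-quorum system makes progress only while more than half
    # of the ensemble is alive.
    majority = ensemble_size // 2 + 1
    live = ensemble_size - already_failed
    return live - majority

# The two setups in question: both have a margin of zero, i.e. one
# more failure leaves you stuck either way.
print(failures_to_outage_kafka(2, 1))     # Kafka, RF=2, 1 node down -> 0
print(failures_to_outage_majority(3, 1))  # ZK-style, 3 nodes, 1 down -> 0
```

Under this model the "majority vote" setup with one failed node is exactly as fragile as the replication-factor-2 setup with one failed node, which is the point above.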

2. Yeah, it's worth clarifying that what we report back is just whether the 
write is "guaranteed". Not guaranteed is not the same as "did not occur". So, 
for example, if the client issues a write and then dies or becomes partitioned 
from the cluster, the write may succeed even though the ack cannot be sent back 
to the producer. So in the case of acks=3 with only 2 available servers we 
would tell the client "sorry, we couldn't get 3x replication during the time we 
waited", but that doesn't mean the write did not occur; it just means fewer 
than 3 replicas acknowledged it.
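The distinction above can be sketched with a toy type (hypothetical, not the Kafka client API): an error reply means "replication guarantee not met in time", while the message may still exist on some replicas.

```python
# Toy model of ack semantics, not real Kafka client code: a negative
# response tells the producer the guarantee was not met, not that the
# write is absent from the cluster.
from dataclasses import dataclass

@dataclass
class AckResult:
    replicas_acked: int   # replicas that actually stored the write
    required_acks: int    # what the producer asked for

    @property
    def guaranteed(self):
        return self.replicas_acked >= self.required_acks

# acks=3 requested, but only 2 servers were reachable: the write exists
# on 2 replicas, yet the client is told the guarantee was not met.
result = AckResult(replicas_acked=2, required_acks=3)
print(result.guaranteed)          # False -- not guaranteed...
print(result.replicas_acked > 0)  # True  -- ...but it may still have occurred
```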

> Support for "no data loss" mode
> -------------------------------
>
>                 Key: KAFKA-1050
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1050
>             Project: Kafka
>          Issue Type: Task
>            Reporter: Justin SB
>
> I'd love to use Apache Kafka, but for my application data loss is not 
> acceptable.  Even at the expense of availability (i.e. I need C not A in CAP).
> I think there are two things that I need to change to get a quorum model:
> 1) Make sure I set request.required.acks to 2 (for a 3 node cluster) or 3 
> (for a 5 node cluster) on every request, so that I can only write if a quorum 
> is active.
> 2) Prevent the behaviour where a non-ISR can become the leader if all ISRs 
> die.  I think this is as easy as tweaking 
> core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala, 
> essentially to throw an exception around line 64 in the "data loss" case.
> I haven't yet implemented / tested this.  I'd love to get some input from the 
> Kafka-experts on whether my plan is:
>  (a) correct - will this work?
>  (b) complete - have I missed any cases?
>  (c) recommended - is this a terrible idea :-)
> Thanks for any pointers!
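For what it's worth, the two changes the issue asks for map onto broker and producer configuration in later Kafka releases, rather than requiring a patch to PartitionLeaderSelector.scala. A sketch, assuming a modern broker and producer (the property names below exist in those releases; the specific values are illustrative):

```
# server.properties (broker side)
# Refuse writes unless at least 2 in-sync replicas exist (point 1).
min.insync.replicas=2
# Never elect a non-ISR replica as leader, even if all ISRs die (point 2).
unclean.leader.election.enable=false

# producer configuration
# Wait for all in-sync replicas to acknowledge each write.
acks=all
```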

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira