Thanks for the Jira info. Just to clarify: in the case we are outlining above, would the producer have received an ack on m2 (with acks = -1) or not? If not, then I have no concerns; if so, then how would the producer know to re-publish?
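For concreteness, the only signal the producer gets is the completion of the send itself. A minimal sketch of where the re-publish decision would live, assuming the new Java producer client slated for 0.8.2 (the broker address and topic name are placeholders):

import java.util.Properties;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class AckedSend {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");  // placeholder address
        props.put("acks", "all");                        // same as acks=-1
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.send(new ProducerRecord<String, String>("my-topic", "m2"),
                new Callback() {
                    @Override
                    public void onCompletion(RecordMetadata md, Exception e) {
                        if (e != null) {
                            // No ack: m2 may or may not be in a log somewhere.
                            // An at-least-once producer re-publishes here,
                            // accepting a possible duplicate.
                        } else {
                            // Acked at offset md.offset(): with acks=-1 this
                            // means the message was committed to the full ISR.
                        }
                    }
                });
        producer.close();
    }
}

If m2 was never committed, a send with acks=-1 should complete with an error rather than an ack, so the callback, not any broker-side signal, is the place where re-publishing would have to happen.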
On Thu, Jul 24, 2014 at 9:38 AM, Jun Rao <jun...@gmail.com> wrote:

> About re-publishing m2: it seems it's better to let the producer choose
> whether to do this or not.
>
> There is another known bug, KAFKA-1211, that's not fixed yet. The
> situation in which it can happen is relatively rare and the fix is
> slightly involved, so it may not be addressed in 0.8.2.
>
> Thanks,
>
> Jun
>
> On Tue, Jul 22, 2014 at 8:36 PM, scott@heroku <sc...@heroku.com> wrote:
>
>> Thanks so much for the detailed explanation Jun, it pretty much lines
>> up with my understanding.
>>
>> In the case below, if we didn't particularly care about ordering and
>> re-produced m2, it would then become m5, and in many use cases this
>> would be ok?
>>
>> Perhaps a more direct question would be: once 0.8.2 is out and I have
>> a topic with unclean leader election disabled, and produce with
>> acks = -1, are there any known series of events (other than disk
>> failures on all brokers) that would cause the loss of messages that a
>> producer has received an ack for?
>>
>> Sent from my iPhone
>>
>>> On Jul 22, 2014, at 8:17 PM, Jun Rao <jun...@gmail.com> wrote:
>>>
>>> The key point is that we have to keep all replicas consistent with
>>> each other, such that no matter which replica a consumer reads from,
>>> it always reads the same data at a given offset. The following is an
>>> example.
>>>
>>> Suppose we have 3 brokers A, B and C. Let's assume A is the leader
>>> and at some point we have the following offsets and messages in each
>>> replica.
>>>
>>> offset   A    B    C
>>>   1      m1   m1   m1
>>>   2      m2
>>>
>>> Let's assume that message m1 is committed and message m2 is not. At
>>> exactly this moment, replica A dies. After a new leader is elected,
>>> say B, new messages can be committed with just replicas B and C. If
>>> at some point later we commit two more messages m3 and m4, we will
>>> have the following.
>>>
>>> offset   A    B    C
>>>   1      m1   m1   m1
>>>   2      m2   m3   m3
>>>   3           m4   m4
>>>
>>> Now A comes back. For consistency, it's important for A's log to be
>>> identical to B's and C's. So we have to remove m2 from A's log and
>>> add m3 and m4. As you can see, whether you want to republish m2 or
>>> not, m2 cannot stay at its current offset, since in the other
>>> replicas that offset is already taken by other messages. Therefore,
>>> a truncation of replica A's log is needed to keep the replicas
>>> consistent. Currently, we don't republish messages like m2 because
>>> (1) it's not necessary, since m2 was never considered committed, and
>>> (2) it would make our protocol more complicated.
>>>
>>> Thanks,
>>>
>>> Jun
>>>
>>>> On Tue, Jul 22, 2014 at 3:40 PM, scott@heroku <sc...@heroku.com>
>>>> wrote:
>>>>
>>>> Thanks Jun
>>>>
>>>> Can you explain a little more about what an uncommitted message
>>>> means? The messages are in the log, so presumably they have been
>>>> acked at least by the local broker?
>>>>
>>>> I guess I am hoping for some intuition around why 'replaying' the
>>>> messages in question would cause bad things.
>>>>
>>>> Thanks!
>>>>
>>>> Sent from my iPhone
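As a toy model of Jun's offset example above (plain Java collections, not Kafka's actual replication code): the restarting replica A truncates back to the last point it knows was committed, then re-fetches everything after that point from the new leader B:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TruncationToy {
    public static void main(String[] args) {
        // Jun's example: m1 is committed everywhere; the uncommitted m2
        // exists only on the old leader A.
        List<String> a = new ArrayList<>(Arrays.asList("m1", "m2"));
        List<String> b = new ArrayList<>(Arrays.asList("m1", "m3", "m4")); // new leader

        int safePoint = 1; // A knows only m1 was committed

        // On restart, A truncates to the safe point (dropping m2)...
        a.subList(safePoint, a.size()).clear();
        // ...then syncs the remaining messages from the current leader.
        a.addAll(b.subList(safePoint, b.size()));

        System.out.println(a); // [m1, m3, m4]: identical to B at every offset
    }
}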
>>>> On Jul 22, 2014, at 3:06 PM, Jun Rao <jun...@gmail.com> wrote:
>>>>
>>>>> Scott,
>>>>>
>>>>> The reason for truncation is that the broker that comes back may
>>>>> have some uncommitted messages. Those messages shouldn't be exposed
>>>>> to the consumer and therefore need to be removed from the log. So,
>>>>> on broker startup, we first truncate the log to a safe point before
>>>>> which we know all messages are committed. The broker will then sync
>>>>> up with the current leader to get the remaining messages.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Jun
>>>>>
>>>>> On Tue, Jul 22, 2014 at 9:42 AM, Scott Clasen <sc...@heroku.com>
>>>>> wrote:
>>>>>
>>>>>> Ahh, yes, that message loss case. I've wondered about that myself.
>>>>>>
>>>>>> I guess I don't really understand why truncating messages is ever
>>>>>> the right thing to do. Since Kafka is an 'at least once' system
>>>>>> (send a message, get no ack, and it still might be on the topic),
>>>>>> consumers that care will have to de-dupe anyhow.
>>>>>>
>>>>>> To the Kafka designers: is there anything preventing implementation
>>>>>> of alternatives to truncation? When a broker comes back online and
>>>>>> needs to truncate, can't it fire up a producer and take the extra
>>>>>> messages and send them back to the original topic, or alternatively
>>>>>> to an error topic?
>>>>>>
>>>>>> Would love to understand the rationale for the current design, as
>>>>>> my perspective is doubtfully as clear as the designers'.
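To make the proposal above concrete, here is roughly what such a hook might look like. This is purely hypothetical: Kafka exposes no truncation callback, and onTruncate and the ".truncated" topic suffix are invented for illustration; it only sketches the idea of salvaging messages to an error topic:

import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TruncationSalvager {
    private final KafkaProducer<byte[], byte[]> producer;

    // producerProps must configure ByteArraySerializer for key and value.
    public TruncationSalvager(Properties producerProps) {
        this.producer = new KafkaProducer<>(producerProps);
    }

    // Hypothetical hook: called with the messages a restarting broker is
    // about to truncate, before they are removed from the log.
    public void onTruncate(String topic, List<byte[]> truncated) {
        for (byte[] value : truncated) {
            // Re-publish to a parallel error topic so a de-duping consumer
            // can decide what to do with the possibly-uncommitted messages.
            producer.send(new ProducerRecord<byte[], byte[]>(topic + ".truncated", value));
        }
    }
}

One catch, following Jun's explanation: the salvaged copies necessarily get new offsets, so this only helps consumers that identify messages by content rather than by offset.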
>>>>>> On Tue, Jul 22, 2014 at 6:21 AM, Jiang Wu (Pricehistory)
>>>>>> (BLOOMBERG/ 731 LEX -) <jwu...@bloomberg.net> wrote:
>>>>>>
>>>>>>> kafka-1028 addressed another unclean leader election problem. It
>>>>>>> prevents a broker not in the ISR from becoming a leader. The
>>>>>>> problem we are facing is that a broker in the ISR, but without
>>>>>>> complete messages, may become a leader. It's also a kind of
>>>>>>> unclean leader election, but not the one that kafka-1028
>>>>>>> addressed.
>>>>>>>
>>>>>>> Here I'm trying to give a proof that current Kafka doesn't
>>>>>>> achieve the requirement (no message loss, no blocking when 1
>>>>>>> broker is down) due to two of its behaviors:
>>>>>>> 1. when choosing a new leader from 2 followers in the ISR, the
>>>>>>> one with fewer messages may be chosen as the leader;
>>>>>>> 2. even when replica.lag.max.messages=0, a follower can stay in
>>>>>>> the ISR when it has fewer messages than the leader.
>>>>>>>
>>>>>>> We consider a cluster with 3 brokers and a topic with 3 replicas.
>>>>>>> We analyze different cases according to the value of
>>>>>>> request.required.acks (acks for short). For each case and its
>>>>>>> subcases, we find situations in which either message loss or
>>>>>>> service blocking happens. We assume that at the beginning all 3
>>>>>>> replicas (leader A, followers B and C) are in sync, i.e., they
>>>>>>> have the same messages and are all in the ISR.
>>>>>>>
>>>>>>> 1. acks=0, 1, 3. Obviously these settings do not satisfy the
>>>>>>> requirement.
>>>>>>> 2. acks=2. The producer sends a message m. It's acknowledged by A
>>>>>>> and B. At this time, although C hasn't received m, it's still in
>>>>>>> the ISR. If A is killed, C can be elected as the new leader, and
>>>>>>> consumers will miss m.
>>>>>>> 3. acks=-1. Suppose replica.lag.max.messages=M. There are two
>>>>>>> subcases:
>>>>>>> 3.1 M>0. Suppose C is killed. C will be out of the ISR after
>>>>>>> replica.lag.time.max.ms. Then the producer publishes M messages
>>>>>>> to A and B. C restarts. C will rejoin the ISR since it is only M
>>>>>>> messages behind A and B. Before C replicates all the messages, A
>>>>>>> is killed and C becomes leader; then message loss happens.
>>>>>>> 3.2 M=0. In this case, when the producer publishes at a high
>>>>>>> speed, B and C will fall out of the ISR, and only A keeps
>>>>>>> receiving messages. Then A is killed. Either message loss or
>>>>>>> service blocking will happen, depending on whether unclean leader
>>>>>>> election is enabled.
>>>>>>>
>>>>>>> From: users@kafka.apache.org At: Jul 21 2014 22:28:18
>>>>>>> To: JIANG WU (PRICEHISTORY) (BLOOMBERG/ 731 LEX -),
>>>>>>> users@kafka.apache.org
>>>>>>> Subject: Re: how to ensure strong consistency with reasonable
>>>>>>> availability
>>>>>>>
>>>>>>> You will probably need 0.8.2, which gives
>>>>>>> https://issues.apache.org/jira/browse/KAFKA-1028
>>>>>>>
>>>>>>> On Mon, Jul 21, 2014 at 6:37 PM, Jiang Wu (Pricehistory)
>>>>>>> (BLOOMBERG/ 731 LEX -) <jwu...@bloomberg.net> wrote:
>>>>>>>
>>>>>>>> Hi everyone,
>>>>>>>>
>>>>>>>> With a cluster of 3 brokers and a topic of 3 replicas, we want
>>>>>>>> to achieve the following two properties:
>>>>>>>> 1. when only one broker is down, there's no message loss, and
>>>>>>>> producers/consumers are not blocked;
>>>>>>>> 2. in other, more serious situations (for example, one broker is
>>>>>>>> restarted twice in a short period, or two brokers are down at
>>>>>>>> the same time), producers/consumers can be blocked, but no
>>>>>>>> message loss is allowed.
>>>>>>>>
>>>>>>>> We haven't found any producer/broker parameter combinations that
>>>>>>>> achieve this. If you know or think some configurations will
>>>>>>>> work, please post details. We have a test bed to verify any
>>>>>>>> given configurations.
>>>>>>>>
>>>>>>>> In addition, I'm wondering if it's necessary to open a jira to
>>>>>>>> request the above feature?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Jiang
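For reference, the closest thing to the requested configuration that this thread identifies, as a sketch assuming 0.8.2, where KAFKA-1028 adds the unclean-leader flag (broker addresses and the retry count are placeholders). As the case analysis above shows, acks=-1 alone still leaves the ISR-membership gap of case 3:

import java.util.Properties;

public class ConsistencyConfig {
    // Producer side: wait for the full ISR and retry unacked sends.
    public static Properties producerProps() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092");
        p.put("acks", "-1");      // a.k.a. acks=all: wait for the full ISR
        p.put("retries", "3");    // re-publish on a missing ack (at-least-once)
        return p;
    }

    // Broker side (server.properties), per the thread:
    //   default.replication.factor=3
    //   unclean.leader.election.enable=false   (added by KAFKA-1028, 0.8.2)
    //   replica.lag.max.messages=M             (cases 3.1/3.2 above: no value
    //                                           of M alone closes the gap)
}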