About re-publishing m2: it seems it's better to let the producer choose
whether to do this or not.

There is another known bug KAFKA-1211 that's not fixed yet. The situation
when this can happen is relatively rare and the fix is slightly involved.
So, it may not be addressed in 0.8.2.

Thanks,

Jun


On Tue, Jul 22, 2014 at 8:36 PM, scott@heroku <sc...@heroku.com> wrote:

> Thanks so much for the detailed explanation Jun, it pretty much lines up
> with my understanding.
>
> In the case below, if we didn't particularly care about ordering and
> re-produced m2, it would then become m5, and in many use cases this would
> be ok?
>
> Perhaps a more direct question would be, once 0.8.2 is out and I have a
> topic with unclean leader election disabled, and produce with acks = -1,
> are there any known series of events (other than disk failures on all
> brokers) that would cause the loss of messages that a producer has received
> an ack for?
>
>
>
>
>
> Sent from my iPhone
>
> > On Jul 22, 2014, at 8:17 PM, Jun Rao <jun...@gmail.com> wrote:
> >
> > The key point is that we have to keep all replicas consistent with each
> > other, such that no matter which replica a consumer reads from, it always
> > reads the same data at a given offset. The following is an example.
> >
> > Suppose we have 3 brokers A, B and C. Let's assume A is the leader and at
> > some point, we have the following offsets and messages in each replica.
> >
> > offset   A   B   C
> > 1        m1  m1  m1
> > 2        m2
> >
> > Let's assume that message m1 is committed and message m2 is not. At
> > exactly this moment, replica A dies. After a new leader is elected, say
> > B, new messages can be committed with just replicas B and C. Some time
> > later, if we commit two more messages m3 and m4, we will have the
> > following.
> >
> > offset   A   B   C
> > 1        m1  m1  m1
> > 2        m2  m3  m3
> > 3            m4  m4
> >
> > Now A comes back. For consistency, it's important for A's log to be
> > identical to B and C. So, we have to remove m2 from A's log and add m3
> > and m4. As you can see, whether you want to republish m2 or not, m2
> > cannot stay at its current offset, since in the other replicas that
> > offset is already taken by other messages. Therefore, a truncation of
> > replica A's log is needed to keep the replicas consistent. Currently, we
> > don't republish messages like m2 since (1) it's not necessary, as m2 was
> > never considered committed; (2) it would make our protocol more
> > complicated.
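Jun's example can be turned into a tiny model (illustrative Python only, not
Kafka's actual code) showing why A must truncate before rejoining:

```python
# Toy model: a log is a list of messages, list index = offset. Every replica
# must hold the same message at the same offset, so a returning replica
# truncates at the first point of divergence and copies the leader's suffix.

def sync_with_leader(replica, leader):
    """Truncate `replica` where it diverges from `leader`, then catch up.
    Returns the messages that were removed (e.g. the uncommitted m2)."""
    divergence = 0
    while (divergence < len(replica) and divergence < len(leader)
           and replica[divergence] == leader[divergence]):
        divergence += 1
    removed = replica[divergence:]              # uncommitted tail
    replica[divergence:] = leader[divergence:]  # adopt the leader's suffix
    return removed

# The scenario from the thread (offsets 1..N shown as indices 0..N-1):
a = ["m1", "m2"]        # old leader A died holding uncommitted m2
b = ["m1", "m3", "m4"]  # B became leader; m3 and m4 were committed with C
removed = sync_with_leader(a, b)
print(removed)  # ['m2'] -- m2 cannot keep its offset, which now holds m3
print(a == b)   # True  -- A is again identical to the leader
```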
> >
> > Thanks,
> >
> > Jun
> >
> >
> >
> >
> >> On Tue, Jul 22, 2014 at 3:40 PM, scott@heroku <sc...@heroku.com> wrote:
> >>
> >> Thanks Jun
> >>
> >> Can you explain a little more about what an uncommitted message means?
> >> The messages are in the log, so presumably they have been acked at least
> >> by the local broker.
> >>
> >> I guess I am hoping for some intuition around why 'replaying' the
> >> messages in question would cause bad things.
> >>
> >> Thanks!
> >>
> >>
> >> Sent from my iPhone
> >>> On Jul 22, 2014, at 3:06 PM, Jun Rao <jun...@gmail.com> wrote:
> >>>
> >>> Scott,
> >>>
> >>> The reason for truncation is that the broker that comes back may have
> >>> some uncommitted messages. Those messages shouldn't be exposed to the
> >>> consumer and therefore need to be removed from the log. So, on broker
> >>> startup, we first truncate the log to a safe point before which we know
> >>> all messages are committed. This broker will then sync up with the
> >>> current leader to get the remaining messages.
> >>>
> >>> Thanks,
> >>>
> >>> Jun
> >>>
> >>>
> >>>> On Tue, Jul 22, 2014 at 9:42 AM, Scott Clasen <sc...@heroku.com>
> wrote:
> >>>>
> >>>> Ahh, yes that message loss case. I've wondered about that myself.
> >>>>
> >>>> I guess I don't really understand why truncating messages is ever the
> >>>> right thing to do. As Kafka is an 'at least once' system (send a
> >>>> message, get no ack, it still might be on the topic), consumers that
> >>>> care will have to de-dupe anyhow.
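The consumer-side de-duplication mentioned above could look roughly like this
(a sketch assuming application-level message ids, which are an application
convention and not a Kafka 0.8 feature):

```python
# Sketch: if producers attach a unique id to each message, a consumer can
# drop replayed or re-produced duplicates with a simple seen-set.

def dedupe(messages):
    """Yield each payload once, skipping messages whose id was seen before."""
    seen = set()
    for msg_id, payload in messages:
        if msg_id in seen:
            continue            # a replayed/republished duplicate
        seen.add(msg_id)
        yield payload

stream = [(1, "a"), (2, "b"), (2, "b"), (3, "c")]  # id 2 was re-produced
print(list(dedupe(stream)))  # ['a', 'b', 'c']
```

In a real deployment the seen-set would need bounding (e.g. per-partition,
per-time-window), since it grows with the number of distinct ids.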
> >>>>
> >>>> To the Kafka designers: is there anything preventing implementation of
> >>>> alternatives to truncation? When a broker comes back online and needs
> >>>> to truncate, can't it fire up a producer and take the extra messages
> >>>> and send them back to the original topic, or alternatively to an error
> >>>> topic?
> >>>>
> >>>> Would love to understand the rationale for the current design, as my
> >>>> perspective is doubtfully as clear as the designers'.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Jul 22, 2014 at 6:21 AM, Jiang Wu (Pricehistory) (BLOOMBERG/
> 731
> >>>> LEX -) <jwu...@bloomberg.net> wrote:
> >>>>
> >>>>> kafka-1028 addressed another unclean leader election problem. It
> >>>>> prevents a broker not in ISR from becoming a leader. The problem we
> >>>>> are facing is that a broker in ISR but without complete messages may
> >>>>> become a leader. It's also a kind of unclean leader election, but not
> >>>>> the one that kafka-1028 addressed.
> >>>>>
> >>>>> Here I'm trying to give a proof that current Kafka doesn't achieve
> >>>>> the requirement (no message loss, no blocking when 1 broker is down)
> >>>>> due to two of its behaviors:
> >>>>> 1. when choosing a new leader from 2 followers in ISR, the one with
> >>>>> fewer messages may be chosen as the leader;
> >>>>> 2. even when replica.lag.max.messages=0, a follower can stay in ISR
> >>>>> when it has fewer messages than the leader.
> >>>>>
> >>>>> We consider a cluster with 3 brokers and a topic with 3 replicas. We
> >>>>> analyze different cases according to the value of
> >>>>> request.required.acks (acks for short). For each case and its
> >>>>> subcases, we find situations in which either message loss or service
> >>>>> blocking happens. We assume that at the beginning, all 3 replicas,
> >>>>> leader A and followers B and C, are in sync, i.e., they have the same
> >>>>> messages and are all in ISR.
> >>>>>
> >>>>> 1. acks=0, 1, 3. Obviously these settings do not satisfy the
> >>>>> requirement.
> >>>>> 2. acks=2. The producer sends a message m. It's acknowledged by A and
> >>>>> B. At this time, although C hasn't received m, it's still in ISR. If
> >>>>> A is killed, C can be elected as the new leader, and consumers will
> >>>>> miss m.
> >>>>> 3. acks=-1. Suppose replica.lag.max.messages=M. There are two
> >>>>> sub-cases:
> >>>>> 3.1 M>0. Suppose C is killed. C will be out of ISR after
> >>>>> replica.lag.time.max.ms. Then the producer publishes M messages to A
> >>>>> and B. C restarts. C will rejoin ISR since it is only M messages
> >>>>> behind A and B. Before C replicates all messages, A is killed and C
> >>>>> becomes leader; then message loss happens.
> >>>>> 3.2 M=0. In this case, when the producer publishes at a high speed,
> >>>>> B and C will fall out of ISR, and only A keeps receiving messages.
> >>>>> Then A is killed. Either message loss or service blocking will
> >>>>> happen, depending on whether unclean leader election is enabled.
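Case 2 above can be checked with a toy simulation (illustrative only; the
broker names and the election outcome are assumptions from the thread's
scenario, not Kafka code):

```python
# Toy simulation of the acks=2 loss case: the producer gets its ack while an
# in-ISR follower still lacks the message; electing that follower loses it.

logs = {"A": ["m"], "B": ["m"], "C": []}  # C is in ISR but hasn't fetched m
isr = ["A", "B", "C"]

# acks=2 is satisfied as soon as two replicas (leader A plus B) hold m.
acked = sum("m" in log for log in logs.values()) >= 2

isr.remove("A")          # leader A is killed
new_leader = isr[-1]     # suppose C wins the election; it is in ISR, so eligible

lost = acked and "m" not in logs[new_leader]
print(lost)  # True: consumers will never see the acked message m
```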
> >>>>>
> >>>>>
> >>>>> From: users@kafka.apache.org At: Jul 21 2014 22:28:18
> >>>>> To: JIANG WU (PRICEHISTORY) (BLOOMBERG/ 731 LEX -),
> >>>>> users@kafka.apache.org
> >>>>> Subject: Re: how to ensure strong consistency with reasonable
> >>>>> availability
> >>>>>
> >>>>> You will probably need 0.8.2, which gives
> >>>>> https://issues.apache.org/jira/browse/KAFKA-1028
> >>>>>
> >>>>>
> >>>>> On Mon, Jul 21, 2014 at 6:37 PM, Jiang Wu (Pricehistory) (BLOOMBERG/
> >> 731
> >>>>> LEX -) <jwu...@bloomberg.net> wrote:
> >>>>>
> >>>>>> Hi everyone,
> >>>>>>
> >>>>>> With a cluster of 3 brokers and a topic with 3 replicas, we want to
> >>>>>> achieve the following two properties:
> >>>>>> 1. when only one broker is down, there's no message loss, and
> >>>>>> producers/consumers are not blocked.
> >>>>>> 2. in other, more serious, situations, for example when one broker
> >>>>>> is restarted twice in a short period or two brokers are down at the
> >>>>>> same time, producers/consumers may be blocked, but no message loss
> >>>>>> is allowed.
> >>>>>>
> >>>>>> We haven't found any producer/broker parameter combinations that
> >>>>>> achieve this. If you know or think some configurations will work,
> >>>>>> please post details. We have a test bed to verify any given
> >>>>>> configuration.
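For context, the settings this thread circles around look roughly like the
sketch below. Note that min.insync.replicas and unclean.leader.election.enable
only arrive with the 0.8.2 work discussed later in the thread, so this is a
statement of the intended direction, not a verified answer for the test bed:

```properties
# broker / topic configuration (0.8.2+; sketch, not a verified combination)
unclean.leader.election.enable=false
min.insync.replicas=2

# producer configuration: require acknowledgment from all in-sync replicas
request.required.acks=-1
```

With min.insync.replicas=2 on a 3-replica topic, one broker can fail without
blocking, while a second failure makes produces fail fast rather than risk
acked-message loss.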
> >>>>>>
> >>>>>> In addition, I'm wondering whether it's necessary to open a jira to
> >>>>>> request the above feature?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Jiang
> >>
>
