Hello Jiang,

If some of your brokers have frequent long GC caused soft failures you can
take a look at the operational wiki page to see if that helps tuning GC
settings.

https://cwiki.apache.org/confluence/display/KAFKA/Operations

Guozhang


On Tue, Jul 22, 2014 at 9:42 AM, Scott Clasen <sc...@heroku.com> wrote:

> Ahh, yes that message loss case. I've wondered about that myself.
>
> I guess I dont really understand why truncating messages is ever the right
> thing to do.  As kafka is an 'at least once' system. (send a message, get
> no ack, it still might be on the topic) consumers that care will have to
> de-dupe anyhow.
>
> To the kafka designers:  is there anything preventing implementation of
> alternatives to truncation? when a broker comes back online and needs to
> truncate, cant it fire up a producer and take the extra messages and send
> them back to the original topic or alternatively an error topic?
>
> Would love to understand the rationale for the current design, as my
> perspective is doubtfully as clear as the designers'
>
>
>
>
> On Tue, Jul 22, 2014 at 6:21 AM, Jiang Wu (Pricehistory) (BLOOMBERG/ 731
> LEX -) <jwu...@bloomberg.net> wrote:
>
> > kafka-1028 addressed another unclean leader election problem. It prevents
> > a broker not in ISR from becoming a leader. The problem we are facing is
> > that a broker in ISR but without complete messages may become a leader.
> > It's also a kind of unclean leader election, but not the one that
> > kafka-1028 addressed.
> >
> > Here I'm trying to give a proof that current kafka doesn't achieve the
> > requirement (no message loss, no blocking when 1 broker down) due to its
> > two behaviors:
> > 1. when choosing a new leader from 2 followers in ISR, the one with less
> > messages may be chosen as the leader
> > 2. even when replica.lag.max.messages=0, a follower can stay in ISR when
> > it has less messages than the leader.
> >
> > We consider a cluster with 3 brokers and a topic with 3 replicas. We
> > analyze different cases according to the value of request.required.acks
> > (acks for short). For each case and it subcases, we find situations that
> > either message loss or service blocking happens. We assume that at the
> > beginning, all 3 replicas, leader A, followers B and C, are in sync,
> i.e.,
> > they have the same messages and are all in ISR.
> >
> > 1. acks=0, 1, 3. Obviously these settings do not satisfy the requirement.
> > 2. acks=2. Producer sends a message m. It's acknowledged by A and B. At
> > this time, although C hasn't received m, it's still in ISR. If A is
> killed,
> > C can be elected as the new leader, and consumers will miss m.
> > 3. acks=-1. Suppose replica.lag.max.messages=M. There are two sub-cases:
> > 3.1 M>0. Suppose C be killed. C will be out of ISR after
> > replica.lag.time.max.ms. Then the producer publishes M messages to A and
> > B. C restarts. C will join in ISR since it is M messages behind A and B.
> > Before C replicates all messages, A is killed, and C becomes leader, then
> > message loss happens.
> > 3.2 M=0. In this case, when the producer publishes at a high speed, B and
> > C will fail out of ISR, only A keeps receiving messages. Then A is
> killed.
> > Either message loss or service blocking will happen, depending on whether
> > unclean leader election is enabled.
> >
> >
> > From: users@kafka.apache.org At: Jul 21 2014 22:28:18
> > To: JIANG WU (PRICEHISTORY) (BLOOMBERG/ 731 LEX -),
> users@kafka.apache.org
> > Subject: Re: how to ensure strong consistency with reasonable
> availability
> >
> > You will probably need 0.8.2  which gives
> > https://issues.apache.org/jira/browse/KAFKA-1028
> >
> >
> > On Mon, Jul 21, 2014 at 6:37 PM, Jiang Wu (Pricehistory) (BLOOMBERG/ 731
> > LEX -) <jwu...@bloomberg.net> wrote:
> >
> > > Hi everyone,
> > >
> > > With a cluster of 3 brokers and a topic of 3 replicas, we want to
> achieve
> > > the following two properties:
> > > 1. when only one broker is down, there's no message loss, and
> > > procuders/consumers are not blocked.
> > > 2. in other more serious problems, for example, one broker is restarted
> > > twice in a short period or two brokers are down at the same time,
> > > producers/consumers can be blocked, but no message loss is allowed.
> > >
> > > We haven't found any producer/broker paramter combinations that achieve
> > > this. If you know or think some configurations will work, please post
> > > details. We have a test bed to verify any given configurations.
> > >
> > > In addition, I'm wondering if it's necessary to open a jira to require
> > the
> > > above feature?
> > >
> > > Thanks,
> > > Jiang
> >
> >
> >
>



-- 
-- Guozhang

Reply via email to