@Jun, We can increase the number of resends, but the produce request may still fail.
For the async producer, at the time it fails, we have:

- messages that are in the queue but have not been sent. From the javaapi, we
  don't know which messages are still in the queue.
  - Is it possible to expose the blocking queue size so we know what remains
    in the queue?
- messages that failed retrying. For the last batch, some may have succeeded,
  but some failed even after retrying. From the javaapi, we don't know which
  messages failed.
  - Is it possible to dump the failed messages to a file so that the next run
    can pick them up?

Does this make sense? Is there another way you would recommend for keeping
track of which messages the async producer has actually sent?

Thanks
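For reference, a rough sketch of the 0.8.x producer settings in question. The
broker list, topic, and values below are only placeholders, not our actual
configuration:

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class AsyncProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092"); // placeholder brokers
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("producer.type", "async");                 // enqueue and send from a background thread
        props.put("request.required.acks", "-1");            // wait for all in-sync replicas
        props.put("message.send.max.retries", "5");          // resends per failed produce request
        props.put("retry.backoff.ms", "200");                // pause between resends
        props.put("queue.buffering.max.messages", "10000");  // async queue capacity
        props.put("queue.enqueue.timeout.ms", "-1");         // block instead of dropping when the queue is full

        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
        producer.send(new KeyedMessage<String, String>("test-topic", "key", "value"));
        producer.close();
    }
}

With producer.type=async, send() returns as soon as the message is enqueued,
so anything dropped after message.send.max.retries resends only shows up in
the producer log, which is exactly the gap described above.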
On Wed, Dec 17, 2014 at 10:58 AM, Jun Rao <j...@confluent.io> wrote:

> You can configure the number of resends on the producer.
>
> Thanks,
>
> Jun
>
> On Wed, Dec 17, 2014 at 10:34 AM, Xiaoyu Wang <xw...@rocketfuel.com> wrote:
>
> > I have tested using the "async" producer with "required.ack=-1" and got
> > really good performance.
> >
> > We have not used the async producer much previously; is there any
> > potential data loss when a broker goes down? For example, when a broker
> > goes down, does the producer resend all the messages in a batch?
> >
> > On Wed, Dec 17, 2014 at 1:16 PM, Xiaoyu Wang <xw...@rocketfuel.com> wrote:
> >
> > > Thanks Jun.
> > >
> > > We have tested our producer with the different required.ack configs.
> > > Even with required.ack=1, the producer is > 10 times slower than with
> > > required.ack=0. Does this match your testing?
> > >
> > > I saw the LinkedIn Kafka SRE presentation. Wondering what configuration
> > > you guys have at LinkedIn to guarantee zero data loss.
> > >
> > > Thanks again and really appreciate your help!
> > >
> > > On Tue, Dec 16, 2014 at 9:50 PM, Jun Rao <j...@confluent.io> wrote:
> > >
> > > > replica.lag.max.messages only controls when a replica should be
> > > > dropped out of the in-sync replica set (ISR). For a message to be
> > > > considered committed, it has to be added to every replica in the ISR.
> > > > When the producer uses ack=-1, the broker waits until the produced
> > > > message is committed before acknowledging the client. So in the case
> > > > of a clean leader election (i.e., there is at least one remaining
> > > > replica in the ISR), no committed messages are lost. In the case of
> > > > an unclean leader election, the number of messages that can be lost
> > > > depends on the state of the replicas, and it's possible to lose more
> > > > than replica.lag.max.messages messages.
> > > >
> > > > We do have the lag jmx metric per replica (see
> > > > http://kafka.apache.org/documentation.html#monitoring).
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Sun, Dec 14, 2014 at 7:20 AM, Xiaoyu Wang <xw...@rocketfuel.com>
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > If I understand it correctly, when the number of messages a replica
> > > > > is behind the leader is < replica.lag.max.messages, the replica is
> > > > > considered in sync with the leader and is eligible for leader
> > > > > election.
> > > > >
> > > > > This means we can lose at most replica.lag.max.messages messages
> > > > > during a leader election, right? We can set replica.lag.max.messages
> > > > > very low, but then we may end up with unclean leader elections, so
> > > > > we can still lose data.
> > > > >
> > > > > Can you recommend some way to prevent data loss? We have tried
> > > > > requiring acks from all replicas, but that slows down the producer
> > > > > significantly.
> > > > >
> > > > > In addition, do we have metrics about how far each replica is
> > > > > behind? If not, can we add them?
> > > > >
> > > > > Thanks,
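P.S. On the per-replica lag metric Jun mentions: since it is exposed over JMX,
it can be polled remotely. The sketch below is only illustrative. The MBean
pattern differs between broker versions (see
http://kafka.apache.org/documentation.html#monitoring for the exact name), and
broker-host plus port 9999 are placeholders for wherever JMX_PORT points.

import java.util.Set;

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ReplicaLagCheck {
    public static void main(String[] args) throws Exception {
        // Assumes the broker was started with JMX_PORT=9999; adjust host and port.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            // The MBean naming varies across broker versions; this wildcard
            // pattern is a guess. Consult the monitoring docs for the exact
            // per-replica lag MBean name in your release.
            ObjectName pattern = new ObjectName("kafka.server:type=FetcherLagMetrics,*");
            Set<ObjectName> names = mbsc.queryNames(pattern, null);
            for (ObjectName name : names) {
                // Gauges are exposed with a single "Value" attribute.
                Object lag = mbsc.getAttribute(name, "Value");
                System.out.println(name + " lag=" + lag);
            }
        } finally {
            connector.close();
        }
    }
}

Watching that value per follower shows how close a replica is to dropping out
of the ISR, without having to lower replica.lag.max.messages.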