Hi Jay,

I chatted with Apurva offline, and we think the key question, as
summarized in the updated KIP wiki, is whether we should consider
replication a necessary condition for at-least-once, and of course also
for exactly-once. Originally I thought replication was not a necessary
condition for at-least-once, since the scope of failures that we should
cover is different in my definition. But if we claim that "even for
at-least-once, you should have a replication factor larger than 1, let
alone for exactly-once", then I agree that acks=all on the client side
should also be a necessary condition for at-least-once, and for
exactly-once as well. In that case this KIP would just be providing, via
client-side configs, conditions that are necessary but not sufficient
for EOS; you also need the right broker-side configs to really support
it.
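
To make the "necessary but not sufficient" split concrete, here is a
rough sketch in Python. It is only an illustration of the conditions
discussed in this thread, not code from the KIP: the function name and
the way the configs are passed in are made up, and the fallback values
are assumed to be the current producer defaults (acks=1, idempotence
off, 5 in-flight requests).

```python
def missing_eos_conditions(client_cfg, replication_factor):
    """Return the conditions from this thread that are NOT met.

    client_cfg is a dict of producer configs; replication_factor is the
    topic's replication factor (a broker/topic-side setting).
    Hypothetical helper, for illustration only.
    """
    problems = []

    # Client side: acks must be "all" ("-1" is the equivalent spelling).
    if str(client_cfg.get("acks", "1")) not in ("all", "-1"):
        problems.append("client: acks must be 'all'")

    # Client side: idempotence must be switched on.
    if str(client_cfg.get("enable.idempotence", "false")).lower() != "true":
        problems.append("client: enable.idempotence must be true")

    # Client side: the cap of 5 in-flight requests discussed in point 2).
    if int(client_cfg.get("max.in.flight.requests.per.connection", 5)) > 5:
        problems.append("client: max.in.flight must be at most 5")

    # Broker/topic side: the data must actually be replicated.
    if replication_factor < 2:
        problems.append("broker: topic must have more than one replica")

    return problems


client = {
    "acks": "all",
    "enable.idempotence": "true",
    "max.in.flight.requests.per.connection": 5,
}
print(missing_eos_conditions(client, replication_factor=1))
```

With the client configs all set as proposed, the only remaining entry is
the broker-side one, which is the part this KIP cannot provide by itself.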

Guozhang


On Tue, Aug 15, 2017 at 1:15 PM, Jay Kreps <j...@confluent.io> wrote:

> Hey Guozhang,
>
> I think the argument is that with acks=1 the message could be lost and
> hence you aren't guaranteeing exactly once delivery.
>
> -Jay
>
> On Mon, Aug 14, 2017 at 1:36 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Just want to clarify that regarding 1), I'm fine with changing it to
> > `all`, but I wanted to argue that it does not necessarily correlate
> > with exactly-once semantics, but rather with the persistence vs.
> > availability trade-off, so I'd like to discuss them separately.
> >
> > Regarding 2), one minor concern I had is that the enforcement is on the
> > client side while the part it affects is on the broker side. I.e., the
> > broker code would assume at most 5 in-flight requests when idempotence
> > is turned on, but this is not enforced at the broker; it relies on the
> > client side's sanity. So other client implementations that do not obey
> > this would likely break the broker code. If we do enforce this, we'd
> > better enforce it at the broker side. Also, I'm wondering if we have
> > considered the approach of having brokers read the logs to get the
> > starting offset when they do not know about it in their snapshot, and
> > whether it is worthwhile if we assume that such issues happen very
> > rarely?
> >
> >
> > Guozhang
> >
> >
> >
> > On Mon, Aug 14, 2017 at 11:01 AM, Apurva Mehta <apu...@confluent.io>
> > wrote:
> >
> > > Hello,
> > >
> > > I just want to summarize where we are in this discussion.
> > >
> > > There are two major points of contention: should we have acks=1 or
> > > acks=all by default? And how should we cap
> > > max.in.flight.requests.per.connection?
> > >
> > > 1) acks=1 vs acks=all
> > >
> > > Here are the tradeoffs of each:
> > >
> > > If you have replication-factor=N, your data is resilient to N-1 disk
> > > failures. For N>1, here is the tradeoff between acks=1 and acks=all.
> > >
> > > With the proposed defaults and acks=all, the stock Kafka producer and
> > > the default broker settings would guarantee that ack'd messages would
> > > be in the log exactly once.
> > >
> > > With the proposed defaults and acks=1, the stock Kafka producer and
> > > the default broker settings would guarantee only that retained ack'd
> > > messages would be in the log exactly once; not all ack'd messages may
> > > be retained.
> > >
> > > If you leave replication-factor=1, acks=1 and acks=all have identical
> > > semantics and performance, but you are resilient to 0 disk failures.
> > >
> > > I think the measured cost (again the performance details are in the
> > > wiki) of acks=all is well worth the much clearer semantics. What does
> > > the rest of the community think?
> > >
> > > 2) capping max.in.flight at 5 when idempotence is enabled.
> > >
> > > We need to limit max.in.flight for the broker to de-duplicate messages
> > > properly. The limitation would only apply when idempotence is enabled.
> > > The shared numbers show that when the client-broker latency is low,
> > > there is no performance gain for max.in.flight > 2.
> > >
> > > Further, it is highly debatable that max.in.flight=500 is significantly
> > > better than max.in.flight=5 for a really high latency client-broker
> > > link, and so far there are no hard numbers one way or the other.
> > > However, assuming that max.in.flight=500 is significantly better than
> > > max.in.flight=5 in some niche use case, the user would have to
> > > sacrifice idempotence for throughput. In this extreme corner case, I
> > > think it is an acceptable tradeoff.
> > >
> > > What does the community think?
> > >
> > > Thanks,
> > > Apurva
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>
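
On point 2) in the quoted mails, a toy model may help show why the
broker side cares about the in-flight cap. This is a simplified sketch
in Python, not the actual broker code: the class and its return strings
are invented for illustration, and it assumes the broker remembers the
sequence numbers of only the last 5 batches per producer.

```python
from collections import deque


class ToyPartitionLog:
    """Toy model of per-producer de-duplication (illustrative only)."""

    def __init__(self, retained=5):
        self.log = []
        # Sequence numbers of the most recent batches; older ones fall out.
        self.recent = deque(maxlen=retained)
        self.next_seq = 0

    def append(self, seq, payload):
        if seq in self.recent:
            # A retry of a batch we still remember: safe to ack again.
            return "duplicate"
        if seq == self.next_seq:
            self.log.append(payload)
            self.recent.append(seq)
            self.next_seq += 1
            return "appended"
        if seq < self.next_seq:
            # A retry of a batch that is too old to be recognised.
            return "unknown-retry"
        return "out-of-order"


log = ToyPartitionLog()
for i in range(6):
    log.append(i, f"m{i}")      # batches 0..5 are appended in order

print(log.append(3, "m3"))      # still within the last 5 batches
print(log.append(0, "m0"))      # fell out of the window
```

A client that keeps at most 5 requests in flight never retries a batch
older than the remembered window, so every retry is recognised as a
duplicate. A client that ignores the cap can hit the second case, which
is why the concern about enforcing the limit on the broker side matters.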



-- 
-- Guozhang
