Thanks for the followup, Becket. It sounds like we are in agreement on the scope of this KIP, and the discussion has definitely clarified a lot of the subtle points.
Apurva

On Tue, Aug 15, 2017 at 10:49 PM, Becket Qin <becket....@gmail.com> wrote:

> Hi Apurva,
>
> Thanks for the clarification of the definition. The definitions are
> clear and helpful.
>
> It seems the scope of this KIP is just about the producer side
> configuration change, but not attempting to achieve the exactly once
> semantic with all default settings out of the box. The broker still
> needs to be configured appropriately to achieve the exactly once
> semantic. If so, the current proposal sounds reasonable to me.
> Apologies if I misunderstood the goal of this KIP.
>
> Regarding the max.in.flight.requests.per.connection, I don't think we
> have to support infinite number of in flight requests. But admittedly
> there are use cases that people would want to have reasonably high in
> flight requests. Given that we need to make code changes to support
> idempotence and in.flight.request > 1, it would be nice to see if we
> can cover those use cases instead of doing that later. We can discuss
> this in a separate thread.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
> On Tue, Aug 15, 2017 at 1:46 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Hi Jay,
> >
> > I chatted with Apurva offline, and we think the key of the
> > discussion is that, as summarized in the updated KIP wiki, whether
> > we should consider replication as a necessary condition of
> > at-least-once, and of course also exactly-once. Originally I think
> > replication is not a necessary condition for at-least-once, since
> > the scope of failures that we should be covering is different in my
> > definition; if we claim that "even for at-least-once, you should
> > have replication factor larger than 2, let alone exactly-once" then
> > I agree that having acks=all on the client side should also be a
> > necessary condition for at-least-once, and for exactly-once as well.
> > Then this KIP would be just providing what is necessary but not
> > sufficient conditions, from client-side configs to achieve EOS,
> > while you also need the broker-side configs together to really
> > support it.
> >
> > Guozhang
> >
> >
> > On Tue, Aug 15, 2017 at 1:15 PM, Jay Kreps <j...@confluent.io> wrote:
> >
> > > Hey Guozhang,
> > >
> > > I think the argument is that with acks=1 the message could be lost
> > > and hence you aren't guaranteeing exactly once delivery.
> > >
> > > -Jay
> > >
> > > On Mon, Aug 14, 2017 at 1:36 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> > >
> > > > Just want to clarify that regarding 1), I'm fine with changing
> > > > it to `all` but just wanted to argue it does not necessarily
> > > > correlate with the exactly-once semantics, but rather with
> > > > persistence vs. availability trade-offs, so I'd like to discuss
> > > > them separately.
> > > >
> > > > Regarding 2), one minor concern I had is that the enforcement is
> > > > on the client side while the parts it affects are on the broker
> > > > side. I.e. the broker code would assume at most 5 in.flight when
> > > > idempotent is turned on, but this is not enforced at the broker
> > > > and instead relies on the client side's sanity. So other
> > > > implementations of the client that may not obey this may likely
> > > > break the broker code. If we do enforce this we'd better enforce
> > > > it at the broker side. Also, I'm wondering if we have considered
> > > > the approach for brokers to read the logs in order to get the
> > > > starting offset when it does not know about it in its snapshot,
> > > > and whether it is worthwhile if we assume that such issues are
> > > > very rare to happen?
> > > >
> > > > Guozhang
> > > >
> > > >
> > > > On Mon, Aug 14, 2017 at 11:01 AM, Apurva Mehta <apu...@confluent.io> wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I just want to summarize where we are in this discussion.
> > > > >
> > > > > There are two major points of contention: should we have
> > > > > acks=1 or acks=all by default? and how to cap
> > > > > max.in.flight.requests.per.connection?
> > > > >
> > > > > 1) acks=1 vs acks=all
> > > > >
> > > > > Here are the tradeoffs of each:
> > > > >
> > > > > If you have replication-factor=N, your data is resilient to
> > > > > N-1 disk failures. For N>1, here is the tradeoff between
> > > > > acks=1 and acks=all.
> > > > >
> > > > > With proposed defaults and acks=all, the stock Kafka producer
> > > > > and the default broker settings would guarantee that ack'd
> > > > > messages would be in the log exactly once.
> > > > >
> > > > > With the proposed defaults and acks=1, the stock Kafka
> > > > > producer and the default broker settings would guarantee that
> > > > > 'retained ack'd messages would be in the log exactly once. But
> > > > > all ack'd messages may not be retained'.
> > > > >
> > > > > If you leave replication-factor=1, acks=1 and acks=all have
> > > > > identical semantics and performance, but you are resilient to
> > > > > 0 disk failures.
> > > > >
> > > > > I think the measured cost (again the performance details are
> > > > > in the wiki) of acks=all is well worth the much clearer
> > > > > semantics. What does the rest of the community think?
> > > > >
> > > > > 2) capping max.in.flight at 5 when idempotence is enabled.
> > > > >
> > > > > We need to limit the max.in.flight for the broker to
> > > > > de-duplicate messages properly. The limitation would only
> > > > > apply when idempotence is enabled.
> > > > > The shared numbers show that when the client-broker latency is
> > > > > low, there is no performance gain for max.inflight > 2.
> > > > >
> > > > > Further, it is highly debatable that max.in.flight=500 is
> > > > > significantly better than max.in.flight=5 for a really high
> > > > > latency client-broker link, and so far there are no hard
> > > > > numbers one way or another. However, assuming that
> > > > > max.in.flight=500 is significantly better than max.inflight=5
> > > > > in some niche use case, the user would have to sacrifice
> > > > > idempotence for throughput. In this extreme corner case, I
> > > > > think it is an acceptable tradeoff.
> > > > >
> > > > > What does the community think?
> > > > >
> > > > > Thanks,
> > > > > Apurva
> > > >
> > > > --
> > > > -- Guozhang
> >
> > --
> > -- Guozhang
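
For reference, the two client-side settings debated in this thread can be sketched as a small check. This is only an illustration and not code from the KIP or from any Kafka client: the helper `check_producer_config` and the constant `MAX_IN_FLIGHT_WITH_IDEMPOTENCE` are hypothetical names, and the fallback values used when a key is missing are assumptions. The sketch treats acks=all as required once idempotence is enabled, which follows Jay's argument above; Guozhang's earlier message regards that as a separate persistence trade-off.

```python
# Hypothetical helper for illustration only; not part of any Kafka client.
MAX_IN_FLIGHT_WITH_IDEMPOTENCE = 5  # the cap proposed in this thread


def check_producer_config(config):
    """Return a list of problems found in a producer config dict.

    Only the two constraints discussed in the thread are checked, and
    only when idempotence is enabled.
    """
    problems = []
    if not config.get("enable.idempotence", False):
        # The cap is proposed to apply only when idempotence is enabled.
        return problems

    # Fallback of "1" is an assumption made for this sketch.
    if str(config.get("acks", "1")) != "all":
        problems.append("acks is not 'all': ack'd messages may not be retained")

    # Fallback of 5 is an assumption made for this sketch.
    in_flight = int(config.get("max.in.flight.requests.per.connection", 5))
    if in_flight > MAX_IN_FLIGHT_WITH_IDEMPOTENCE:
        problems.append(
            "max.in.flight.requests.per.connection=%d exceeds the cap of %d"
            % (in_flight, MAX_IN_FLIGHT_WITH_IDEMPOTENCE)
        )
    return problems


# The settings as proposed in the thread.
proposed = {
    "enable.idempotence": True,
    "acks": "all",
    "max.in.flight.requests.per.connection": 5,
}

# The high-latency corner case: 500 in-flight requests with idempotence on.
high_latency = {**proposed, "max.in.flight.requests.per.connection": 500}

for name, cfg in (("proposed", proposed), ("high_latency", high_latency)):
    print(name, check_producer_config(cfg))
```

In the high-latency case the check reports the in-flight cap, which corresponds to the trade-off described above: such a user would have to turn idempotence off to keep the higher limit.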