Makes sense in terms of priorities. Thanks, Apurva. On Thu, Aug 31, 2017 at 11:15 AM, Apurva Mehta <apu...@confluent.io> wrote:
> Thanks for the message, Roger. > > I think having 'acks=all' imply 'acks=minIsr' will probably result in some > improvement in the latency. However, I would note two things: > > 1. The numbers on the wiki are latency at max throughput, which should not > be representative of actual latency degradation. We should aim for more > representative numbers before trying to solve for the problem. I intend to > do this. > 2. We have some toggles to reduce replication latency like > ReplicaFetchMinBytes, ReplicaFetchWaitMaxMs. We can improve the replication > latency by tuning these variables. > 3. Engineers from Uber have played with the settings above and noticed that > our protocol is too inefficient to support frequent replica fetches. We > plan to improve this in the coming months. > > If we solve the low hanging fruit in (3), we can probably have lower > latency configs in (2), which would mean that the actual impact of acks=all > reduces. In this context, acks=minIsr will probably show a smaller impact > on reducing replication latency. But it would definitely be interesting to > get those numbers and see whether the theory bears out in practice. > > Regards, > Apurva > > On Thu, Aug 31, 2017 at 8:56 AM, Roger Hoover <roger.hoo...@gmail.com> > wrote: > > > Sorry, my math was sloppy. It's not twice as many requests taking > longer. > > If the probability of replication latency longer than X is Px for both > > replicas then, > > > > acks=all will have probability of Px(2-Px) of replication lag longer > than X > > while > > acks=minIsr will be Px > > > > > > On Wed, Aug 30, 2017 at 5:18 PM, Roger Hoover <roger.hoo...@gmail.com> > > wrote: > > > > > Sorry if this is a bit out of left field but can't help wondering... > > > > > > One way to improve producer performance while still having good > > guarantees > > > would be to allow a setting between acks=1 and acks=all. We could > > > introduce "acks=minIsr". This is already the guarantee you get when > the > > > ISR set shrinks below your replication factor. Why not allow producers > > to > > > get notified when minIsr replication has been acheived even when the > ISR > > > set is full? > > > > > > For rep factor == 3 and min.in.sync.replicas == 2 and sizeOf(ISR) == 3: > > > * with acks=all, the remote time of each request will be max(lag of 2 > > > followers) whereas > > > * with acks=minIsr, the remote time of each request will be min(lag of > 2 > > > followers) > > > > > > Whatever your latency distribution is for replication, for any given > > > remote time (say 100 ms), twice as many requests take longer than that > > time > > > with acks=all vs acks=minIsr. > > > > > > Thoughts? > > > > > > Roger > > > > > > On Wed, Aug 30, 2017 at 4:52 PM, Apurva Mehta <apu...@confluent.io> > > wrote: > > > > > >> Hi Ted, int16 is sufficient. I forgot to specify initially. I have > > updated > > >> the KIP. > > >> > > >> Thanks for pointing it out! > > >> Apurva > > >> > > >> On Wed, Aug 30, 2017 at 4:43 PM, Ted Yu <yuzhih...@gmail.com> wrote: > > >> > > >> > For ProduceRequest v4, would int32 or int16 be enough for > > >> idempotenceLevel > > >> > ? > > >> > > > >> > Cheers > > >> > > > >> > On Wed, Aug 30, 2017 at 3:47 PM, Apurva Mehta <apu...@confluent.io> > > >> wrote: > > >> > > > >> > > Thanks Ismael and Jason, I filed a separate KIP to solve the > > problems > > >> > > identified through this discussion. I also incorporated Jason's > > >> comments > > >> > in > > >> > > that document: > > >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > >> > > 192+%3A+Provide+cleaner+semantics+when+idempotence+is+enabled > > >> > > > > >> > > Please have a look, > > >> > > Apurva > > >> > > > > >> > > On Tue, Aug 29, 2017 at 3:28 AM, Ismael Juma <ism...@juma.me.uk> > > >> wrote: > > >> > > > > >> > > > Thanks for the proposals. I think they make sense and I also > agree > > >> with > > >> > > > Jason's suggestions. Also, it would be good to include the > updated > > >> > > > ProduceRequest/Response schema in the KIP. > > >> > > > > > >> > > > Ismael > > >> > > > > > >> > > > On Tue, Aug 22, 2017 at 11:42 PM, Jason Gustafson < > > >> ja...@confluent.io> > > >> > > > wrote: > > >> > > > > > >> > > > > Thanks Apurva, > > >> > > > > > > >> > > > > On compatibility: I think the proposal makes sense. It's a > pity > > >> that > > >> > we > > >> > > > > can't support idempotence for 0.11.0.0 brokers in the "safe" > > mode > > >> > even > > >> > > if > > >> > > > > it is supported by the broker. I can already imagine users > > >> > complaining > > >> > > > > about this, but I guess it's the consequence of missing the > > >> impact of > > >> > > > that > > >> > > > > validation check and not thinking through the ultimate goal of > > >> > enabling > > >> > > > > idempotence by default. A couple minor comments: > > >> > > > > > > >> > > > > 1. Instead of "safe," Ismael suggested "requested" as an > > >> alternative. > > >> > > > That > > >> > > > > seems to suggest more clearly that idempotence will only be > used > > >> when > > >> > > the > > >> > > > > broker supports it. > > >> > > > > 2. Should we deprecate the "true" and "false" options? It's a > > >> little > > >> > > > weird > > >> > > > > long term to support them in addition to the descriptive > names. > > >> > > > > > > >> > > > > On the OutOfOrderSequence proposal: high-level, the design > makes > > >> > > sense. A > > >> > > > > couple questions: > > >> > > > > > > >> > > > > 1. With this proposal, OutOfOrderSequence means that we must > > have > > >> a > > >> > > last > > >> > > > > produced offset. Is the idea to expose that in the > > >> > > > > OutOfOrderSequenceException so that users know which data was > > >> lost? > > >> > > > > 2. Previously we discussed duplicate handling. Currently we > > raise > > >> > > > > OutOfOrderSequence if we happen to get a sequence number which > > is > > >> > > earlier > > >> > > > > than the sequence numbers we have cached. Alternatively, you > > >> > suggested > > >> > > we > > >> > > > > can return a separate DuplicateError for this case, which > > clients > > >> can > > >> > > > > ignore if they do not care about the offset. I think it might > > make > > >> > > sense > > >> > > > to > > >> > > > > include that here so that the OutOfOrderSequence error is > > >> > unambiguous. > > >> > > > > > > >> > > > > Finally, do you plan to roll these proposals into the current > > KIP > > >> or > > >> > do > > >> > > > > them separately? Probably makes sense to combine them since > they > > >> both > > >> > > > > require a bump to the ProduceRequest. > > >> > > > > > > >> > > > > Thanks, > > >> > > > > Jason > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > On Fri, Aug 18, 2017 at 5:18 PM, Apurva Mehta < > > >> apu...@confluent.io> > > >> > > > wrote: > > >> > > > > > > >> > > > > > Thanks Jason and Ismael. > > >> > > > > > > > >> > > > > > The message format problem is an acute one: if we enable > > >> > idempotence > > >> > > by > > >> > > > > > default, the UnsupportedVersionException when writing to > > topics > > >> > with > > >> > > > the > > >> > > > > > older message format would mean that our prescribed upgrade > > >> steps > > >> > > would > > >> > > > > not > > >> > > > > > work. I have detailed the problems and the solutions on this > > >> page > > >> > > > (linked > > >> > > > > > to from the wiki): > > >> > > > > > > > >> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/ > > >> > > > > > Kafka+Exactly+Once+-+Dealing+with+older+message+formats+ > > >> > > > > > when+idempotence+is+enabled > > >> > > > > > > > >> > > > > > It is worth discussing the solution to the problem proposed > > >> there. > > >> > If > > >> > > > it > > >> > > > > is > > >> > > > > > conceptually sound, it doesnt' seem too hard to implement. > > >> > > > > > > > >> > > > > > As far as the other problem of the spurious > OutOfOrderSequence > > >> > > > problem, I > > >> > > > > > have documented a proposed solution here: > > >> > > > > > > > >> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/ > > >> > > > > > Kafka+Exactly+Once+-+Solving+the+problem+of+spurious+ > > >> > > > > > OutOfOrderSequence+errors > > >> > > > > > > > >> > > > > > This solution is a bit more involved in terms of effort. > > >> > > > > > > > >> > > > > > I think we cannot make the idempotent producer the default > > >> unless > > >> > we > > >> > > > > solve > > >> > > > > > the message format compatibility problem. I would also > prefer > > to > > >> > > solve > > >> > > > > the > > >> > > > > > second problem before making idempotence the default. > > >> > > > > > > > >> > > > > > I would be interested to hear everyone's thoughts on the two > > >> > > solutions > > >> > > > > > proposed above. > > >> > > > > > > > >> > > > > > Thanks, > > >> > > > > > Apurva > > >> > > > > > > > >> > > > > > On Fri, Aug 18, 2017 at 9:24 AM, Jason Gustafson < > > >> > ja...@confluent.io > > >> > > > > > >> > > > > > wrote: > > >> > > > > > > > >> > > > > > > > > > >> > > > > > > > so this change will break client backward compatibility > > >> while > > >> > > > > > connecting > > >> > > > > > > > to 0.10.X brokers. > > >> > > > > > > > users need to change producer default settings while > > >> > connecting > > >> > > > > older > > >> > > > > > > > brokers. > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > At the moment, I think the answer is yes. The old broker > > will > > >> not > > >> > > > > support > > >> > > > > > > the InitProducerId request, so the producer will > immediately > > >> > fail. > > >> > > > > > Similar > > >> > > > > > > to the handling of old message formats mentioned above, we > > >> > probably > > >> > > > > need > > >> > > > > > to > > >> > > > > > > change this so that we just revert to old producer > semantics > > >> if > > >> > the > > >> > > > > > broker > > >> > > > > > > can't support idempotence. > > >> > > > > > > > > >> > > > > > > -Jason > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > On Fri, Aug 18, 2017 at 8:48 AM, Manikumar < > > >> > > > manikumar.re...@gmail.com> > > >> > > > > > > wrote: > > >> > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > 3. The message format requirement is a good point. > This > > >> > should > > >> > > be > > >> > > > > > > > mentioned > > >> > > > > > > > > in the compatibility section. Users who are still > using > > >> the > > >> > old > > >> > > > > > message > > >> > > > > > > > > format will get an error after the upgrade, right? > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > so this change will break client backward compatibility > > >> while > > >> > > > > > connecting > > >> > > > > > > > to 0.10.X brokers. > > >> > > > > > > > users need to change producer default settings while > > >> > connecting > > >> > > > > older > > >> > > > > > > > brokers. > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > > > > > > > >