I read the recent Client Survey (https://www.confluent.io/blog/first-annual-state-apache-kafka-client-use-survey/). It said that most respondents rated reliability as critical or very important. Given that, I was inspired to follow up on this thread.
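For concreteness, here is roughly what the safer producer settings discussed in this thread look like when set explicitly on a client (a sketch only: the broker address is a placeholder, the keys are the standard producer config names, and note that block.on.buffer.full was deprecated in the 0.9 clients in favor of max.block.ms):

```java
import java.util.Properties;

public class DurableProducerSettings {
    // Sketch: the "durable by default" producer settings discussed in this
    // thread, written out as explicit client configuration.
    static Properties durableSettings() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        // Wait for all in-sync replicas; works together with the topic's
        // min.insync.replicas setting to bound how far durability can degrade.
        props.put("acks", "all");
        // One in-flight request per connection prevents reordering when a
        // failed batch is retried.
        props.put("max.in.flight.requests.per.connection", "1");
        // Block (rather than raise an error) when the client buffer fills up.
        // Deprecated in 0.9+ clients in favor of max.block.ms.
        props.put("block.on.buffer.full", "true");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(durableSettings());
    }
}
```

These would be passed to the KafkaProducer constructor; the point of the thread is that the first two would become defaults rather than something each application must remember to set.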
Grant, Ewen, Ismael, and I all think that defaulting the producer to acks=all would be a good thing to do, and Grant suggested a couple more changes. The producer suggestions in particular (block.on.buffer.full=true and max.in.flight.requests.per.connection=1) would, I believe, prevent silent data loss and prevent message reordering. What do you all think is the next step? I imagine that the actual implementation wouldn't be the hard part (you'd just flip a default somewhere). The hard part would be the KIP discussions, the migration process, and whatever backwards compatibility and messaging are required.

-James

> On Feb 3, 2017, at 8:01 AM, Grant Henke <ghe...@cloudera.com> wrote:
>
> I would be in favor of defaulting acks=all.
>
> I have found that most people want to start with the stronger/safer
> guarantees and then adjust them for performance on a case-by-case basis.
> This gives them a chance to understand and accept the tradeoffs.
>
> A few other defaults I would be in favor of changing (some are harder and
> more controversial than others) are:
>
> Broker:
>
> - zookeeper.chroot=kafka (was "")
>   - This will be easiest when direct communication to zookeeper isn't
>     done by clients
>
> Producer:
>
> - block.on.buffer.full=true (was false)
> - max.in.flight.requests.per.connection=1 (was 5)
>
> All:
>
> - receive.buffer.bytes=-1 (was 102400)
> - send.buffer.bytes=-1 (was 102400)
>
> On Fri, Feb 3, 2017 at 2:03 AM, Ismael Juma <ism...@juma.me.uk> wrote:
>
>> I'd be in favour too.
>>
>> Ismael
>>
>> On 3 Feb 2017 7:33 am, "Ewen Cheslack-Postava" <e...@confluent.io> wrote:
>>
>>> On Thu, Feb 2, 2017 at 11:21 PM, James Cheng <wushuja...@gmail.com> wrote:
>>>
>>>> Ewen,
>>>>
>>>> Ah right, that's a good point.
>>>>
>>>> My initial reaction to your examples was that "well, those should be
>>>> in separate topics", but then I realized that people choose their
>>>> topics for a variety of reasons.
>>>> Sometimes they organize topics based on their producers, sometimes
>>>> based on the nature of the data, and sometimes (as in your examples)
>>>> based on the consuming application. And there are valid reasons to
>>>> want different data types in a single topic:
>>>>
>>>> 1) You get global ordering.
>>>> 2) You get consistent ordering on re-reads (whereas reading 2 topics
>>>>    could produce a different ordering each time).
>>>> 3) Logically related data types are all co-located.
>>>>
>>>> I do still think it'd be convenient to only have to set
>>>> min.insync.replicas on a topic and not also require producing
>>>> applications to set acks=all. It would then be a single thing to
>>>> configure instead of the current two (since, as currently
>>>> implemented, you have to set both in order to achieve high
>>>> durability).
>>>
>>> I entirely agree; I think the default should be acks=all, and then this
>>> would be true :) Similar to the unclean leader election setting, I
>>> think defaulting to durable by default is a better choice. I understand
>>> historically why a different choice was made (Kafka didn't start out as
>>> a replicated, durable storage system), but given how it has evolved, I
>>> think durable by default would be a better choice on both the broker
>>> and the producer.
>>>
>>>> But I admit that it's hard to find the balance of
>>>> features/simplicity/complexity to handle all the use cases.
>>>
>>> Perhaps the KIP-106 adjustment to unclean leader election could benefit
>>> from a sister KIP for adjusting the default producer acks setting?
>>>
>>> Not sure how popular it would be, but I would be in favor.
>>>
>>> -Ewen
>>>
>>>> Thanks,
>>>> -James
>>>>
>>>>> On Feb 2, 2017, at 9:42 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
>>>>>
>>>>> James,
>>>>>
>>>>> Great question; I probably should have been clearer. Log data is an
>>>>> example where the app (or even an instance of the app) might know
>>>>> best what the right tradeoff is. Depending on your strategy for
>>>>> managing logs, you may or may not be mixing multiple logs (and logs
>>>>> from different deployments) into the same topic. For example, if you
>>>>> key by application, then you have an easy way to split logs up while
>>>>> still getting a global feed of log messages. Maybe logs from one app
>>>>> are really critical and we want to retry, but from another app they
>>>>> are just a nice-to-have.
>>>>>
>>>>> There are other examples even within a single app. For example, a
>>>>> gaming company might report data from a user of a game to the same
>>>>> topic but want 2 producers with different reliability levels (and
>>>>> possibly where the ordering constraints across the two sets, which
>>>>> might otherwise cause you to use a single consumer, are not an
>>>>> issue). High-frequency telemetry on a player might be desirable to
>>>>> have, but it's not the end of the world if some is lost. In contrast,
>>>>> they may want a stronger guarantee for something like chat messages,
>>>>> where they want to have a permanent record in all circumstances.
>>>>>
>>>>> -Ewen
>>>>>
>>>>> On Fri, Jan 27, 2017 at 12:59 AM, James Cheng <wushuja...@gmail.com> wrote:
>>>>>
>>>>>>
>>>>>>> On Jan 27, 2017, at 12:18 AM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
>>>>>>>
>>>>>>> On Thu, Jan 26, 2017 at 4:23 PM, Luciano Afranllie <listas.luaf...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I was thinking about the situation where you have fewer brokers in
>>>>>>>> the ISR list than the number set in min.insync.replicas.
>>>>>>>>
>>>>>>>> My idea was that if I, as an administrator, want to favor
>>>>>>>> durability over availability for a given topic, then if that topic
>>>>>>>> has fewer ISRs than the value set in min.insync.replicas, I may
>>>>>>>> want to stop producing to the topic. The way min.insync.replicas
>>>>>>>> and ack work, I need to coordinate with all producers in order to
>>>>>>>> achieve this. There is no way (or I don't know of one) to globally
>>>>>>>> enforce a stop to producing to a topic if it is under-replicated.
>>>>>>>>
>>>>>>>> I don't see why, for the same topic, some producers might want to
>>>>>>>> get an error when the number of ISRs is below min.insync.replicas
>>>>>>>> while other producers don't. I think it could be more useful to be
>>>>>>>> able to specify that ALL producers should get an error when a
>>>>>>>> given topic is under-replicated, so they stop producing, than for
>>>>>>>> a single producer to get an error when ANY topic is
>>>>>>>> under-replicated. I don't have a lot of experience with Kafka, so
>>>>>>>> I may be missing some use cases.
>>>>>>>
>>>>>>> It's also a matter of not having to do a ton of configuration on a
>>>>>>> per-topic basis.
>>>>>>> Putting some control in the producer app's hands means you can set
>>>>>>> reasonable global defaults that make sense for apps that require
>>>>>>> stronger durability, while letting cases with lower requirements
>>>>>>> still benefit from durability before consumers see data, without
>>>>>>> blocking producers that choose lower requirements. Without
>>>>>>> requiring the ability to make config changes on the Kafka brokers
>>>>>>> (which may be locked down and restricted to Kafka admins), the
>>>>>>> producer application can choose to accept weaker guarantees based
>>>>>>> on the tradeoffs it needs to make.
>>>>>>
>>>>>> I'm not sure I follow, Ewen.
>>>>>>
>>>>>> I do agree that if min.insync.replicas is set at the broker level,
>>>>>> then of course I would like individual producers to decide whether
>>>>>> their topic (which inherits the global setting) should reject
>>>>>> writes when that topic has size(ISR) < min.insync.replicas.
>>>>>>
>>>>>> But at the topic level... are you saying that if a particular topic
>>>>>> has min.insync.replicas set, you want producers to have the
>>>>>> flexibility to decide between durability and availability?
>>>>>>
>>>>>> Often (but not always), a particular topic is used by only a small
>>>>>> set of producers with a specific set of data. The durability
>>>>>> settings would usually be chosen based on the nature of the data
>>>>>> rather than on who produced it, so it makes sense to me that
>>>>>> durability should apply to the entire topic, not per producer.
>>>>>>
>>>>>> What is a use case where you have multiple producers writing to the
>>>>>> same topic but would want different durability?
>>>>>>
>>>>>> -James
>>>>>>
>>>>>>> The ability to make this tradeoff in different places can seem more
>>>>>>> complex (and really, by definition, *is* more complex), but it also
>>>>>>> offers more flexibility.
>>>>>>>
>>>>>>> -Ewen
>>>>>>>
>>>>>>>> But I understand your point: the min.insync.replicas setting
>>>>>>>> should be understood as "if a producer wants to get an error when
>>>>>>>> topics are under-replicated, then how many replicas are enough
>>>>>>>> for not raising an error?"
>>>>>>>>
>>>>>>>> On Thu, Jan 26, 2017 at 4:16 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
>>>>>>>>
>>>>>>>>> The acks setting for the producer doesn't affect the final
>>>>>>>>> durability guarantees. Those are still enforced by the
>>>>>>>>> replication and min ISR settings. Instead, the acks setting just
>>>>>>>>> lets the producer control how durable the write is before *that
>>>>>>>>> producer* can consider the write "complete", i.e. before it gets
>>>>>>>>> an ack.
>>>>>>>>>
>>>>>>>>> -Ewen
>>>>>>>>>
>>>>>>>>> On Tue, Jan 24, 2017 at 12:46 PM, Luciano Afranllie <listas.luaf...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi everybody,
>>>>>>>>>>
>>>>>>>>>> I am trying to understand why Kafka lets each individual
>>>>>>>>>> producer, on a connection-by-connection basis, choose the
>>>>>>>>>> tradeoff between availability and durability, honoring the
>>>>>>>>>> min.insync.replicas value only if the producer uses acks=all.
>>>>>>>>>>
>>>>>>>>>> I mean, for a single topic, cluster administrators can't
>>>>>>>>>> enforce that messages be stored in a minimum number of replicas
>>>>>>>>>> without coordinating with all producers to that topic so that
>>>>>>>>>> all of them use acks=all.
>>>>>>>>>>
>>>>>>>>>> Is there something I am missing? Is there any other strategy to
>>>>>>>>>> overcome this situation?
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Luciano
>
> --
> Grant Henke
> Software Engineer | Cloudera
> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
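The semantics debated in this thread, that min.insync.replicas is only enforced for acks=all producers, can be sketched with a deliberately simplified model (an illustration only, not actual broker code; the function name is invented, and real brokers return a NotEnoughReplicas error rather than a boolean):

```java
public class MinIsrModel {
    // Deliberately simplified model of when the broker accepts a produce
    // request, per the semantics discussed in this thread.
    static boolean brokerAcceptsWrite(String acks, int isrSize, int minInsyncReplicas) {
        if ("all".equals(acks) || "-1".equals(acks)) {
            // min.insync.replicas is only checked for acks=all writes.
            return isrSize >= minInsyncReplicas;
        }
        // acks=0 or acks=1: the write is accepted even when the topic is
        // under-replicated, which is exactly the gap Luciano describes.
        return true;
    }

    public static void main(String[] args) {
        // Topic with replication.factor=3 and min.insync.replicas=2, but
        // only the leader currently in the ISR:
        System.out.println(brokerAcceptsWrite("all", 1, 2)); // prints false
        System.out.println(brokerAcceptsWrite("1", 1, 2));   // prints true
    }
}
```

This is why defaulting producers to acks=all would let administrators enforce durability from the topic side alone: with every producer on acks=all, setting min.insync.replicas on the topic is sufficient to stop writes when the ISR shrinks too far.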