Hi all, I have updated the contents of this KIP. Please take a look and let me know what you think.
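To make the topic-level acks idea discussed below a bit more concrete, here is a minimal sketch of how such a producer configuration might look. Please note the property name "acks.per.topic" is only a placeholder I am using for illustration; it is not an existing Kafka config and not necessarily the name the KIP will end up with. The existing "acks" config stays the default for any topic without an override.

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class TopicLevelAcksSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Existing behavior: one acks value acting as the default for every topic.
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        // Hypothetical topic-level override (placeholder name, not a real config today):
        // sensor readings can tolerate loss, audit events cannot.
        props.put("acks.per.topic", "sensor-temp:0,sensor-audit:-1");

        // A single producer instance would then serve all acks levels
        // instead of a pool of three producers.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // producer.send(...) as usual; the effective acks would follow the record's topic.
        }
    }
}

The point of this shape is that nothing changes for existing users, while edge applications such as the sensor use case described below can drop their producer pool.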
Thanks,
TaiJuWu

On Thu, Nov 14, 2024 at 2:21 PM TaiJu Wu <tjwu1...@gmail.com> wrote:
> Hi all,
>
> Thanks for your feedback and @Chia-Ping's help.
>
> I also agree that a topic-level acks config is more reasonable and that it can simplify the story.
> When I tried implementing record-level acks, I noticed I don't have a good way to avoid iterating over batches to get partition information (needed by *RecordAccumulator#partitionChanged*).
>
> Back to the initial question of how I can handle different acks for batches:
> First, we can attach the *topic-level acks* to *RecordAccumulator#TopicInfo*.
> Second, we can return *Map<Acks, List<ProducerBatch>>* when *RecordAccumulator#drainBatchesForOneNode* is called. In this step, we can propagate the acks to the *sender*.
> Finally, we can get the acks info and group batches with the same acks into a *List<ProducerBatch>* for a node in *sender#sendProduceRequests*.
>
> If I missed something or there is any mistake, please let me know.
> I will update this KIP later; thank you for your feedback.
>
> Best,
> TaiJuWu
>
> Chia-Ping Tsai <chia7...@apache.org> wrote on Thu, Nov 14, 2024 at 9:46 AM:
>> Hi all,
>>
>> This KIP is based on our use case where an edge application with many sensors wants to use a single producer to deliver ‘few but varied’ records with different acks settings. The reason for using a single producer is to minimize resource usage on edge devices with limited hardware capabilities. Currently, we use a producer pool to handle different acks values, which requires 3x producer instances. Additionally, this approach creates many idle producers if a sensor with a specific acks setting has no data for a while.
>>
>> I love David’s suggestion since the acks configuration is closely related to the topic. Maybe we can introduce an optional configuration in the producer to define topic-level acks, with the existing acks being the default for all topics. This approach is not only simple but also easy to understand and implement.
>>
>> Best,
>> Chia-Ping
>>
>> On 2024/11/13 16:04:24 Andrew Schofield wrote:
>> > Hi TaiJuWu,
>> > I've been thinking for a while about this KIP before jumping into the discussion.
>> >
>> > I'm afraid that I don't think the approach in the KIP is the best, given the design of the Kafka protocol in this area. Essentially, each Produce request contains the acks value at the top level, and may contain records for many topics or partitions. My point is that batching occurs at the level of a Produce request, so changing the acks value between records will require a new Produce request to be sent. There would likely be an efficiency penalty if this feature were used heavily with the acks changing record by record.
>> >
>> > I can see that potentially an application might want different ack levels for different topics, but I would be surprised if they used different ack levels within the same topic. Maybe David's suggestion of defining the acks per topic would be enough. What do you think?
>> >
>> > Thanks,
>> > Andrew
>> > ________________________________________
>> > From: David Jacot <dja...@confluent.io.INVALID>
>> > Sent: 13 November 2024 15:31
>> > To: dev@kafka.apache.org <dev@kafka.apache.org>
>> > Subject: Re: [DISCUSS]KIP-1107: Adding record-level acks for producers
>> >
>> > Hi TaiJuWu,
>> >
>> > Thanks for the KIP.
>> >
>> > The motivation is not clear to me. Could you please elaborate a bit more on it?
>> >
>> > My concern is that it adds a lot of complexity and the added value seems to be low. Moreover, it will make reasoning about an application from the server side more difficult, because we can no longer assume that it writes with the acks based on the config. Another issue is about batching: how do you plan to handle batches mixing records with different acks?
>> >
>> > An alternative approach may be to define the acks per topic. We could even think about defining it on the server side as a topic config. I haven't really thought about it but it may be something to explore a bit more.
>> >
>> > Best,
>> > David
>> >
>> > On Wed, Nov 13, 2024 at 3:56 PM Frédérik Rouleau <froul...@confluent.io.invalid> wrote:
>> > > Hi TaiJuWu,
>> > >
>> > > I find this adds a lot of complexity, and I am still not convinced of the added value. IMO, creating a producer instance per acks level is not problematic and the behavior is clear for developers. What would be the added value of the proposed change?
>> > >
>> > > Regards,
>> > >
>> > > On Wed, Nov 6, 2024 at 7:50 AM TaiJu Wu <tjwu1...@gmail.com> wrote:
>> > > > Hi Fred and Greg,
>> > > >
>> > > > Thanks for your feedback; this is really not straightforward, but it is interesting!
>> > > > Here is the behavior I expect.
>> > > >
>> > > > The current producer uses the *RecordAccumulator* to gather records, and the sender thread sends them in batches. We can track each record’s acknowledgment setting as it is appended to the *RecordAccumulator*, allowing the *sender* to group batches by acknowledgment level and topicPartition when processing.
>> > > >
>> > > > Regarding the statement "Callbacks for records being sent to the same partition are guaranteed to execute in order," this is ensured when *max.in.flight.requests.per.connection* is set to 1. We can send records with different acknowledgment levels in the order acks=0, acks=1, acks=-1. Since we need to send batches with different acknowledgment levels to the broker, the callback will execute after each request is completed.
>> > > >
>> > > > In response to "If so, are low-acks records subject to head-of-line blocking from high-acks records?", I believe an additional configuration is necessary to control this behavior. We could allow records to be either sync or async, though the callback would still execute after each batch with a given acknowledgment level completes. To measure behavior across acknowledgment levels, we could also include acks in *ProducerInterceptor*.
>> > > >
>> > > > Furthermore, before this KIP, a producer could only use one acks level, so ordering was guaranteed. However, with this change, we can *ONLY* guarantee ordering within records of the same acknowledgment level, because we may send up to three separate requests to brokers.
>> > > >
>> > > > Best,
>> > > > TaiJuWu
>> > > >
>> > > > TaiJu Wu <tjwu1...@gmail.com> wrote on Wed, Nov 6, 2024 at 10:01 AM:
>> > > > > Hi Fred and Greg,
>> > > > >
>> > > > > Apologies for the delayed response.
>> > > > > Yes, you’re correct.
>> > > > > I’ll outline the behavior I expect.
>> > > > >
>> > > > > Thanks for your feedback!
>> > > > >
>> > > > > Best,
>> > > > > TaiJuWu
>> > > > >
>> > > > > Greg Harris <greg.har...@aiven.io.invalid> wrote on Wed, Nov 6, 2024 at 9:48 AM:
>> > > > >> Hi TaiJuWu,
>> > > > >>
>> > > > >> Thanks for the KIP!
>> > > > >>
>> > > > >> Can you explain in the KIP what the behavior is when the acks value is different for individual records? I think the current description using the word "straightforward" does little to explain that, and may actually be hiding some complexity.
>> > > > >>
>> > > > >> For example, the send() javadoc contains this: "Callbacks for records being sent to the same partition are guaranteed to execute in order." Is this still true when acks vary for records within the same partition?
>> > > > >> If so, are low-acks records subject to head-of-line blocking from high-acks records? It seems to me that this feature is useful when acks is specified per-topic, but introduces a lot of edge cases that are underspecified.
>> > > > >>
>> > > > >> Thanks,
>> > > > >> Greg
>> > > > >>
>> > > > >> On Tue, Nov 5, 2024 at 4:52 PM TaiJu Wu <tjwu1...@gmail.com> wrote:
>> > > > >> > Hi Chia-Ping,
>> > > > >> >
>> > > > >> > Thanks for your feedback.
>> > > > >> > I have updated the KIP based on your suggestions.
>> > > > >> >
>> > > > >> > Best,
>> > > > >> > Stanley
>> > > > >> >
>> > > > >> > Chia-Ping Tsai <chia7...@apache.org> wrote on Tue, Nov 5, 2024 at 4:41 PM:
>> > > > >> > > Hi TaiJuWu,
>> > > > >> > >
>> > > > >> > > Q0: Could you please add a getter (Short acks()) to the "public interface" section?
>> > > > >> > >
>> > > > >> > > Q1: Could you please add an RPC JSON reference to support "been available at the RPC-level"?
>> > > > >> > >
>> > > > >> > > Q2: Could you please add a link to the producer docs to support "share a single producer instance across multiple threads"?
>> > > > >> > >
>> > > > >> > > Thanks,
>> > > > >> > > Chia-Ping
>> > > > >> > >
>> > > > >> > > On 2024/11/05 01:28:36 吳岱儒 wrote:
>> > > > >> > > > Hi all,
>> > > > >> > > >
>> > > > >> > > > I have opened KIP-1107: Adding record-level acks for producers
>> > > > >> > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-1107%3A++Adding+record-level+acks+for+producers>
>> > > > >> > > > to reduce the limitations associated with reusing KafkaProducer.
>> > > > >> > > >
>> > > > >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1107%3A++Adding+record-level+acks+for+producers
>> > > > >> > > >
>> > > > >> > > > Feedback and suggestions are welcome.
>> > > > >> > > >
>> > > > >> > > > Thanks,
>> > > > >> > > > TaiJuWu
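P.S. For the batching question raised above (acks is a top-level field of a Produce request, so a single request cannot mix acks values), here is a small, self-contained sketch of the drain-and-group idea from my reply at the top of this thread. The types and names below are simplified stand-ins for illustration only, not the actual producer internals: Batch stands in for ProducerBatch, and the topic-level acks map is assumed to come from a configuration like the placeholder one shown at the top.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AcksGroupingSketch {

    // Simplified stand-in for ProducerBatch; only the topic matters for choosing acks.
    record Batch(String topic, int partition) { }

    // Group the batches drained for one node by their topic-level acks so the sender
    // can issue one Produce request per acks value (up to three requests per node).
    static Map<Short, List<Batch>> groupByAcks(List<Batch> drainedForNode,
                                               Map<String, Short> topicAcks,
                                               short defaultAcks) {
        Map<Short, List<Batch>> byAcks = new HashMap<>();
        for (Batch batch : drainedForNode) {
            short acks = topicAcks.getOrDefault(batch.topic(), defaultAcks);
            byAcks.computeIfAbsent(acks, k -> new ArrayList<>()).add(batch);
        }
        return byAcks;
    }

    public static void main(String[] args) {
        Map<String, Short> topicAcks = Map.of("sensor-temp", (short) 0, "sensor-audit", (short) -1);
        List<Batch> drained = List.of(new Batch("sensor-temp", 0),
                                      new Batch("sensor-audit", 0),
                                      new Batch("other-topic", 1));
        // With a default of acks=1, this prints three groups (acks=0, acks=-1, acks=1),
        // i.e. three separate Produce requests for this node instead of one.
        groupByAcks(drained, topicAcks, (short) 1)
                .forEach((acks, batches) -> System.out.println("acks=" + acks + " -> " + batches));
    }
}

As noted earlier in the thread, ordering would then only be guaranteed among records that share an acks level, since each acks group goes out in its own request.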