Re: [DISCUSS]KIP-1107: Adding record-level acks for producers

TaiJu Wu Wed, 13 Nov 2024 22:21:30 -0800

Hi all,

Thanks for your feedback and @Chia-Ping's help.

I also agree that a topic-level acks config is more reasonable, and it can
simplify the story.
When I tried implementing record-level acks, I noticed I don't have a good way
to avoid iterating over batches to get partition information (needed by
*RecordAccumulator#partitionChanged*).
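
To make the topic-level idea concrete, here is a minimal sketch of how the
acks value for a topic could be resolved: an optional per-topic override,
falling back to the existing producer-wide acks. The class name
*TopicAcksResolver* and the shape of the override map are assumptions for
illustration only; they are not part of the Kafka client or of the KIP text.

```java
import java.util.Map;

// Hypothetical helper for illustration; not an existing Kafka class.
final class TopicAcksResolver {
    // The existing producer-wide "acks" value, used as the default.
    private final String defaultAcks;
    // Assumed per-topic overrides, e.g. {"sensor-temp" -> "0"}.
    private final Map<String, String> topicOverrides;

    TopicAcksResolver(String defaultAcks, Map<String, String> topicOverrides) {
        this.defaultAcks = defaultAcks;
        this.topicOverrides = Map.copyOf(topicOverrides);
    }

    // Returns the override for the topic if one exists, else the default.
    String acksFor(String topic) {
        return topicOverrides.getOrDefault(topic, defaultAcks);
    }
}
```

With this shape, a topic without an override behaves exactly as it does
today, so existing producers would not need to change.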


Back to the initial question of how to handle different acks for batches:
First, we can attach the *topic-level acks* to *RecordAccumulator#TopicInfo*.
Second, we can return *Map<Acks, List<ProducerBatch>>* when
*RecordAccumulator#drainBatchesForOneNode* is called. In this step, we can
propagate acks to the *sender*.
Finally, we can get the acks info and group batches with the same acks into a
*List<ProducerBatch>* for a node in *sender#sendProduceRequests*.
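
The grouping step could look roughly like the sketch below. It is only an
illustration under stated assumptions: *Acks* and *Batch* are simplified
stand-ins I made up for this mail (the real *ProducerBatch* carries much more
state), and *groupByAcks* is a hypothetical helper, not the actual signature
of *drainBatchesForOneNode*.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Stand-in for the three acks levels: acks=0, acks=1, acks=-1 (all).
enum Acks { NONE, LEADER, ALL }

// Simplified stand-in for ProducerBatch; only the fields the sketch needs.
final class Batch {
    final String topic;
    final int partition;

    Batch(String topic, int partition) {
        this.topic = topic;
        this.partition = partition;
    }
}

final class AcksGrouping {
    // Groups the batches drained for one node by the acks of their topic.
    // Topics without an entry in topicAcks use defaultAcks.
    static Map<Acks, List<Batch>> groupByAcks(List<Batch> drained,
                                              Map<String, Acks> topicAcks,
                                              Acks defaultAcks) {
        Map<Acks, List<Batch>> grouped = new EnumMap<>(Acks.class);
        for (Batch batch : drained) {
            Acks acks = topicAcks.getOrDefault(batch.topic, defaultAcks);
            grouped.computeIfAbsent(acks, k -> new ArrayList<>()).add(batch);
        }
        return grouped;
    }
}
```

Each entry of the returned map would then become its own Produce request for
the node, since the request carries a single acks value at the top level.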

If I missed something or there is any mistake, please let me know.
I will update this KIP later. Thank you for your feedback.

Best,
TaiJuWu


Chia-Ping Tsai <[email protected]> wrote on Thursday, November 14, 2024 at 9:46 AM:

> hi All
>
> This KIP is based on our use case where an edge application with many
> sensors wants to use a single producer to deliver ‘few but varied’ records
> with different acks settings. The reason for using a single producer is to
> minimize resource usage on edge devices with limited hardware capabilities.
> Currently, we use a producer pool to handle different acks values, which
> requires 3x producer instances. Additionally, this approach creates many
> idle producers if a sensor with a specific acks setting has no data for a
> while.
>
> I love David’s suggestion since the acks configuration is closely related
> to the topic. Maybe we can introduce an optional configuration in the
> producer to define topic-level acks, with the existing acks being the
> default for all topics. This approach is not only simple but also easy to
> understand and implement.
>
> Best,
> Chia-Ping
>
> On 2024/11/13 16:04:24 Andrew Schofield wrote:
> > Hi TaiJuWu,
> > I've been thinking for a while about this KIP before jumping into the
> > discussion.
> >
> > I'm afraid that I don't think the approach in the KIP is the best, given
> > the design of the Kafka protocol in this area. Essentially, each Produce
> > request contains the acks value at the top level, and may contain records
> > for many topics or partitions. My point is that batching occurs at the
> > level of a Produce request, so changing the acks value between records
> > will require a new Produce request to be sent. There would likely be an
> > efficiency penalty if this feature was used heavily with the acks
> > changing record by record.
> >
> > I can see that potentially an application might want different ack
> > levels for different topics, but I would be surprised if they use
> > different ack levels within the same topic. Maybe David's suggestion of
> > defining the acks per topic would be enough. What do you think?
> >
> > Thanks,
> > Andrew
> > ________________________________________
> > From: David Jacot <[email protected]>
> > Sent: 13 November 2024 15:31
> > To: [email protected] <[email protected]>
> > Subject: Re: [DISCUSS]KIP-1107: Adding record-level acks for producers
> >
> > Hi TaiJuWu,
> >
> > Thanks for the KIP.
> >
> > The motivation is not clear to me. Could you please elaborate a bit more
> > on it?
> >
> > My concern is that it adds a lot of complexity and the added value seems
> > to be low. Moreover, it will make reasoning about an application from the
> > server side more difficult because we can no longer assume that it writes
> > with the ack based on the config. Another issue is about the batching,
> > how do you plan to handle batches mixing records with different acks?
> >
> > An alternative approach may be to define the ack per topic. We could even
> > think about defining it on the server side as a topic config. I haven't
> > really thought about it but it may be something to explore a bit more.
> >
> > Best,
> > David
> >
> > On Wed, Nov 13, 2024 at 3:56 PM Frédérik Rouleau
> > <[email protected]> wrote:
> >
> > > Hi TaiJuWu,
> > >
> > > I find this adds a lot of complexity, and I am still not convinced by
> > > the added value. IMO creating a producer instance per ack level is not
> > > problematic and the behavior is clear for developers. What would be the
> > > added value of the proposed change?
> > >
> > > Regards,
> > >
> > >
> > > On Wed, Nov 6, 2024 at 7:50 AM TaiJu Wu <[email protected]> wrote:
> > >
> > > > Hi Fred and Greg,
> > > >
> > > > Thanks for your feedback. It really is not straightforward, but it is
> > > > interesting! Here is the behavior I expect.
> > > >
> > > > The current producer uses the *RecordAccumulator* to gather records,
> > > > and the sender thread sends them in batches. We can track each record’s
> > > > acknowledgment setting as it is appended to the *RecordAccumulator*,
> > > > allowing the *sender* to group batches by acknowledgment level and
> > > > topicPartition when processing.
> > > >
> > > > Regarding the statement, "Callbacks for records being sent to the
> > > > same partition are guaranteed to execute in order," this is ensured
> > > > when *max.in.flight.requests.per.connection* is set to 1. We can send
> > > > records with different acknowledgment levels in the order of acks=0,
> > > > acks=1, acks=-1. Since we need to send batches with different
> > > > acknowledgment levels to the broker, the callback will execute after
> > > > each request is completed.
> > > >
> > > > In response to, "If so, are low-acks records subject to head-of-line
> > > > blocking from high-acks records?," I believe an additional
> > > > configuration is necessary to control this behavior. We could allow
> > > > records to be either sync or async, though the callback would still
> > > > execute after each batch with varying acknowledgment levels completes.
> > > > To measure behavior across acknowledgment levels, we could also include
> > > > acks in *ProducerInterceptor*.
> > > >
> > > > Furthermore, before this KIP, a producer could only use one acks
> > > > level, so ordering was guaranteed. However, with this change, we can
> > > > *ONLY* guarantee ordering among records of the same acknowledgment
> > > > level, because we may send up to three separate requests to brokers.
> > > >
> > > > Best,
> > > > TaiJuWu
> > > >
> > > >
> > > > TaiJu Wu <[email protected]> wrote on Wednesday, November 6, 2024 at 10:01 AM:
> > > >
> > > > > Hi  Fred and Greg,
> > > > >
> > > > > Apologies for the delayed response.
> > > > > Yes, you’re correct.
> > > > > I’ll outline the behavior I expect.
> > > > >
> > > > > Thanks for your feedback!
> > > > >
> > > > > Best,
> > > > > TaiJuWu
> > > > >
> > > > >
> > > > > Greg Harris <[email protected]> wrote on Wednesday, November 6, 2024 at 9:48 AM:
> > > > >
> > > > >> Hi TaiJuWu,
> > > > >>
> > > > >> Thanks for the KIP!
> > > > >>
> > > > >> Can you explain in the KIP about the behavior when the number of
> > > > >> acks is different for individual records? I think the current
> > > > >> description using the word "straightforward" does little to explain
> > > > >> that, and may actually be hiding some complexity.
> > > > >>
> > > > >> For example, the send() javadoc contains this: "Callbacks for
> > > > >> records being sent to the same partition are guaranteed to execute
> > > > >> in order." Is this still true when acks vary for records within the
> > > > >> same partition?
> > > > >> If so, are low-acks records subject to head-of-line-blocking from
> > > > >> high-acks records? It seems to me that this feature is useful when
> > > > >> acks is specified per-topic, but introduces a lot of edge cases that
> > > > >> are underspecified.
> > > > >>
> > > > >> Thanks,
> > > > >> Greg
> > > > >>
> > > > >>
> > > > >> On Tue, Nov 5, 2024 at 4:52 PM TaiJu Wu <[email protected]> wrote:
> > > > >>
> > > > >> > Hi Chia-Ping,
> > > > >> >
> > > > >> > Thanks for your feedback.
> > > > >> > I have updated KIP based on your suggestions.
> > > > >> >
> > > > >> > Best,
> > > > >> > Stanley
> > > > >> >
> > > > >> > Chia-Ping Tsai <[email protected]> wrote on Tuesday, November 5, 2024 at 4:41 PM:
> > > > >> >
> > > > >> > > hi TaiJuWu,
> > > > >> > >
> > > > >> > > Q0: Could you please add getter (Short acks()) to "public
> > > > >> > > interface" section?
> > > > >> > >
> > > > >> > > Q1: Could you please add RPC json reference to prove "been
> > > > >> > > available at the RPC-level,"
> > > > >> > >
> > > > >> > > Q2: Could you please add link to producer docs to prove "share a
> > > > >> > > single producer instance across multiple threads"
> > > > >> > >
> > > > >> > > Thanks,
> > > > >> > > Chia-Ping
> > > > >> > >
> > > > >> > > On 2024/11/05 01:28:36 吳岱儒 wrote:
> > > > >> > > > Hi all,
> > > > >> > > >
> > > > >> > > > I opened KIP-1107: Adding record-level acks for producers
> > > > >> > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-1107%3A++Adding+record-level+acks+for+producers>
> > > > >> > > > to reduce the limitations associated with reusing KafkaProducer.
> > > > >> > > >
> > > > >> > > > Feedback and suggestions are welcome.
> > > > >> > > >
> > > > >> > > > Thanks,
> > > > >> > > > TaiJuWu
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> >
>
