Hi Divij and Kirk,

Thanks for your response.
You are right, this change is not straightforward, and I apologize for that.

> we haven't answered the question about protocol for ProduceRequest raised above.

Sorry, but which question did I miss? This KIP has been changed from record-level to topic-level acks.

> Note that there are disadvantages of "vertically scaling" a producer i.e.
> reusing a producer with multiple threads.

This change is optional, so users can choose whether to adopt it. If they don't use it, it has no impact on them.

> making producer(s) cheap to create is a goal worth pursuing.
> I'd rather attack that in a more direct manner

Thanks for your suggestion. I will investigate this approach in parallel.

Best,
TaiJuWu

Kirk True <k...@kirktrue.pro> wrote on Tue, Jan 7, 2025 at 8:34 AM:

> Hi TaiJu!
>
> I will echo the concerns about the likelihood of gotchas arising in an
> effort to work around the existing API and protocol design.
>
> If the central concern is the performance impact and/or resource overhead
> of multiple client instances, I'd rather attack that in a more direct
> manner.
>
> Thanks,
> Kirk
>
> On Fri, Jan 3, 2025, at 8:03 AM, Divij Vaidya wrote:
> > Hey TaiJu
> >
> > I read the latest version of the KIP.
> >
> > I understand the problem you are trying to solve here. But the solution
> > needs more changes than you proposed and hence, is not straightforward. As
> > an example, we haven't answered the question about protocol for
> > ProduceRequest raised above. A `ProduceRequest` defines `ack` at a request
> > level where the payload consists of records belonging to multiple topics.
> > One way to solve it is to define topic-level `ack` at the server as
> > suggested above in this thread, but wouldn't that require us to
> > remove/deprecate this field?
> >
> > Alternatively, have you tried to explore the option of decreasing the
> > resource footprint of an idle producer so that it is not expensive to
> > create 3x producers?
> > Note that there are disadvantages of "vertically scaling" a producer i.e.
> > reusing a producer with multiple threads.
> > One of the many disadvantages is
> > that all requests from the producer will be handled by the same network
> > thread on the broker. If that network thread is busy doing IO for some
> > reason (perhaps reading from disk is slow), then it will impact all other
> > requests from that producer. Hence, making producer(s) cheap to create is a
> > goal worth pursuing.
> >
> > --
> > Divij Vaidya
> >
> > On Fri, Jan 3, 2025 at 4:39 AM TaiJu Wu <tjwu1...@gmail.com> wrote:
> >
> > > Hello folks,
> > >
> > > This thread has been pending for a long time. I want to bump it and get
> > > more feedback.
> > > Any questions are welcome.
> > >
> > > Best,
> > > TaiJuWu
> > >
> > > TaiJu Wu <tjwu1...@gmail.com> wrote on Sat, Nov 23, 2024 at 9:15 PM:
> > >
> > > > Hi Chia-Ping,
> > > >
> > > > Sorry for the late reply, and thanks for your feedback to make this KIP more
> > > > valuable.
> > > > After initial verification, I think this can be done without large changes.
> > > >
> > > > I have updated the KIP, thanks a lot.
> > > >
> > > > Best,
> > > > TaiJuWu
> > > >
> > > > Chia-Ping Tsai <chia7...@apache.org> wrote on Wed, Nov 20, 2024 at 5:06 PM:
> > > >
> > > >> hi TaiJuWu
> > > >>
> > > >> Is there a possibility to extend this KIP to include topic-level
> > > >> compression for the producer? This is another issue that prevents us from
> > > >> sharing producers across different threads, as it's common to use different
> > > >> compression types for different topics (data).
> > > >>
> > > >> Best,
> > > >> Chia-Ping
> > > >>
> > > >> On 2024/11/18 08:36:25 TaiJu Wu wrote:
> > > >> > Hi Chia-Ping,
> > > >> >
> > > >> > Thanks for your suggestions and feedback.
> > > >> >
> > > >> > Q1: I have updated this according to your suggestions.
> > > >> > Q2: This is a necessary change since *RecordAccumulator* assumes that
> > > >> > all records have the same acks (e.g. ProducerConfig.acks), so we need a
> > > >> > method to distinguish which acks belong to each batch.
> > > >> >
> > > >> > Best,
> > > >> > TaiJuWu
> > > >> >
> > > >> > Chia-Ping Tsai <chia7...@apache.org> wrote on Mon, Nov 18, 2024 at 2:17 AM:
> > > >> >
> > > >> > > hi TaiJuWu
> > > >> > >
> > > >> > > Q0:
> > > >> > >
> > > >> > > `Format: topic.acks`: the dot is an acceptable character in topic
> > > >> > > names, so maybe we should reverse the format to "acks.${topic}" to get
> > > >> > > the acks of a topic easily.
> > > >> > >
> > > >> > > Q1: `Return Map<Acks, List<ProducerBatch>> when
> > > >> > > RecordAccumulator#drainBatchesForOneNode is called.`
> > > >> > >
> > > >> > > This is weird to me, as all we need to do is pass `Map<String, Acks>` to
> > > >> > > `Sender` and make sure `Sender#sendProduceRequest` adds the correct acks
> > > >> > > to ProduceRequest, right?
> > > >> > >
> > > >> > > Best,
> > > >> > > Chia-Ping
> > > >> > >
> > > >> > > On 2024/11/15 05:12:33 TaiJu Wu wrote:
> > > >> > > > Hi all,
> > > >> > > >
> > > >> > > > I have updated the contents of this KIP.
> > > >> > > > Please take a look and let me know what you think.
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > > TaiJuWu
> > > >> > > >
> > > >> > > > On Thu, Nov 14, 2024 at 2:21 PM TaiJu Wu <tjwu1...@gmail.com> wrote:
> > > >> > > >
> > > >> > > > > Hi all,
> > > >> > > > >
> > > >> > > > > Thanks for your feedback and @Chia-Ping's help.
> > > >> > > > >
> > > >> > > > > I also agree that a topic-level acks config is more reasonable and
> > > >> > > > > can simplify the story.
> > > >> > > > > When I tried implementing record-level acks, I noticed I don't have a
> > > >> > > > > good way to avoid iterating over batches to get partition information
> > > >> > > > > (needed by *RecordAccumulator#partitionChanged*).
> > > >> > > > >
> > > >> > > > > Back to the initial question of how to handle different acks for batches:
> > > >> > > > > First, we can attach *topic-level acks* to *RecordAccumulator#TopicInfo*.
> > > >> > > > > Second, we can return *Map<Acks, List<ProducerBatch>>* when
> > > >> > > > > *RecordAccumulator#drainBatchesForOneNode* is called. In this step, we
> > > >> > > > > can propagate acks to *sender*.
> > > >> > > > > Finally, we can get the acks info and group batches with the same acks
> > > >> > > > > into a *List<ProducerBatch>* for a node in *sender#sendProduceRequests*.
> > > >> > > > >
> > > >> > > > > If I missed something or there is any mistake, please let me know.
> > > >> > > > > I will update this KIP later. Thanks for your feedback.
> > > >> > > > >
> > > >> > > > > Best,
> > > >> > > > > TaiJuWu
> > > >> > > > >
> > > >> > > > > Chia-Ping Tsai <chia7...@apache.org> wrote on Thu, Nov 14, 2024 at 9:46 AM:
> > > >> > > > >
> > > >> > > > >> hi All
> > > >> > > > >>
> > > >> > > > >> This KIP is based on our use case where an edge application with many
> > > >> > > > >> sensors wants to use a single producer to deliver ‘few but varied’ records
> > > >> > > > >> with different acks settings. The reason for using a single producer is to
> > > >> > > > >> minimize resource usage on edge devices with limited hardware capabilities.
> > > >> > > > >> Currently, we use a producer pool to handle different acks values, which
> > > >> > > > >> requires 3x producer instances. Additionally, this approach creates many
> > > >> > > > >> idle producers if a sensor with a specific acks setting has no data for a
> > > >> > > > >> while.
> > > >> > > > >>
> > > >> > > > >> I love David’s suggestion since the acks configuration is closely related
> > > >> > > > >> to the topic.
> > > >> > > > >> Maybe we can introduce an optional configuration in the
> > > >> > > > >> producer to define topic-level acks, with the existing acks being the
> > > >> > > > >> default for all topics. This approach is not only simple but also easy to
> > > >> > > > >> understand and implement.
> > > >> > > > >>
> > > >> > > > >> Best,
> > > >> > > > >> Chia-Ping
> > > >> > > > >>
> > > >> > > > >> On 2024/11/13 16:04:24 Andrew Schofield wrote:
> > > >> > > > >> > Hi TaiJuWu,
> > > >> > > > >> > I've been thinking for a while about this KIP before jumping into the
> > > >> > > > >> > discussion.
> > > >> > > > >> >
> > > >> > > > >> > I'm afraid that I don't think the approach in the KIP is the best,
> > > >> > > > >> > given the design of the Kafka protocol in this area. Essentially, each
> > > >> > > > >> > Produce request contains the acks value at the top level, and may contain
> > > >> > > > >> > records for many topics or partitions. My point is that batching occurs
> > > >> > > > >> > at the level of a Produce request, so changing the acks value between
> > > >> > > > >> > records will require a new Produce request to be sent. There would likely
> > > >> > > > >> > be an efficiency penalty if this feature was used heavily with the acks
> > > >> > > > >> > changing record by record.
> > > >> > > > >> >
> > > >> > > > >> > I can see that potentially an application might want different ack
> > > >> > > > >> > levels for different topics, but I would be surprised if they use
> > > >> > > > >> > different ack levels within the same topic. Maybe David's suggestion of
> > > >> > > > >> > defining the acks per topic would be enough. What do you think?
> > > >> > > > >> >
> > > >> > > > >> > Thanks,
> > > >> > > > >> > Andrew
> > > >> > > > >> > ________________________________________
> > > >> > > > >> > From: David Jacot <dja...@confluent.io.INVALID>
> > > >> > > > >> > Sent: 13 November 2024 15:31
> > > >> > > > >> > To: dev@kafka.apache.org <dev@kafka.apache.org>
> > > >> > > > >> > Subject: Re: [DISCUSS]KIP-1107: Adding record-level acks for producers
> > > >> > > > >> >
> > > >> > > > >> > Hi TaiJuWu,
> > > >> > > > >> >
> > > >> > > > >> > Thanks for the KIP.
> > > >> > > > >> >
> > > >> > > > >> > The motivation is not clear to me. Could you please elaborate a bit
> > > >> > > > >> > more on it?
> > > >> > > > >> >
> > > >> > > > >> > My concern is that it adds a lot of complexity and the added value
> > > >> > > > >> > seems to be low. Moreover, it will make reasoning about an application
> > > >> > > > >> > from the server side more difficult because we can no longer assume that
> > > >> > > > >> > it writes with the ack based on the config. Another issue is about the
> > > >> > > > >> > batching, how do you plan to handle batches mixing records with different
> > > >> > > > >> > acks?
> > > >> > > > >> >
> > > >> > > > >> > An alternative approach may be to define the ack per topic. We could
> > > >> > > > >> > even think about defining it on the server side as a topic config. I
> > > >> > > > >> > haven't really thought about it but it may be something to explore a bit
> > > >> > > > >> > more.
> > > >> > > > >> >
> > > >> > > > >> > Best,
> > > >> > > > >> > David
> > > >> > > > >> >
> > > >> > > > >> > On Wed, Nov 13, 2024 at 3:56 PM Frédérik Rouleau
> > > >> > > > >> > <froul...@confluent.io.invalid> wrote:
> > > >> > > > >> >
> > > >> > > > >> > > Hi TaiJuWu,
> > > >> > > > >> > >
> > > >> > > > >> > > I find this adds lots of complexity and I am still not convinced by
> > > >> > > > >> > > the added value. IMO creating a producer instance per ack level is not
> > > >> > > > >> > > problematic and the behavior is clear for developers. What would be the
> > > >> > > > >> > > added value of the proposed change?
> > > >> > > > >> > >
> > > >> > > > >> > > Regards,
> > > >> > > > >> > >
> > > >> > > > >> > > On Wed, Nov 6, 2024 at 7:50 AM TaiJu Wu <tjwu1...@gmail.com> wrote:
> > > >> > > > >> > >
> > > >> > > > >> > > > Hi Fred and Greg,
> > > >> > > > >> > > >
> > > >> > > > >> > > > Thanks for your feedback. It is really not straightforward, but
> > > >> > > > >> > > > interesting!
> > > >> > > > >> > > > Here is the behavior I expect.
> > > >> > > > >> > > >
> > > >> > > > >> > > > The current producer uses the *RecordAccumulator* to gather records,
> > > >> > > > >> > > > and the sender thread sends them in batches. We can track each record’s
> > > >> > > > >> > > > acknowledgment setting as it is appended to the *RecordAccumulator*,
> > > >> > > > >> > > > allowing the *sender* to group batches by acknowledgment level and
> > > >> > > > >> > > > topicPartition when processing.
> > > >> > > > >> > > >
> > > >> > > > >> > > > Regarding the statement, "Callbacks for records being sent to the
> > > >> > > > >> > > > same partition are guaranteed to execute in order," this is ensured when
> > > >> > > > >> > > > *max.in.flight.requests.per.connection* is set to 1. We can send records
> > > >> > > > >> > > > with different acknowledgment levels in the order of acks=0, acks=1,
> > > >> > > > >> > > > acks=-1. Since we need to send batches with different acknowledgment
> > > >> > > > >> > > > levels to the broker in separate requests, the callback will execute
> > > >> > > > >> > > > after each request is completed.
> > > >> > > > >> > > >
> > > >> > > > >> > > > In response to, "If so, are low-acks records subject to head-of-line
> > > >> > > > >> > > > blocking from high-acks records?," I believe an additional configuration
> > > >> > > > >> > > > is necessary to control this behavior. We could allow records to be
> > > >> > > > >> > > > either sync or async, though the callback would still execute after each
> > > >> > > > >> > > > batch with varying acknowledgment levels completes. To measure behavior
> > > >> > > > >> > > > across acknowledgment levels, we could also include acks in
> > > >> > > > >> > > > *ProducerInterceptor*.
> > > >> > > > >> > > >
> > > >> > > > >> > > > Furthermore, before this KIP, a producer could only include one acks
> > > >> > > > >> > > > level, so the ordering is guaranteed.
> > > >> > > > >> > > > However, with this change, we can *ONLY* guarantee
> > > >> > > > >> > > > the sequence within records of the same acknowledgment level because we
> > > >> > > > >> > > > may send up to three separate requests to brokers.
> > > >> > > > >> > > >
> > > >> > > > >> > > > Best,
> > > >> > > > >> > > > TaiJuWu
> > > >> > > > >> > > >
> > > >> > > > >> > > > TaiJu Wu <tjwu1...@gmail.com> wrote on Wed, Nov 6, 2024 at 10:01 AM:
> > > >> > > > >> > > >
> > > >> > > > >> > > > > Hi Fred and Greg,
> > > >> > > > >> > > > >
> > > >> > > > >> > > > > Apologies for the delayed response.
> > > >> > > > >> > > > > Yes, you’re correct.
> > > >> > > > >> > > > > I’ll outline the behavior I expect.
> > > >> > > > >> > > > >
> > > >> > > > >> > > > > Thanks for your feedback!
> > > >> > > > >> > > > >
> > > >> > > > >> > > > > Best,
> > > >> > > > >> > > > > TaiJuWu
> > > >> > > > >> > > > >
> > > >> > > > >> > > > > Greg Harris <greg.har...@aiven.io.invalid> wrote on Wed, Nov 6, 2024 at 9:48 AM:
> > > >> > > > >> > > > >
> > > >> > > > >> > > > >> Hi TaiJuWu,
> > > >> > > > >> > > > >>
> > > >> > > > >> > > > >> Thanks for the KIP!
> > > >> > > > >> > > > >>
> > > >> > > > >> > > > >> Can you explain in the KIP about the behavior when the number of acks
> > > >> > > > >> > > > >> is different for individual records? I think the current description
> > > >> > > > >> > > > >> using the word "straightforward" does little to explain that, and may
> > > >> > > > >> > > > >> actually be hiding some complexity.
> > > >> > > > >> > > > >>
> > > >> > > > >> > > > >> For example, the send() javadoc contains this: "Callbacks for records
> > > >> > > > >> > > > >> being sent to the same partition are guaranteed to execute in order."
> > > >> > > > >> > > > >> Is this still true when acks vary for records within the same partition?
> > > >> > > > >> > > > >> If so, are low-acks records subject to head-of-line-blocking from
> > > >> > > > >> > > > >> high-acks records? It seems to me that this feature is useful when
> > > >> > > > >> > > > >> acks is specified per-topic, but introduces a lot of edge cases that
> > > >> > > > >> > > > >> are underspecified.
> > > >> > > > >> > > > >>
> > > >> > > > >> > > > >> Thanks,
> > > >> > > > >> > > > >> Greg
> > > >> > > > >> > > > >>
> > > >> > > > >> > > > >> On Tue, Nov 5, 2024 at 4:52 PM TaiJu Wu <tjwu1...@gmail.com> wrote:
> > > >> > > > >> > > > >>
> > > >> > > > >> > > > >> > Hi Chia-Ping,
> > > >> > > > >> > > > >> >
> > > >> > > > >> > > > >> > Thanks for your feedback.
> > > >> > > > >> > > > >> > I have updated the KIP based on your suggestions.
> > > >> > > > >> > > > >> >
> > > >> > > > >> > > > >> > Best,
> > > >> > > > >> > > > >> > Stanley
> > > >> > > > >> > > > >> >
> > > >> > > > >> > > > >> > Chia-Ping Tsai <chia7...@apache.org> wrote on Tue, Nov 5, 2024 at 4:41 PM:
> > > >> > > > >> > > > >> >
> > > >> > > > >> > > > >> > > hi TaiJuWu,
> > > >> > > > >> > > > >> > >
> > > >> > > > >> > > > >> > > Q0: Could you please add the getter (Short acks()) to the "public
> > > >> > > > >> > > > >> > > interface" section?
> > > >> > > > >> > > > >> > >
> > > >> > > > >> > > > >> > > Q1: Could you please add an RPC JSON reference to prove "been
> > > >> > > > >> > > > >> > > available at the RPC-level,"
> > > >> > > > >> > > > >> > >
> > > >> > > > >> > > > >> > > Q2: Could you please add a link to the producer docs to prove "share
> > > >> > > > >> > > > >> > > a single producer instance across multiple threads"
> > > >> > > > >> > > > >> > >
> > > >> > > > >> > > > >> > > Thanks,
> > > >> > > > >> > > > >> > > Chia-Ping
> > > >> > > > >> > > > >> > >
> > > >> > > > >> > > > >> > > On 2024/11/05 01:28:36 吳岱儒 wrote:
> > > >> > > > >> > > > >> > > > Hi all,
> > > >> > > > >> > > > >> > > >
> > > >> > > > >> > > > >> > > > I opened KIP-1107: Adding record-level acks for producers
> > > >> > > > >> > > > >> > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-1107%3A++Adding+record-level+acks+for+producers>
> > > >> > > > >> > > > >> > > > to reduce the limitation associated with reusing KafkaProducer.
> > > >> > > > >> > > > >> > > >
> > > >> > > > >> > > > >> > > > Feedback and suggestions are welcome.
> > > >> > > > >> > > > >> > > >
> > > >> > > > >> > > > >> > > > Thanks,
> > > >> > > > >> > > > >> > > > TaiJuWu
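
The "acks.${topic}" key format and the per-acks grouping discussed in this thread can be sketched roughly as below. This is only an illustration under stated assumptions: `TopicAcksSketch`, `acksFor`, `topicOf`, and `groupByAcks` are hypothetical names, not part of the Kafka client or of the KIP, and acks values are kept as plain strings without normalizing "-1" to "all".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not Kafka client code.
final class TopicAcksSketch {
    // Assumed key prefix, following the "acks.${topic}" format suggested in the thread.
    static final String PREFIX = "acks.";

    // Resolve acks for a topic: a per-topic override wins, otherwise fall back
    // to the producer-wide "acks" value ("all" is the client default since Kafka 3.0).
    static String acksFor(Map<String, String> config, String topic) {
        String override = config.get(PREFIX + topic);
        return override != null ? override : config.getOrDefault("acks", "all");
    }

    // Because the prefix comes first, the topic name is recovered unambiguously
    // even when the topic itself contains dots.
    static String topicOf(String key) {
        return key.substring(PREFIX.length());
    }

    // Group topics by their resolved acks, so each group could be sent in its
    // own request with a single request-level acks value.
    static Map<String, List<String>> groupByAcks(Map<String, String> config, List<String> topics) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String topic : topics) {
            grouped.computeIfAbsent(acksFor(config, topic), k -> new ArrayList<>()).add(topic);
        }
        return grouped;
    }
}
```

With a config of `acks=1` plus `acks.sensor.temp=0`, the topic `sensor.temp` resolves to "0" and every other topic falls back to "1". A `topic.acks` layout would instead need to guess where a dotted topic name ends.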