Hi all,

The parent KIP-1150 was voted for and accepted. Let's now focus on the
technical details presented in this KIP-1164 and also in KIP-1163: Diskless
Core [1].

Best,
Ivan

[1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-1163%3A+Diskless+Core

On Tue, Apr 29, 2025, at 07:54, yuxia wrote:
> Thanks Giuseppe for the explanation! It makes sense to me.
>
> Best regards,
> Yuxia
>
> ----- Original Message -----
> From: "Giuseppe Lillo" <[email protected]>
> To: "dev" <[email protected]>
> Sent: Tuesday, April 29, 2025, 12:14:14 AM
> Subject: Re: [SPAM][DISCUSS] KIP-1164: Topic Based Batch Coordinator
>
> Hello Yuxia, thanks for your question and interest!
>
> When producing, the broker will call the relevant Batch Coordinator with a
> CommitBatches request.
> The Batch Coordinator will then write the metadata about these batches into
> the __diskless-metadata topic and update its internal state persisted on
> SQLite. It will then reply with the assigned offsets.
> Read-only Batch Coordinators will also replicate that metadata into their
> own internal state.
>
> When consuming, the broker will call the relevant Batch Coordinator with a
> FindBatches request.
> The Batch Coordinator will search for the requested offsets within its
> internal state and reply with the batch coordinates (object key, offset
> within the object).
>
> In your example, I suppose that A, B and C are all messages written to the
> same topic-partition.
> The problem you described is solved by the idempotent producer. In order to
> support idempotent producers in Diskless topics, information about the
> producer ID and sequence numbers must be communicated to the Batch
> Coordinator when committing a new batch. We included information about the
> producer (producer id and producer epoch) and the sequence numbers (base
> sequence, last sequence) both in the commitFile public interface and in the
> CommitBatches API. When serving a CommitBatches request that includes
> idempotent producer information, the Batch Coordinator will also check
> against its internal state to determine whether the produce request is a
> duplicate or contains out-of-order messages.
>
> Best regards,
> Giuseppe
>
> On Thu, Apr 24, 2025 at 4:24 AM yuxia <[email protected]> wrote:
>
> > Hi!
> >
> > Thanks for the great work, I'm excited to see it happen. This KIP
> > looks good to me.
> > It seems the Batch Coordinator is very important in the diskless
> > implementation. Could you explain the implementation in more detail? I
> > think it would be much better to show what the Batch Coordinator does
> > when a write, read, or other request comes in.
> >
> > I'm also wondering how it "chooses the total ordering for writes" and
> > what the "information necessary to support idempotent producers" is.
> > I'm thinking about the following case:
> > 1: the client is going to send messages A, B, C to Kafka
> > 2: the client sends A, B to broker1, and broker1 receives A, B
> > 3: broker1 goes down, and the client sends C to broker2
> > 4: since broker1 is down, the client sees the send of A, B fail and
> > retries sending A, B to broker2
> > Then, how can the Batch Coordinator choose the total order to be A, B, C?
> >
> >
> > Best regards,
> > Yuxia
> >
> > ----- Original Message -----
> > From: "Ivan Yurchenko" <[email protected]>
> > To: "dev" <[email protected]>
> > Sent: Wednesday, April 23, 2025, 5:46:46 PM
> > Subject: [SPAM][DISCUSS] KIP-1164: Topic Based Batch Coordinator
> >
> > Hi all!
> >
> > We want to start the discussion thread for KIP-1164: Topic Based Batch
> > Coordinator [1], which is a sub-KIP for KIP-1150 [2].
> >
> > Let's use the main KIP-1150 discuss thread [3] for high-level questions,
> > motivation, and general direction of the feature and this thread for
> > discussing the batch coordinator interface and the proposed topic-based
> > implementation.
> >
> > Best,
> > Ivan
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1164%3A+Topic+Based+Batch+Coordinator
> > [2]
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1150%3A+Diskless+Topics
> > [3] https://lists.apache.org/thread/ljxc495nf39myp28pmf77sm2xydwjm6d
> >
>
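
For readers following the thread, the commit and lookup flow Giuseppe describes (CommitBatches assigns offsets and checks producer sequences, FindBatches returns the object key and the offset within the object) could be sketched roughly as below. This is only an in-memory toy, not the KIP's implementation: the class names, the status strings, the rule that a new producer or epoch must start at sequence 0, and the choice to remember five recent batches per producer are all assumptions made for illustration. The real coordinator persists its state through the __diskless-metadata topic and SQLite, which the sketch leaves out.

```python
from collections import deque
from dataclasses import dataclass, field

RETAINED_BATCHES = 5  # assumption: how many recent batches to remember per producer


@dataclass
class Batch:
    object_key: str        # shared-storage object that holds the batch
    byte_offset: int       # where the batch starts inside that object
    record_count: int
    producer_id: int = -1  # -1 means the producer is not idempotent
    producer_epoch: int = -1
    base_sequence: int = -1
    last_sequence: int = -1


@dataclass
class ProducerState:
    epoch: int
    # recent (base_sequence, last_sequence, base_offset) entries, oldest first
    recent: deque = field(default_factory=lambda: deque(maxlen=RETAINED_BATCHES))


@dataclass
class PartitionState:
    next_offset: int = 0
    batches: list = field(default_factory=list)    # (base, last, object_key, byte_offset)
    producers: dict = field(default_factory=dict)  # producer_id -> ProducerState


class ToyBatchCoordinator:
    """Hypothetical stand-in for a Batch Coordinator; state lives in memory only."""

    def __init__(self):
        self.partitions = {}

    def commit_batch(self, partition, batch):
        """Return (status, base_offset); base_offset is None when rejected."""
        state = self.partitions.setdefault(partition, PartitionState())
        idempotent = batch.producer_id >= 0
        if idempotent:
            prod = state.producers.get(batch.producer_id)
            if prod is not None and batch.producer_epoch < prod.epoch:
                return ("STALE_EPOCH", None)
            new_epoch = prod is None or batch.producer_epoch > prod.epoch
            if new_epoch:
                expected = 0  # assumption: a new producer or epoch starts at 0
            else:
                for base_seq, last_seq, base_offset in prod.recent:
                    if (base_seq, last_seq) == (batch.base_sequence, batch.last_sequence):
                        # A retry of a batch that was already committed.
                        return ("DUPLICATE", base_offset)
                expected = prod.recent[-1][1] + 1
            if batch.base_sequence != expected:
                return ("OUT_OF_ORDER", None)

        # The coordinator alone assigns offsets, which fixes the total order.
        base_offset = state.next_offset
        last_offset = base_offset + batch.record_count - 1
        state.next_offset = last_offset + 1
        state.batches.append((base_offset, last_offset, batch.object_key, batch.byte_offset))

        if idempotent:
            if new_epoch:
                prod = ProducerState(batch.producer_epoch)
                state.producers[batch.producer_id] = prod
            prod.recent.append((batch.base_sequence, batch.last_sequence, base_offset))
        return ("OK", base_offset)

    def find_batch(self, partition, offset):
        """Return (object_key, byte_offset) of the batch holding offset, or None."""
        state = self.partitions.get(partition)
        if state is None:
            return None
        for base, last, object_key, byte_offset in state.batches:
            if base <= offset <= last:
                return (object_key, byte_offset)
        return None
```

Applied to Yuxia's scenario, under these assumptions: if A, B (sequences 0 to 1) were committed through broker1 before it went down, C (sequence 2) commits next, and the retry of A, B through broker2 is reported as a duplicate that points at the original offset. If A, B were never committed, C is rejected as out of order until the retried A, B lands first. Either way the partition ends up ordered A, B, C.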
