I'm very sorry. It seems the mailing list is stripping the attachments. I'll post the two links: https://drive.google.com/file/d/1El1Kl2x8JYt3CxdwD0cZ-n6flzZ5s0Gc/view https://drive.google.com/file/d/1SxfdZIDwimM9OTGYMHCrshFclCkRNqJm/view
Sorry for the noise on the list. I'll do better next time.

- Ivan

On Thu, Oct 2, 2025, at 14:11, Ivan Yurchenko wrote:
> Apologies, it seems the images didn't attach... There were only two, I'm attaching them to this message. Sorry for the inconvenience!
>
> - Ivan
>
> On Thu, Oct 2, 2025, at 14:06, Ivan Yurchenko wrote:
>> Hi dear Kafka community,
>>
>> In the initial Diskless proposal, we proposed to have a separate component, the batch/diskless coordinator, whose role would be to centrally manage the batch and WAL file metadata for diskless topics. This component drew many reasonable comments from the community about how it would support various Kafka features (transactions, queues) and about its scalability. While we believe we have good answers to all the expressed concerns, we took a step back and looked at the problem from a different perspective.
>>
>> We would like to propose an alternative Diskless design *without a centralized coordinator*. We believe this approach has potential and propose to discuss it, as it may be more appealing to the community.
>>
>> Let us explain the idea. Most of the complications with the original Diskless approach come from one necessary architecture change: globalizing the local state of the partition leader in the batch coordinator. This causes deviations from the established workflows of various features like produce idempotence and transactions, queues, retention, etc. These deviations need to be carefully considered, designed, and later implemented and tested. In the new approach we want to avoid this by making partition leaders again responsible for managing their partitions, even in diskless topics.
>>
>> In classic Kafka topics, batch data and metadata are blended together in the partition log. The crux of the Diskless idea is to decouple them and move the data to remote storage, while keeping the metadata somewhere else. Using a central batch coordinator to manage batch metadata is one way to do this, but not the only one.
>>
>> Let's now think about managing metadata for each user partition independently. Generally, partitions are independent and don't share anything apart from the fact that their data are mixed in WAL files. If we figure out how to commit and later delete WAL files safely, we will achieve the necessary autonomy that allows us to get rid of the central batch coordinator. Instead, *each diskless user partition will be managed by its leader*, as in classic Kafka topics. Also as in classic topics, the leader uses the partition log to persist batch metadata, i.e. the regular batch header plus the information about where to find the batch on remote storage. In contrast to classic topics, the batch data itself lives on remote storage.
>>
>> For clarity, let's compare the three designs:
>> • Classic topics:
>>   • Data and metadata are co-located in the partition log.
>>   • The partition log content: [Batch header (metadata)|Batch data].
>>   • The partition log is replicated to the followers.
>>   • The replicas and leader have local state built from metadata.
>> • Original Diskless:
>>   • Metadata is in the batch coordinator, data is on remote storage.
>>   • The partition state is global in the batch coordinator.
>> • New Diskless:
>>   • Metadata is in the partition log, data is on remote storage.
>>   • Partition log content: [Batch header (metadata)|Batch coordinates on remote storage] (see the sketch after this list).
>>   • The partition log is replicated to the followers.
>>   • The replicas and leader have local state built from metadata.
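>> To make the "New Diskless" row concrete, here is a minimal sketch of what one committed entry in a diskless partition log could carry. All names below are illustrative assumptions, not definitions from the KIP: the header fields are the usual batch header ones, and the payload location points into a shared WAL object on remote storage instead of the local segment file.
>>
>>   // Hypothetical coordinates of a batch inside a shared WAL object on remote storage.
>>   record BatchCoordinates(String walObjectKey, long byteOffset, int sizeInBytes) {}
>>
>>   // Hypothetical metadata-only log entry for a diskless partition: the familiar
>>   // batch header, plus where to find the batch bytes instead of the bytes themselves.
>>   record DisklessBatchMetadata(
>>       long baseOffset,
>>       int lastOffsetDelta,
>>       long producerId,
>>       short producerEpoch,
>>       int baseSequence,
>>       long maxTimestamp,
>>       BatchCoordinates payload) {}
>>
>> Because only such small metadata entries go through the partition log and its replication, followers can build the same metadata-derived local state as in classic topics while the batch bytes stay on remote storage.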
>> Let's consider the produce path. Here's a reminder of the original Diskless design:
>>
>> [image stripped by the mailing list; see the Google Drive links at the top of the thread]
>>
>> The new approach could be depicted as follows:
>>
>> [image stripped by the mailing list; see the Google Drive links at the top of the thread]
>>
>> As you can see, the main difference is that instead of a single commit request to the batch coordinator, we now send multiple parallel commit requests to the leaders of all partitions involved in the WAL file. Each of them commits its batches independently, without coordinating with other leaders or any other component. Batch data is addressed by the WAL file name, byte offset, and size, so a partition needs to know nothing about other partitions to access its own data in shared WAL files.
>>
>> The number of partitions involved in a single WAL file may be quite large, e.g. a hundred. A hundred network requests to commit one WAL file is very impractical. However, there are ways to reduce this number:
>> 1. Partition leaders are located on brokers. Requests to leaders on one broker could be grouped together into a single physical network request (resembling the normal Produce request, which may carry batches for many partitions). This caps the number of network requests at the number of brokers in the cluster.
>> 2. If we craft the cluster metadata so that producers send their requests to the right brokers (with respect to AZs), we can achieve a higher concentration of logical commit requests per physical network request, reducing the number of the latter even further, ideally to one.
>>
>> Obviously, out of multiple commit requests some may fail or time out for a variety of reasons. This is fine. Some producers will receive totally or partially failed responses to their Produce requests, similar to what they would have received when appending to a classic topic fails or times out. If a partition experiences problems, other partitions will not be affected (again, as in classic topics). Of course, the uncommitted data will remain as garbage in WAL files. But WAL files are short-lived (batches are constantly assembled into segments and offloaded to tiered storage), so this garbage will eventually be deleted.
>>
>> To delete WAL files safely, we still need to manage them centrally, as this is the only state and logic that spans multiple partitions. On the diagram, you can see another commit request called "Commit file (best effort)" going to the WAL File Manager. This manager will be responsible for the following (see the sketch after this list):
>> 1. Collecting (via requests from brokers) and persisting information about committed WAL files.
>> 2. To handle potential failures in file information delivery, periodically doing a prefix scan on the remote storage to find and register unknown files. The period of this scan will be configurable and ideally should be quite long.
>> 3. Checking with the relevant partition leaders (after a grace period) whether they still have batches in a particular file.
>> 4. Physically deleting files when they are no longer referred to by any partition.
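>> A minimal sketch of how such a manager could implement the four responsibilities above; every interface and name here is an assumption for illustration, not part of the proposal:
>>
>>   import java.time.Duration;
>>   import java.time.Instant;
>>   import java.util.Map;
>>   import java.util.Set;
>>   import java.util.concurrent.ConcurrentHashMap;
>>
>>   // Hypothetical WAL file manager: tracks committed WAL files and deletes them
>>   // once no partition leader references any batch inside them.
>>   class WalFileManager {
>>       interface RemoteStorage {
>>           Set<String> listWalObjects(String prefix);   // used by the periodic prefix scan
>>           void delete(String objectKey);
>>       }
>>       interface LeaderClient {
>>           // Asks the leaders of the partitions involved whether they still
>>           // reference any batch in the given WAL file.
>>           boolean anyLeaderStillReferences(String objectKey);
>>       }
>>
>>       private final Map<String, Instant> knownFiles = new ConcurrentHashMap<>();
>>       private final RemoteStorage storage;
>>       private final LeaderClient leaders;
>>       private final Duration gracePeriod;
>>
>>       WalFileManager(RemoteStorage storage, LeaderClient leaders, Duration gracePeriod) {
>>           this.storage = storage;
>>           this.leaders = leaders;
>>           this.gracePeriod = gracePeriod;
>>       }
>>
>>       // 1. Brokers report committed WAL files on a best-effort basis.
>>       void registerCommittedFile(String objectKey) {
>>           knownFiles.putIfAbsent(objectKey, Instant.now());
>>       }
>>
>>       // 2. The periodic prefix scan catches files whose commit report was lost.
>>       void reconcileWithStorage(String prefix) {
>>           for (String key : storage.listWalObjects(prefix)) {
>>               knownFiles.putIfAbsent(key, Instant.now());
>>           }
>>       }
>>
>>       // 3 + 4. After the grace period, delete files that no leader references anymore.
>>       void deleteUnreferencedFiles() {
>>           Instant cutoff = Instant.now().minus(gracePeriod);
>>           knownFiles.forEach((key, firstSeen) -> {
>>               if (firstSeen.isBefore(cutoff) && !leaders.anyLeaderStillReferences(key)) {
>>                   storage.delete(key);
>>                   knownFiles.remove(key);
>>               }
>>           });
>>       }
>>   }
>>
>> The grace period is what makes the "best effort" commit safe: a file discovered only via the prefix scan is given time for its batches to be committed (or abandoned) before the leaders are asked about it.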
>> This new design offers the following advantages:
>> 1. It simplifies the implementation of many Kafka features such as idempotence, transactions, queues, tiered storage, and retention. We no longer need to abstract away and reuse the partition leader code in the batch coordinator. Instead, we will literally use the same code paths in leaders, with little adaptation, and the workflows from classic topics mostly remain unchanged. For example, it seems that ReplicaManager.maybeSendPartitionsToTransactionCoordinator and KafkaApis.handleWriteTxnMarkersRequest, used for transaction support on the partition leader side, could be used for diskless topics with little adaptation. ProducerStateManager, needed for both idempotent produce and transactions, would be reused. Another example is share group support, where the share partition leader, being co-located with the partition leader, would execute the same logic for both diskless and classic topics.
>> 2. It returns to the familiar partition-based scaling model, where partitions are independent.
>> 3. It makes the operational and failure patterns closer to the familiar ones from classic topics.
>> 4. It opens a straightforward path to seamlessly switching topics between diskless and classic modes.
>>
>> The rest remains unchanged compared to the previous Diskless design (after all the previous discussions): local segment materialization by replicas, the consume path, tiered storage integration, etc.
>>
>> If the community finds this design more suitable, we will update the KIP(s) accordingly and continue working on it. Please let us know what you think.
>>
>> Best regards,
>> Ivan and Diskless team
>>
>> On Mon, Sep 29, 2025, at 15:06, Ivan Yurchenko wrote:
>> > Hi Justine,
>> >
>> > Yes, you're right. We need to track the aborted transactions in the diskless coordinator for as long as the corresponding offsets are there. With the tiered storage unification Greg mentioned earlier, this will be finite time even for infinite data retention.
>> >
>> > Best,
>> > Ivan
>> >
>> > On Wed, Sep 17, 2025, at 19:41, Justine Olshan wrote:
>> > > Hey Ivan,
>> > >
>> > > Thanks for the response. I think most of what you said made sense, but I did have some questions about this part:
>> > >
>> > > > As we understand this, the partition leader in classic topics forgets about a transaction once it's replicated (HWM overpasses it). The transaction coordinator acts like the main guardian, allowing partition leaders to do this safely. Please correct me if this is wrong. We think about relying on this with the batch coordinator and delete the information about a transaction once it's finished (as there's no replication and HWM advances immediately).
>> > >
>> > > I didn't quite understand this. In classic topics, we have maps for ongoing transactions which remove state when the transaction is completed and an aborted transactions index which is retained for much longer. Once the transaction is completed, the coordinator is no longer involved in maintaining this partition side state, and it is subject to compaction etc.
>> > > Looking back at the outline provided above, I didn't see much about the fetch path, so maybe that could be expanded a bit further. I saw the following in a response:
>> > > > When the broker constructs a fully valid local segment, all the necessary control batches will be inserted and indices, including the transaction index will be built to serve FetchRequests exactly as they are today.
>> > >
>> > > Based on this, it seems like we need to retain the information about aborted txns for longer.
>> > > >> > > Thanks, >> > > Justine >> > > >> > > On Mon, Sep 15, 2025 at 9:43 AM Ivan Yurchenko <[email protected]> wrote: >> > > >> > > > Hi Justine and all, >> > > > >> > > > Thank you for your questions! >> > > > >> > > > > JO 1. >Since a transaction could be uniquely identified with >> > > > > producer ID >> > > > > and epoch, the positive result of this check could be cached locally >> > > > > Are we saying that only new transaction version 2 transactions can be >> > > > used >> > > > > here? If not, we can't uniquely identify transactions with producer >> > > > > id + >> > > > > epoch >> > > > >> > > > You’re right that we (probably unintentionally) focused only on >> > > > version 2. >> > > > We can either limit the support to version 2 or consider using some >> > > > surrogates to support version 1. >> > > > >> > > > > JO 2. >The batch coordinator does the final transactional checks of >> > > > > the >> > > > > batches. This procedure would output the same errors like the >> > > > > partition >> > > > > leader in classic topics would do. >> > > > > Can you expand on what these checks are? Would you be checking if the >> > > > > transaction was still ongoing for example?* * >> > > > >> > > > Yes, the producer epoch, that the transaction is ongoing, and of course >> > > > the normal idempotence checks. What the partition leader in the classic >> > > > topics does before appending a batch to the local log (e.g. in >> > > > UnifiedLog.maybeStartTransactionVerification and >> > > > UnifiedLog.analyzeAndValidateProducerState). In Diskless, we >> > > > unfortunately >> > > > cannot do these checks before appending the data to the WAL segment and >> > > > uploading it, but we can “tombstone” these batches in the batch >> > > > coordinator >> > > > during the final commit. >> > > > >> > > > > Is there state about ongoing >> > > > > transactions in the batch coordinator? I see some other state >> > > > > mentioned >> > > > in >> > > > > the End transaction section, but it's not super clear what state is >> > > > stored >> > > > > and when it is stored. >> > > > >> > > > Right, this should have been more explicit. As the partition leader >> > > > tracks >> > > > ongoing transactions for classic topics, the batch coordinator has to >> > > > as >> > > > well. So when a transaction starts and ends, the transaction >> > > > coordinator >> > > > must inform the batch coordinator about this. >> > > > >> > > > > JO 3. I didn't see anything about maintaining LSO -- perhaps that >> > > > > would >> > > > be >> > > > > stored in the batch coordinator? >> > > > >> > > > Yes. This could be deduced from the committed batches and other >> > > > information, but for the sake of performance we’d better store it >> > > > explicitly. >> > > > >> > > > > JO 4. Are there any thoughts about how long transactional state is >> > > > > maintained in the batch coordinator and how it will be cleaned up? >> > > > >> > > > As we understand this, the partition leader in classic topics forgets >> > > > about a transaction once it’s replicated (HWM overpasses it). The >> > > > transaction coordinator acts like the main guardian, allowing partition >> > > > leaders to do this safely. Please correct me if this is wrong. We think >> > > > about relying on this with the batch coordinator and delete the >> > > > information >> > > > about a transaction once it’s finished (as there’s no replication and >> > > > HWM >> > > > advances immediately). 
>> > > > >> > > > Best, >> > > > Ivan >> > > > >> > > > On Tue, Sep 9, 2025, at 00:38, Justine Olshan wrote: >> > > > > Hey folks, >> > > > > >> > > > > Excited to see some updates related to transactions! >> > > > > >> > > > > I had a few questions. >> > > > > >> > > > > JO 1. >Since a transaction could be uniquely identified with >> > > > > producer ID >> > > > > and epoch, the positive result of this check could be cached locally >> > > > > Are we saying that only new transaction version 2 transactions can be >> > > > used >> > > > > here? If not, we can't uniquely identify transactions with producer >> > > > > id + >> > > > > epoch >> > > > > >> > > > > JO 2. >The batch coordinator does the final transactional checks of >> > > > > the >> > > > > batches. This procedure would output the same errors like the >> > > > > partition >> > > > > leader in classic topics would do. >> > > > > Can you expand on what these checks are? Would you be checking if the >> > > > > transaction was still ongoing for example? Is there state about >> > > > > ongoing >> > > > > transactions in the batch coordinator? I see some other state >> > > > > mentioned >> > > > in >> > > > > the End transaction section, but it's not super clear what state is >> > > > stored >> > > > > and when it is stored. >> > > > > >> > > > > JO 3. I didn't see anything about maintaining LSO -- perhaps that >> > > > > would >> > > > be >> > > > > stored in the batch coordinator? >> > > > > >> > > > > JO 4. Are there any thoughts about how long transactional state is >> > > > > maintained in the batch coordinator and how it will be cleaned up? >> > > > > >> > > > > On Mon, Sep 8, 2025 at 10:38 AM Jun Rao <[email protected]> >> > > > wrote: >> > > > > >> > > > > > Hi, Greg and Ivan, >> > > > > > >> > > > > > Thanks for the update. A few comments. >> > > > > > >> > > > > > JR 10. "Consumer fetches are now served from local segments, making >> > > > use of >> > > > > > the >> > > > > > indexes, page cache, request purgatory, and zero-copy functionality >> > > > already >> > > > > > built into classic topics." >> > > > > > JR 10.1 Does the broker build the producer state for each >> > > > > > partition in >> > > > > > diskless topics? >> > > > > > JR 10.2 For transactional data, the consumer fetches need to know >> > > > aborted >> > > > > > records. How is that achieved? >> > > > > > >> > > > > > JR 11. "The batch coordinator saves that the transaction is >> > > > > > finished >> > > > and >> > > > > > also inserts the control batches in the corresponding logs of the >> > > > involved >> > > > > > Diskless topics. This happens only on the metadata level, no actual >> > > > control >> > > > > > batches are written to any file. " >> > > > > > A fetch response could include multiple transactional batches. How >> > > > does the >> > > > > > broker obtain the information about the ending control batch for >> > > > > > each >> > > > > > batch? Does that mean that a fetch response needs to be built by >> > > > > > stitching record batches and generated control batches together? >> > > > > > >> > > > > > JR 12. Queues: Is there still a share partition leader that all >> > > > consumers >> > > > > > are routed to? >> > > > > > >> > > > > > JR 13. "Should the KIPs be modified to include this or it's too >> > > > > > implementation-focused?" It would be useful to include enough >> > > > > > details >> > > > to >> > > > > > understand correctness and performance impact. >> > > > > > >> > > > > > HC5. Henry has a valid point. 
Requests from a given producer >> > > > > > contain a >> > > > > > sequence number, which is ordered. If a producer sends every >> > > > > > Produce >> > > > > > request to an arbitrary broker, those requests could reach the >> > > > > > batch >> > > > > > coordinator in different order and lead to rejection of the produce >> > > > > > requests. >> > > > > > >> > > > > > Jun >> > > > > > >> > > > > > On Thu, Sep 4, 2025 at 12:00 AM Ivan Yurchenko <[email protected]> >> > > > > > wrote: >> > > > > > >> > > > > > > Hi all, >> > > > > > > >> > > > > > > We have also thought in a bit more details about transactions and >> > > > queues, >> > > > > > > here's the plan. >> > > > > > > >> > > > > > > *Transactions* >> > > > > > > >> > > > > > > The support for transactions in *classic topics* is based on >> > > > > > > precise >> > > > > > > interactions between three actors: clients (mostly producers, but >> > > > also >> > > > > > > consumers), brokers (ReplicaManager and other classes), and >> > > > transaction >> > > > > > > coordinators. Brokers also run partition leaders with their local >> > > > state >> > > > > > > (ProducerStateManager and others). >> > > > > > > >> > > > > > > The high level (some details skipped) workflow is the following. >> > > > When a >> > > > > > > transactional Produce request is received by the broker: >> > > > > > > 1. For each partition, the partition leader checks if a non-empty >> > > > > > > transaction is running for this partition. This is done using its >> > > > local >> > > > > > > state derived from the log metadata (ProducerStateManager, >> > > > > > > VerificationStateEntry, VerificationGuard). >> > > > > > > 2. The transaction coordinator is informed about all the >> > > > > > > partitions >> > > > that >> > > > > > > aren’t part of the transaction to include them. >> > > > > > > 3. The partition leaders do additional transactional checks. >> > > > > > > 4. The partition leaders append the transactional data to their >> > > > > > > logs >> > > > and >> > > > > > > update some of their state (for example, log the fact that the >> > > > > > transaction >> > > > > > > is running for the partition and its first offset). >> > > > > > > >> > > > > > > When the transaction is committed or aborted: >> > > > > > > 1. The producer contacts the transaction coordinator directly >> > > > > > > with >> > > > > > > EndTxnRequest. >> > > > > > > 2. The transaction coordinator writes PREPARE_COMMIT or >> > > > PREPARE_ABORT to >> > > > > > > its log and responds to the producer. >> > > > > > > 3. The transaction coordinator sends WriteTxnMarkersRequest to >> > > > > > > the >> > > > > > leaders >> > > > > > > of the involved partitions. >> > > > > > > 4. The partition leaders write the transaction markers to their >> > > > > > > logs >> > > > and >> > > > > > > respond to the coordinator. >> > > > > > > 5. The coordinator writes the final transaction state >> > > > COMPLETE_COMMIT or >> > > > > > > COMPLETE_ABORT. >> > > > > > > >> > > > > > > In classic topics, partitions have leaders and lots of important >> > > > state >> > > > > > > necessary for supporting this workflow is local. The main >> > > > > > > challenge >> > > > in >> > > > > > > mapping this to Diskless comes from the fact there are no >> > > > > > > partition >> > > > > > > leaders, so the corresponding pieces of state need to be >> > > > > > > globalized >> > > > in >> > > > > > the >> > > > > > > batch coordinator. We are already doing this to support >> > > > > > > idempotent >> > > > > > produce. 
>> > > > > > > >> > > > > > > The high level workflow for *diskless topics* would look very >> > > > similar: >> > > > > > > 1. For each partition, the broker checks if a non-empty >> > > > > > > transaction >> > > > is >> > > > > > > running for this partition. In contrast to classic topics, this >> > > > > > > is >> > > > > > checked >> > > > > > > against the batch coordinator with a single RPC. Since a >> > > > > > > transaction >> > > > > > could >> > > > > > > be uniquely identified with producer ID and epoch, the positive >> > > > result of >> > > > > > > this check could be cached locally (for the double configured >> > > > duration >> > > > > > of a >> > > > > > > transaction, for example). >> > > > > > > 2. The same: The transaction coordinator is informed about all >> > > > > > > the >> > > > > > > partitions that aren’t part of the transaction to include them. >> > > > > > > 3. No transactional checks are done on the broker side. >> > > > > > > 4. The broker appends the transactional data to the current >> > > > > > > shared >> > > > WAL >> > > > > > > segment. It doesn’t update any transaction-related state for >> > > > > > > Diskless >> > > > > > > topics, because it doesn’t have any. >> > > > > > > 5. The WAL segment is committed to the batch coordinator like in >> > > > > > > the >> > > > > > > normal produce flow. >> > > > > > > 6. The batch coordinator does the final transactional checks of >> > > > > > > the >> > > > > > > batches. This procedure would output the same errors like the >> > > > partition >> > > > > > > leader in classic topics would do. I.e. some batches could be >> > > > rejected. >> > > > > > > This means, there will potentially be garbage in the WAL segment >> > > > file in >> > > > > > > case of transactional errors. This is preferable to doing more >> > > > network >> > > > > > > round trips, especially considering the WAL segments will be >> > > > relatively >> > > > > > > short-living (see the Greg's update above). >> > > > > > > >> > > > > > > When the transaction is committed or aborted: >> > > > > > > 1. The producer contacts the transaction coordinator directly >> > > > > > > with >> > > > > > > EndTxnRequest. >> > > > > > > 2. The transaction coordinator writes PREPARE_COMMIT or >> > > > PREPARE_ABORT to >> > > > > > > its log and responds to the producer. >> > > > > > > 3. *[NEW]* The transaction coordinator informs the batch >> > > > > > > coordinator >> > > > that >> > > > > > > the transaction is finished. >> > > > > > > 4. *[NEW]* The batch coordinator saves that the transaction is >> > > > finished >> > > > > > > and also inserts the control batches in the corresponding logs >> > > > > > > of the >> > > > > > > involved Diskless topics. This happens only on the metadata >> > > > > > > level, no >> > > > > > > actual control batches are written to any file. They will be >> > > > dynamically >> > > > > > > created on Fetch and other read operations. We could technically >> > > > write >> > > > > > > these control batches for real, but this would mean extra produce >> > > > > > latency, >> > > > > > > so it's better just to mark them in the batch coordinator and >> > > > > > > save >> > > > these >> > > > > > > milliseconds. >> > > > > > > 5. The transaction coordinator sends WriteTxnMarkersRequest to >> > > > > > > the >> > > > > > leaders >> > > > > > > of the involved partitions. – Now only to classic topics now. >> > > > > > > 6. 
The partition leaders of classic topics write the transaction >> > > > markers >> > > > > > > to their logs and respond to the coordinator. >> > > > > > > 7. The coordinator writes the final transaction state >> > > > COMPLETE_COMMIT or >> > > > > > > COMPLETE_ABORT. >> > > > > > > >> > > > > > > Compared to the non-transactional produce flow, we get: >> > > > > > > 1. An extra network round trip between brokers and the batch >> > > > coordinator >> > > > > > > when a new partition appear in the transaction. To mitigate the >> > > > impact of >> > > > > > > them: >> > > > > > > - The results will be cached. >> > > > > > > - The calls for multiple partitions in one Produce request >> > > > > > > will be >> > > > > > > grouped. >> > > > > > > - The batch coordinator should be optimized for fast response >> > > > > > > to >> > > > these >> > > > > > > RPCs. >> > > > > > > - The fact that a single producer normally will communicate >> > > > > > > with a >> > > > > > > single broker for the duration of the transaction further >> > > > > > > reduces the >> > > > > > > expected number of round trips. >> > > > > > > 2. An extra round trip between the transaction coordinator and >> > > > > > > batch >> > > > > > > coordinator when a transaction is finished. >> > > > > > > >> > > > > > > With this proposal, transactions will also be able to span both >> > > > classic >> > > > > > > and Diskless topics. >> > > > > > > >> > > > > > > *Queues* >> > > > > > > >> > > > > > > The share group coordination and management is a side job that >> > > > doesn't >> > > > > > > interfere with the topic itself (leadership, replicas, physical >> > > > storage >> > > > > > of >> > > > > > > records, etc.) and non-queue producers and consumers (Fetch and >> > > > Produce >> > > > > > > RPCs, consumer group-related RPCs are not affected.) We don't >> > > > > > > see any >> > > > > > > reason why we can't make Diskless topics compatible with share >> > > > groups the >> > > > > > > same way as classic topics are. Even on the code level, we don't >> > > > expect >> > > > > > any >> > > > > > > serious refactoring: the same reading routines are used that are >> > > > used for >> > > > > > > fetching (e.g. ReplicaManager.readFromLog). >> > > > > > > >> > > > > > > >> > > > > > > Should the KIPs be modified to include this or it's too >> > > > > > > implementation-focused? >> > > > > > > >> > > > > > > Best regards, >> > > > > > > Ivan >> > > > > > > >> > > > > > > On Wed, Sep 3, 2025, at 21:59, Greg Harris wrote: >> > > > > > > > Hi all, >> > > > > > > > >> > > > > > > > Thank you all for your questions and design input on KIP-1150. >> > > > > > > > >> > > > > > > > We have just updated KIP-1150 and KIP-1163 with a new design. >> > > > > > > > To >> > > > > > > summarize >> > > > > > > > the changes: >> > > > > > > > >> > > > > > > > 1. The design prioritizes integrating with the existing KIP-405 >> > > > Tiered >> > > > > > > > Storage interfaces, permitting data produced to a Diskless >> > > > > > > > topic >> > > > to be >> > > > > > > > moved to tiered storage. >> > > > > > > > This lowers the scalability requirements for the Batch >> > > > > > > > Coordinator >> > > > > > > > component, and allows Diskless to compose with Tiered Storage >> > > > plugin >> > > > > > > > features such as encryption and alternative data formats. >> > > > > > > > >> > > > > > > > 2. 
Consumer fetches are now served from local segments, making >> > > > > > > > use >> > > > of >> > > > > > the >> > > > > > > > indexes, page cache, request purgatory, and zero-copy >> > > > > > > > functionality >> > > > > > > already >> > > > > > > > built into classic topics. >> > > > > > > > However, local segments are now considered cache elements, do >> > > > > > > > not >> > > > need >> > > > > > to >> > > > > > > > be durably stored, and can be built without contacting any >> > > > > > > > other >> > > > > > > replicas. >> > > > > > > > >> > > > > > > > 3. The design has been simplified substantially, by removing >> > > > > > > > the >> > > > > > previous >> > > > > > > > Diskless consume flow, distributed cache component, and "object >> > > > > > > > compaction/merging" step. >> > > > > > > > >> > > > > > > > The design maintains leaderless produces as enabled by the >> > > > > > > > Batch >> > > > > > > > Coordinator, and the same latency profiles as the earlier >> > > > > > > > design, >> > > > while >> > > > > > > > being simpler and integrating better into the existing >> > > > > > > > ecosystem. >> > > > > > > > >> > > > > > > > Thanks, and we are eager to hear your feedback on the new >> > > > > > > > design. >> > > > > > > > Greg Harris >> > > > > > > > >> > > > > > > > On Mon, Jul 21, 2025 at 3:30 PM Jun Rao >> > > > > > > > <[email protected]> >> > > > > > > wrote: >> > > > > > > > >> > > > > > > > > Hi, Jan, >> > > > > > > > > >> > > > > > > > > For me, the main gap of KIP-1150 is the support of all >> > > > > > > > > existing >> > > > > > client >> > > > > > > > > APIs. Currently, there is no design for supporting APIs like >> > > > > > > transactions >> > > > > > > > > and queues. >> > > > > > > > > >> > > > > > > > > Thanks, >> > > > > > > > > >> > > > > > > > > Jun >> > > > > > > > > >> > > > > > > > > On Mon, Jul 21, 2025 at 3:53 AM Jan Siekierski >> > > > > > > > > <[email protected]> wrote: >> > > > > > > > > >> > > > > > > > > > Would it be a good time to ask for the current status of >> > > > > > > > > > this >> > > > KIP? >> > > > > > I >> > > > > > > > > > haven't seen much activity here for the past 2 months, the >> > > > vote got >> > > > > > > > > vetoed >> > > > > > > > > > but I think the pending questions have been answered since >> > > > then. >> > > > > > > KIP-1183 >> > > > > > > > > > (AutoMQ's proposal) also didn't have any activity since >> > > > > > > > > > May. >> > > > > > > > > > >> > > > > > > > > > In my eyes KIP-1150 and KIP-1183 are two real choices that >> > > > > > > > > > can >> > > > be >> > > > > > > > > > made, with a coordinator-based approach being by far the >> > > > dominant >> > > > > > one >> > > > > > > > > when >> > > > > > > > > > it comes to market adoption - but all these are standalone >> > > > > > products. >> > > > > > > > > > >> > > > > > > > > > I'm a big fan of both approaches, but would hate to see a >> > > > stall. So >> > > > > > > the >> > > > > > > > > > question is: can we get an update? >> > > > > > > > > > >> > > > > > > > > > Maybe it's time to start another vote? Colin McCabe - have >> > > > > > > > > > your >> > > > > > > questions >> > > > > > > > > > been answered? If not, is there anything I can do to help? >> > > > > > > > > > I'm >> > > > > > deeply >> > > > > > > > > > familiar with both architectures and have written about >> > > > > > > > > > both? 
>> > > > > > > > > > >> > > > > > > > > > Kind regards, >> > > > > > > > > > Jan >> > > > > > > > > > >> > > > > > > > > > On Tue, Jun 24, 2025 at 10:42 AM Stanislav Kozlovski < >> > > > > > > > > > [email protected]> wrote: >> > > > > > > > > > >> > > > > > > > > > > I have some nits - it may be useful to >> > > > > > > > > > > >> > > > > > > > > > > a) group all the KIP email threads in the main one (just >> > > > > > > > > > > a >> > > > bunch >> > > > > > of >> > > > > > > > > links >> > > > > > > > > > > to everything) >> > > > > > > > > > > b) create the email threads >> > > > > > > > > > > >> > > > > > > > > > > It's a bit hard to track it all - for example, I was >> > > > searching >> > > > > > for >> > > > > > > a >> > > > > > > > > > > discuss thread for KIP-1165 for a while; As far as I can >> > > > tell, it >> > > > > > > > > doesn't >> > > > > > > > > > > exist yet. >> > > > > > > > > > > >> > > > > > > > > > > Since the KIPs are published (by virtue of having the >> > > > > > > > > > > root >> > > > KIP be >> > > > > > > > > > > published, having a DISCUSS thread and links to sub-KIPs >> > > > where >> > > > > > were >> > > > > > > > > aimed >> > > > > > > > > > > to move the discussion towards), I think it would be >> > > > > > > > > > > good to >> > > > > > create >> > > > > > > > > > DISCUSS >> > > > > > > > > > > threads for them all. >> > > > > > > > > > > >> > > > > > > > > > > Best, >> > > > > > > > > > > Stan >> > > > > > > > > > > >> > > > > > > > > > > On 2025/04/16 11:58:22 Josep Prat wrote: >> > > > > > > > > > > > Hi Kafka Devs! >> > > > > > > > > > > > >> > > > > > > > > > > > We want to start a new KIP discussion about >> > > > > > > > > > > > introducing a >> > > > new >> > > > > > > type of >> > > > > > > > > > > > topics that would make use of Object Storage as the >> > > > > > > > > > > > primary >> > > > > > > source of >> > > > > > > > > > > > storage. However, as this KIP is big we decided to >> > > > > > > > > > > > split it >> > > > > > into >> > > > > > > > > > multiple >> > > > > > > > > > > > related KIPs. >> > > > > > > > > > > > We have the motivational KIP-1150 ( >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > >> > > > > > >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1150%3A+Diskless+Topics >> > > > > > > > > > > ) >> > > > > > > > > > > > that aims to discuss if Apache Kafka should aim to have >> > > > this >> > > > > > > type of >> > > > > > > > > > > > feature at all. This KIP doesn't go onto details on >> > > > > > > > > > > > how to >> > > > > > > implement >> > > > > > > > > > it. >> > > > > > > > > > > > This follows the same approach used when we discussed >> > > > KRaft. >> > > > > > > > > > > > >> > > > > > > > > > > > But as we know that it is sometimes really hard to >> > > > > > > > > > > > discuss >> > > > on >> > > > > > > that >> > > > > > > > > meta >> > > > > > > > > > > > level, we also created several sub-kips (linked in >> > > > KIP-1150) >> > > > > > that >> > > > > > > > > offer >> > > > > > > > > > > an >> > > > > > > > > > > > implementation of this feature. >> > > > > > > > > > > > >> > > > > > > > > > > > We kindly ask you to use the proper DISCUSS threads for >> > > > each >> > > > > > > type of >> > > > > > > > > > > > concern and keep this one to discuss whether Apache >> > > > > > > > > > > > Kafka >> > > > wants >> > > > > > > to >> > > > > > > > > have >> > > > > > > > > > > > this feature or not. 
>> > > > > > > > > > > > >> > > > > > > > > > > > Thanks in advance on behalf of all the authors of this >> > > > > > > > > > > > KIP. >> > > > > > > > > > > > >> > > > > > > > > > > > ------------------ >> > > > > > > > > > > > Josep Prat >> > > > > > > > > > > > Open Source Engineering Director, Aiven >> > > > > > > > > > > > [email protected] | +491715557497 | aiven.io >> > > > > > > > > > > > Aiven Deutschland GmbH >> > > > > > > > > > > > Alexanderufer 3-7, 10117 Berlin >> > > > > > > > > > > > Geschäftsführer: Oskari Saarenmaa, Hannu Valtonen, >> > > > > > > > > > > > Anna Richardson, Kenneth Chen >> > > > > > > > > > > > Amtsgericht Charlottenburg, HRB 209739 B >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> >