Hi De Gao,

Thanks for the KIP!
I'd like to re-raise the concerns that David and Justine have raised, especially the alternative of Tiered Storage and the increase in (client) metadata complexity. I don't think the KIP contains a satisfactory explanation of why this change is worth it compared with using Tiered Storage as-is, or with marginal improvements like the one Kamal was suggesting.

> The replicas serve two major functions: accepting writes and serving reads. But if we observe a replica, we can see that most (not all, I must say) of the reads and writes happen only at the end of the replica. This part is complicated and needs to be handled with care. The majority of the replica is immutable and just serves as a data store (most of the time).
> But when we manage the replica we manage it as a single piece. For example, when we want to move a replica to a new broker we need to move all the data in the replica, although in most cases we are only interested in some data at the end.
> What I am proposing is really to provide a capability to separate the data we are most interested in, and which is complicated to manage, from the data we know is stable, immutable, and very easy to manage.

These statements could all be made in support of Tiered Storage, and don't differentiate Chunks from Tiered Storage.

> If we had this in the first place we wouldn't need tiered storage, as adding more brokers / disks would easily hold more data.

I don't think this is a useful hypothetical, because Tiered Storage exists as a currently supported feature that is already merged, and there are no current plans to deprecate or remove it. You should plan for this feature to coexist with Tiered Storage, and identify the core value proposition in that situation.

If you're interested in the benefits of Tiered Storage but don't want to depend on cloud infrastructure, there are self-hosted object stores. Or if you want Kafka brokers to not depend on an external service, you may choose to implement a new RemoteStorageManager with the properties you want.
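To make that concrete, here is a rough, untested skeleton of what such a plugin involves. Everything here is illustrative: the class name and the shared-filesystem idea are placeholders of mine, only the RemoteStorageManager interface and its callbacks come from KIP-405, and the exact signatures (for example the copyLogSegmentData return type, which KIP-917 changed in 3.6) should be checked against the Kafka version you build against.

    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Map;
    import java.util.Optional;

    import org.apache.kafka.server.log.remote.storage.LogSegmentData;
    import org.apache.kafka.server.log.remote.storage.RemoteLogSegmentMetadata;
    import org.apache.kafka.server.log.remote.storage.RemoteStorageException;
    import org.apache.kafka.server.log.remote.storage.RemoteStorageManager;

    // Hypothetical plugin that tiers segments to a shared filesystem instead of a
    // cloud object store. Only the interface is real; every body below is a stub.
    public class SharedFsRemoteStorageManager implements RemoteStorageManager {

        @Override
        public void configure(Map<String, ?> configs) {
            // A real implementation would read its own settings (e.g. a base
            // directory) passed through the broker's remote.log.storage.* configs.
        }

        @Override
        public Optional<RemoteLogSegmentMetadata.CustomMetadata> copyLogSegmentData(
                RemoteLogSegmentMetadata metadata, LogSegmentData segmentData)
                throws RemoteStorageException {
            // A real implementation would copy the segment file and its indexes to
            // the backing store, keyed by metadata.remoteLogSegmentId().
            throw new RemoteStorageException("not implemented in this sketch");
        }

        @Override
        public InputStream fetchLogSegment(RemoteLogSegmentMetadata metadata, int startPosition)
                throws RemoteStorageException {
            // A real implementation would stream from startPosition to the end of
            // the tiered segment.
            throw new RemoteStorageException("not implemented in this sketch");
        }

        @Override
        public InputStream fetchLogSegment(RemoteLogSegmentMetadata metadata, int startPosition,
                                           int endPosition) throws RemoteStorageException {
            // Same, but bounded to the requested byte range.
            throw new RemoteStorageException("not implemented in this sketch");
        }

        @Override
        public InputStream fetchIndex(RemoteLogSegmentMetadata metadata, IndexType indexType)
                throws RemoteStorageException {
            // A real implementation would return the offset/time/etc. index that
            // was uploaded alongside the segment.
            throw new RemoteStorageException("not implemented in this sketch");
        }

        @Override
        public void deleteLogSegmentData(RemoteLogSegmentMetadata metadata)
                throws RemoteStorageException {
            // A real implementation would delete the segment and indexes once the
            // broker expires them by retention.
        }

        @Override
        public void close() throws IOException {
            // Release any handles to the backing store.
        }
    }

Wiring it in is then just configuration: remote.log.storage.system.enable=true plus remote.log.storage.manager.class.name / remote.log.storage.manager.class.path on the brokers, and remote.storage.enable=true on the topics that should tier.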
Thanks,
Greg

On Sat, Jan 25, 2025 at 12:55 PM De Gao <d...@live.co.uk> wrote:

> Hi All:
>
> I have updated the KIP to be more specific on the motivation based on the comments. Please review as you can. Appreciated.
> If no more reviews follow I will submit the KIP for a vote.
> Thank you!
>
> On 3 January 2025 22:36:06 GMT, De Gao <d...@live.co.uk> wrote:
> >Thanks for the review.
> >This is an interesting idea. Indeed this will significantly reduce the data that needs to be copied. But it may take the TTL time for the new replica to join the ISR. We also need to consider how to handle partitions that only purge by data size.
> >
> >On 2 January 2025 18:11:49 GMT, Kamal Chandraprakash <kamal.chandraprak...@gmail.com> wrote:
> >>Hi De Gao,
> >>
> >>Thanks for the KIP!
> >>
> >>"However the limit of messages in a single partition replica is very big. This could lead to very big partitions (~TBs). Moving those partitions is very time consuming and has a big impact on system performance."
> >>
> >>One way to do a faster rebalance is to have a latest-offset replica build strategy when expanding the replicas for a partition, and ensure that the expanded replica does not serve as a leader until the data on the older nodes expires by retention time/size.
> >>Currently, Kafka supports only the earliest-offset strategy during reassignment. And this strategy will only work for topics with the cleanup policy set to "delete".
> >>
> >>--
> >>Kamal
> >>
> >>On Thu, Jan 2, 2025 at 10:23 PM David Arthur <mum...@gmail.com> wrote:
> >>
> >>> Hey De Gao, thanks for the KIP!
> >>>
> >>> As you're probably aware, a Partition is a logical construct in Kafka. A broker hosts a partition which is composed of physical log segments. Only the active segment is being written to and the others are immutable. The concept of a Chunk sounds quite similar to our log segments.
> >>>
> >>> From what I can tell reading the KIP, the main difference is that a Chunk can have its own assignment and therefore be replicated across different brokers.
> >>>
> >>> > Horizontal scalability: the data was distributed more evenly to brokers in cluster. Also achieving a more flexible resource allocation.
> >>>
> >>> I think this is only true in cases where we have a small number of partitions with a large amount of data. I have certainly seen cases where a small number of partitions can cause trouble with balancing the cluster.
> >>>
> >>> The idea of shuffling around older data in order to spread out the load is interesting. It does seem like it would increase the complexity of the client a bit when it comes to consuming the old data. Usually the client can just read from a single replica from the beginning of the log to the end. With this proposal, the client would need to hop around between replicas as it crossed the chunk boundaries.
> >>>
> >>> > Better load balancing: The read of partition data, especially early data can be distributed to more nodes other than just leader nodes.
> >>>
> >>> As you know, this is already possible with KIP-392. I guess the idea with the chunks is that clients would be reading older data from less busy brokers (i.e., brokers which are not the leader, or perhaps not even a follower of the active chunk). I'm not sure this would always result in better load balancing. It seems a bit situational.
> >>>
> >>> > Increased fault tolerance: failure of leader node will not impact read older data.
> >>>
> >>> I don't think this proposal changes the fault tolerance. A failure of a leader results in a failover to a follower. If a client is consuming using KIP-392, a leader failure will not affect the consumption (besides updating the client's metadata).
> >>>
> >>> --
> >>>
> >>> I guess I'm missing a key point here. What problem is this trying to solve? Is it a solution for the "single partition" problem? (i.e., a topic with one partition and a lot of data)
> >>>
> >>> Thanks!
> >>> David A
> >>>
> >>> On Tue, Dec 31, 2024 at 3:24 PM De Gao <d...@live.co.uk> wrote:
> >>>
> >>> > Thanks for the comments. I have updated the proposal to compare with tiered storage and fetch from replica. Please check.
> >>> >
> >>> > Thanks.
> >>> >
> >>> > On 11 December 2024 08:51:43 GMT, David Jacot <dja...@confluent.io.INVALID> wrote:
> >>> > >Hi,
> >>> > >
> >>> > >Thanks for the KIP. The community is pretty busy with the Apache Kafka 4.0 release, so I suppose that no one really had the time to engage in reviewing the KIP yet. Sorry for this!
> >>> > >
> >>> > >I just read the motivation section. I think that it is an interesting idea. However, I wonder if this is still needed now that we have tier storage in place.
> >>> > >One of the big selling points of tier storage was that clusters don't have to replicate tiered data anymore. Could you perhaps extend the motivation of the KIP to include tier storage in the reflection?
> >>> > >
> >>> > >Best,
> >>> > >David
> >>> > >
> >>> > >On Tue, Dec 10, 2024 at 10:46 PM De Gao <d...@live.co.uk> wrote:
> >>> > >
> >>> > >> Hi All:
> >>> > >>
> >>> > >> There was no discussion in the past week. Just want to double check if I missed anything?
> >>> > >> What should be the expectations on KIP discussion?
> >>> > >>
> >>> > >> Thank you!
> >>> > >>
> >>> > >> De Gao
> >>> > >>
> >>> > >> On 1 December 2024 19:36:37 GMT, De Gao <d...@live.co.uk> wrote:
> >>> > >> >Hi All:
> >>> > >> >
> >>> > >> >I would like to start the discussion of KIP-1114: Introducing Chunk in Partition.
> >>> > >> >
> >>> > >> >https://cwiki.apache.org/confluence/display/KAFKA/KIP-1114%3A+Introducing+Chunk+in+Partition
> >>> > >> >
> >>> > >> >This KIP is complicated so I expect the discussion will take a longer time.
> >>> > >> >
> >>> > >> >Thank you in advance.
> >>> > >> >
> >>> > >> >De Gao
> >>> >
> >>>
> >>> --
> >>> David Arthur
> >>>