Re: [DISCUSS] KIP-1114 Introducing Chunk in Partition

2025-02-18 Thread De Gao
Hi Greg: I see you are evaluating this KIP from a very practical point of view. With tiered storage already merged, it is hard to accept another improvement that provides a similar solution, even if it is better by design and enables potential future growth. That being said, this is still a communi

Re: [DISCUSS] KIP-1114 Introducing Chunk in Partition

2025-02-18 Thread Greg Harris
Hi De Gao, Thanks for your explanation. It sounds like this feature is appropriate in situations where: 1. Low latency to change replicas is a high priority 2. External storage is unavailable or undesirable 3. Linear growth in metadata size is acceptable to clients, brokers, and controllers 4. Da

Re: [DISCUSS] KIP-1114 Introducing Chunk in Partition

2025-02-17 Thread De Gao
Hi Greg: Thank you very much for the review. Let's compare further with tiered storage. I agree that the chunk has certain functional overlap with tiered storage, but they are designed from different perspectives. They both want to address the problem that over time the partition data is too bi

Re: [DISCUSS] KIP-1114 Introducing Chunk in Partition

2025-02-16 Thread Greg Harris
Hi De Gao, Thanks for the KIP! I'd like to re-raise the concerns that David and Justine have made, especially the alternative of Tiered Storage and the increase in (client) metadata complexity. I don't think that the KIP contains a satisfactory explanation of why this change is worth it compared

Re: [DISCUSS] KIP-1114 Introducing Chunk in Partition

2025-01-25 Thread De Gao
Hi All: I have updated the KIP to be more specific about the motivation, based on the comments. Please review when you can. Appreciated. If no further reviews follow, I will submit the KIP for a vote. Thank you! On 3 January 2025 22:36:06 GMT, De Gao wrote: >Thanks for the review. >This is an interesting

Re: [DISCUSS] KIP-1114 Introducing Chunk in Partition

2025-01-03 Thread De Gao
Thanks for the review. This is an interesting idea. Indeed, this will significantly reduce the data that needs to be copied. But the new replica may need the full TTL period before it can join the ISR. Also, we need to consider how to handle partitions that purge only by data size. On 2 Jan

Re: [DISCUSS] KIP-1114 Introducing Chunk in Partition

2025-01-03 Thread De Gao
Thanks David for the detailed review. Appreciated. I will try to answer your most important question: what is the problem I am trying to solve here. This KIP will surely resolve some technical problems. But that's not the original purpose. I am trying to resolve a problem in Kafka's design. T

Re: [DISCUSS] KIP-1114 Introducing Chunk in Partition

2025-01-02 Thread Kamal Chandraprakash
Hi Deo, Thanks for the KIP! "However the limit of messages in a single partition replica is very big. This could lead to very big partitions (~TBs). Moving those partitions are very time consuming and have a big impact on system performance." One way to do faster rebalance is to have a latest-of

Re: [DISCUSS] KIP-1114 Introducing Chunk in Partition

2025-01-02 Thread David Arthur
Hey De Gao, thanks for the KIP! As you’re probably aware, a Partition is a logical construct in Kafka. A broker hosts a partition which is composed of physical log segments. Only the active segment is being written to and the others are immutable. The concept of a Chunk sounds quite similar to our

Re: [DISCUSS] KIP-1114 Introducing Chunk in Partition

2024-12-31 Thread De Gao
Thanks for the comments. I have updated the proposal to compare with tiered storage and fetch from replica. Please check. Thanks. On 11 December 2024 08:51:43 GMT, David Jacot wrote: >Hi, > >Thanks for the KIP. The community is pretty busy with the Apache Kafka 4.0 >release so I suppose that n

Re: [DISCUSS] KIP-1114 Introducing Chunk in Partition

2024-12-11 Thread Justine Olshan
Hey there, Thanks for sharing this KIP. I took a brief look and had a similar thought to David's, wondering how this compares to using tiered storage. I was also thinking about how fetching from any replica (KIP-392) relates to this. I think overall, Chunk could be useful, but it

Re: [DISCUSS] KIP-1114 Introducing Chunk in Partition

2024-12-11 Thread De Gao
Hi David: Thanks for the helpful info. Appreciated. I believe the chunk is still needed in spite of tiered storage, as this makes Kafka more 'complete'. Let me extend the motivation section. I wasn't aware we are busy with Kafka 4.0. If that's the case, I will see if I can contribute to that while wai

Re: [DISCUSS] KIP-1114 Introducing Chunk in Partition

2024-12-11 Thread David Jacot
Hi, Thanks for the KIP. The community is pretty busy with the Apache Kafka 4.0 release so I suppose that no one really had the time to engage in reviewing the KIP yet. Sorry for this! I just read the motivation section. I think that it is an interesting idea. However, I wonder if this is still ne

Re: [DISCUSS] KIP-1114 Introducing Chunk in Partition

2024-12-10 Thread De Gao
Hi All: There has been no discussion in the past week. I just want to double-check whether I missed anything. What should the expectations be for KIP discussion? Thank you! De Gao On 1 December 2024 19:36:37 GMT, De Gao wrote: >Hi All: > >I would like to start the discussion of KIP-1114 Introducing Chunk

[DISCUSS] KIP-1114 Introducing Chunk in Partition

2024-12-01 Thread De Gao
Hi All: I would like to start the discussion of KIP-1114 Introducing Chunk in Partition. https://cwiki.apache.org/confluence/display/KAFKA/KIP-1114%3A+Introducing+Chunk+in+Partition This KIP is complicated, so I expect the discussion will take a longer time. Thank you in advance. De Gao