Hi Jiunn-Yang: Thanks for the KIP. I think it would be useful to clarify that this is a common scenario rather than an edge case, which further demonstrates the need for this optimization. For example: A partition expansion is a common operational task in Kafka: To balance resource utilization and cost, topics are typically created with a moderate default partition count. However, as traffic grows over time, it is often necessary to increase the number of partitions to accommodate the higher workload.
Regards Jian 黃竣陽 <[email protected]> 于2026年5月30日周六 22:31写道: > Hello chia, > > Thanks for the comments, I have updated the KIP! > > Best Regards, > Jiunn-Yang > > > Chia-Ping Tsai <[email protected]> 於 2026年5月30日 晚上8:29 寫道: > > > > Hi Jiunn-Yang, > > > > Would you mind removing the terms "hot" and "cold" when describing > > partitions in the KIP? I understand you are using them to describe the > > "freshness" or the users' need for the records, but applying these terms > to > > the partition itself feels a bit unnatural. > > > > After all, in this scenario, users don't really care whether a partition > is > > newly expanded or not. Their only expectation is that they won't silently > > lose any live records produced to the topic during their active > consumption. > > > > Best, Chia-Ping > > > > > > > > 黃竣陽 <[email protected]> 於 2026年5月30日週六 下午12:30寫道: > > > >> Hello Jun, > >> > >> Thanks for the feedback, I have updated the KIP motivation section. > >> > >> Best Regards, > >> Jiunn-Yang > >> > >>> Jun Rao via dev <[email protected]> 於 2026年5月30日 凌晨1:12 寫道: > >>> > >>> Hi, Jiunn-Yang, > >>> > >>> Thanks for the reply. I think we need a stronger motivation for the > KIP. > >>> > >>> The KIP says "The core insight is that not all partitions without a > >>> committed offset are the same. A newly expanded partition (hot) is > >>> fundamentally different from a partition the consumer has never seen > >>> because it predates the group (cold)." Why is the hot partition > >>> fundamentally different from the cold? > >>> > >>> The KIP says "The existing by_duration policy is also insufficient > >> because: > >>> > >>> - The calculated seek time (now() - duration) varies across nodes due > >> to > >>> clock skew. To be safe, users must set an overly large duration, > >> causing > >>> unnecessary reprocessing. > >>> - On network errors, the client recalculates the seek time on retry, > >>> shifting the target timestamp forward and risking data loss." > >>> > >>> However, both of these situations are rare. If these issues persist, > more > >>> severe problems likely exist elsewhere. Rare situations don't need a > >> common > >>> solution. If users care about those rare situations, they can implement > >>> customized logic using > ConsumerRebalanceListener.onPartitionsAssigned(). > >>> > >>> Jun > >>> > >>> > >>> On Sun, May 17, 2026 at 6:50 AM 黃竣陽 <[email protected]> wrote: > >>> > >>>> Hello chia, > >>>> > >>>> Thanks for the feedback, > >>>> > >>>>> If the creation time exists, the returned value should always be > >> greater > >>>> than or equal to zero, right? > >>>> I have explicitly mentioned this in the KIP. > >>>> > >>>>>> New Old (MetadataResponse v0–13) positive any field > >>>> absent UnsupportedVersionException > >>>> > >>>> The earliest point at which we can detect the version mismatch is > during > >>>> the > >>>> first metadata fetch after assignment, which occurs inside poll(). > >>>> Therefore, the > >>>> user would encounter an UnsupportedVersionException from poll(). I’ll > >>>> clarify this in the KIP. > >>>> > >>>> Best Regards, > >>>> Jiunn-Yang > >>>> > >>>>> Chia-Ping Tsai <[email protected]> 於 2026年5月17日 下午4:50 寫道: > >>>>> > >>>>> hi Jiunn > >>>>> > >>>>>> PartitionAgeMs (int64, default -1): The age of this partition in > >>>> milliseconds, computed server-side by the broker as > broker_current_time > >> - > >>>> partition_creation_time. Returns -1 if the broker does not support > this > >>>> feature or the partition creation time is unknown. > >>>>> > >>>>> If the creation time exists, the returned value should always be > >> greater > >>>> than or equal to zero, right? > >>>>> > >>>>>> New Old (MetadataResponse v0–13) positive any field > >>>> absent UnsupportedVersionException > >>>>> > >>>>> Will user encounter UnsupportedVersionException when calling > `poll()`? > >>>>> > >>>>> Best, > >>>>> Chia-Ping > >>>>> > >>>>> > >>>>> On 2026/05/16 04:30:49 黃竣陽 wrote: > >>>>>> Hello Jun, chia, > >>>>>> > >>>>>> I've updated KIP-1327 with a design change based on the discussion > >>>>>> feedback. > >>>>>> > >>>>>> The updated design decouples the new-partition reset behavior from > >>>>>> the base auto.offset.reset policy: > >>>>>> > >>>>>> - auto.offset.reset.max.age.ms now applies to all auto.offset.reset > >>>> values > >>>>>> (latest, earliest, by_duration, none). > >>>>>> - For new ("hot") partitions, the consumer resets to > >>>> auto.offset.reset.new.partitions > >>>>>> config setting > >>>>>> - For existing ("cold") partitions, the base auto.offset.reset > policy > >>>> continues > >>>>>> to apply unchanged. > >>>>>> - The new-partition reset behavior is represented by a separate > >>>> internal config > >>>>>> (auto.offset.reset.new.partitions, currently fixed to earliest). > This > >>>> decoupled design makes > >>>>>> it straightforward to promote the behavior to a public user-facing > >>>> configuration in a future KIP. > >>>>>> > >>>>>> Best Regards, > >>>>>> Jiunn-Yang > >>>>>> > >>>>>> > >>>>>>> Chia-Ping Tsai <[email protected]> 於 2026年5月16日 清晨7:46 寫道: > >>>>>>> > >>>>>>> hi Jun > >>>>>>> > >>>>>>> I see what you mean now. The proposal from me is listed below: > >>>>>>> > >>>>>>> 1) Add auto.offset.reset.new.partitions with a default value of > >>>> earliest. It fixes the data loss from both by_duration and latest, and > >> it > >>>> does not change the logic of auto.offset.reset=earliest. > >>>>>>> 2) Mark auto.offset.reset.new.partitions as an internal > >>>> configuration. auto.offset.reset.new.partitions=earliest already > >>>> addresses the issue, and we can discuss the use cases of other values > >> in a > >>>> separate KIP. > >>>>>>> 3) Both configs, auto.offset.reset.new.partitions and > >>>> auto.offset.reset.latest.max.age.ms, will be applied to all for > >>>> consistency. > >>>>>>> > >>>>>>> WDYT? > >>>>>>> > >>>>>>> On 2026/05/15 20:53:20 Jun Rao via dev wrote: > >>>>>>>> Hi, Chia-Ping, > >>>>>>>> > >>>>>>>> Thanks for the reply. > >>>>>>>> > >>>>>>>> 1. In the motivation section, the KIP says "When a Kafka topic is > >>>> expanded > >>>>>>>> with new partitions, consumers using the latest auto offset reset > >>>> policy > >>>>>>>> will silently miss all records produced to those partitions before > >> the > >>>>>>>> consumer discovers them.". If a user sets > >>>>>>>> auto.offset.reset=by_duration=1sec, the same record loss issue > could > >>>> also > >>>>>>>> happen, right? > >>>>>>>> > >>>>>>>> 2. I was thinking auto.offset.reset.new.partitions will take the > >> same > >>>>>>>> values as auto.offset.reset. So a user could set it by_duration if > >>>> needed. > >>>>>>>> > >>>>>>>> Jun > >>>>>>>> > >>>>>>>> On Thu, May 14, 2026 at 4:06 PM Chia-Ping Tsai < > [email protected] > >>> > >>>> wrote: > >>>>>>>> > >>>>>>>>> hi Jun > >>>>>>>>> > >>>>>>>>> Thanks for the feedback. I might be missing something important > >> from > >>>> your > >>>>>>>>> suggestion, so please bear with me as I try to clarify with a few > >>>> questions: > >>>>>>>>> > >>>>>>>>> 1. Is there a strong use case for extending this logic to other > >> reset > >>>>>>>>> policies? Unlike latest, policies like earliest or by_duration > >> don't > >>>> seem > >>>>>>>>> to suffer from the same silent data loss issue when a partition > is > >>>> expanded. > >>>>>>>>> > >>>>>>>>> 2. What values would we expect users to configure for > >>>>>>>>> auto.offset.reset.new.partitions? If they set it to earliest or > >>>> latest, > >>>>>>>>> we might run into the exact same edge cases. For example, if a > >>>> consumer is > >>>>>>>>> offline for a while and a new partition is created during that > >>>> downtime, > >>>>>>>>> the user might actually want to skip to latest when resuming, > >> rather > >>>> than > >>>>>>>>> reading from earliest just because the partition is technically > >>>> "new" to > >>>>>>>>> the group. > >>>>>>>>> > >>>>>>>>> This is exactly why we opted for introducing a max.age threshold. > >> It > >>>> gives > >>>>>>>>> users a time-bound way to define what is genuinely "hot/new" and > >>>> what is > >>>>>>>>> just an old partition they haven't seen yet. > >>>>>>>>> > >>>>>>>>> Best, > >>>>>>>>> Chia-Ping > >>>>>>>>> > >>>>>>>>> On 2026/05/14 20:48:09 Jun Rao via dev wrote: > >>>>>>>>>> Hi, Jiunn-Yang, > >>>>>>>>>> > >>>>>>>>>> Thanks for the KIP. > >>>>>>>>>> > >>>>>>>>>> I find auto.offset.reset.latest.max.age a bit weird. It only > >>>> applies when > >>>>>>>>>> auto.offset.reset is latest. However, it seems that the > motivation > >>>>>>>>> equally > >>>>>>>>>> applies when auto.offset.reset is set to other values like > >>>> by_duration. > >>>>>>>>> The > >>>>>>>>>> intention is that we want to have a separate way to control > newly > >>>> created > >>>>>>>>>> partitions vs existing partitions when the group starts. Have we > >>>>>>>>> considered > >>>>>>>>>> adding a new config like auto.offset.reset.new.partitions? If > >> this > >>>> new > >>>>>>>>>> config is not set, the offset reset policy defaults to the > policy > >>>> used > >>>>>>>>> for > >>>>>>>>>> existing partitions. The user could set it explicitly to > customize > >>>> the > >>>>>>>>>> behavior for new partitions. > >>>>>>>>>> > >>>>>>>>>> Jun > >>>>>>>>>> > >>>>>>>>>> On Thu, May 7, 2026 at 5:07 AM 黃竣陽 <[email protected]> wrote: > >>>>>>>>>> > >>>>>>>>>>> Hi all, > >>>>>>>>>>> > >>>>>>>>>>> I’d like to manually bump this thread. > >>>>>>>>>>> > >>>>>>>>>>> Best Regards, > >>>>>>>>>>> Jiunn-Yang > >>>>>>>>>>> > >>>>>>>>>>>> 黃竣陽 <[email protected]> 於 2026年5月1日 晚上10:37 寫道: > >>>>>>>>>>>> > >>>>>>>>>>>> Hello all, > >>>>>>>>>>>> > >>>>>>>>>>>> Thanks for the feedback. > >>>>>>>>>>>> > >>>>>>>>>>>> DJ01/DJ02: > >>>>>>>>>>>> > >>>>>>>>>>>> MetadataResponse bumps from v13 to v14. The PartitionMetadata > >>>> struct > >>>>>>>>>>> gains a new > >>>>>>>>>>>> field PartitionAgeMs (int64, default -1), computed server-side > >> by > >>>> the > >>>>>>>>>>> broker as > >>>>>>>>>>>> broker_current_time - partition_creation_time. > >>>>>>>>>>>> > >>>>>>>>>>>> Also add the consumer heartbeat flow. when MembershipManager > >>>> detects > >>>>>>>>> a > >>>>>>>>>>> newly assigned > >>>>>>>>>>>> partition, it explicitly invalidates the metadata for the > >> affected > >>>>>>>>> topic > >>>>>>>>>>> and forces a fresh MetadataRequest > >>>>>>>>>>>> before making the offset reset decision, even if the topic ID > is > >>>>>>>>> already > >>>>>>>>>>> in the cache. > >>>>>>>>>>>> > >>>>>>>>>>>> MB0: > >>>>>>>>>>>> > >>>>>>>>>>>> The consumer learns the broker's maximum supported > >>>> MetadataResponse > >>>>>>>>>>> version via the > >>>>>>>>>>>> ApiVersions negotiation at connection time. If the negotiated > >>>>>>>>> version is > >>>>>>>>>>> unsupported, the consumer > >>>>>>>>>>>> knows the broker does not support PartitionAgeMs at all and > can > >>>>>>>>> throw an > >>>>>>>>>>> UnsupportedVersionException > >>>>>>>>>>>> immediately, rather than silently falling back to latest and > >>>> risking > >>>>>>>>>>> data loss without any operator-visible signal. > >>>>>>>>>>>> > >>>>>>>>>>>> MB1/MB2/MB3: > >>>>>>>>>>>> > >>>>>>>>>>>> I have addressed these changes in the KIP. > >>>>>>>>>>>> > >>>>>>>>>>>> Best Regards, > >>>>>>>>>>>> Jiunn-Yang > >>>>>>>>>>>> > >>>>>>>>>>>>> Chia-Ping Tsai <[email protected]> 於 2026年4月29日 下午4:04 寫道: > >>>>>>>>>>>>> > >>>>>>>>>>>>> hi David > >>>>>>>>>>>>> > >>>>>>>>>>>>> I agree with the direction of moving the 'age' resolution > from > >>>> the > >>>>>>>>>>> Heartbeat API to the Metadata API to keep the control plane > >> clean. > >>>> The > >>>>>>>>> main > >>>>>>>>>>> trade-off, as we noted before, is introducing inter-broker > clock > >>>> skew. > >>>>>>>>> The > >>>>>>>>>>> Group Coordinator approach provided a single source of truth > for > >>>> time. > >>>>>>>>>>>>> > >>>>>>>>>>>>> However, realistically, this time skew should be negligible. > >>>> Given > >>>>>>>>> that > >>>>>>>>>>> the max.age threshold will likely be configured in minutes or > >>>> hours, a > >>>>>>>>>>> typical NTP skew (in milliseconds) between brokers won't impact > >> the > >>>>>>>>>>> fallback decision. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Best, > >>>>>>>>>>>>> Chia-Ping > >>>>>>>>>>>>> > >>>>>>>>>>>>>> David Jacot via dev <[email protected]> 於 2026年4月29日 > >> 下午3:29 > >>>> 寫道: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Hi all, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Thanks for the KIP! > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Sorry, I haven't really followed the previous conversation > >> but I > >>>>>>>>> took a > >>>>>>>>>>>>>> quick look at this one. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> DJ01: I don't clearly understand the flow with the > >>>>>>>>>>> ConsumerGroupHeartbeat > >>>>>>>>>>>>>> API after reading the KIP. There is a new boolean; the KIP > >>>> states > >>>>>>>>> that > >>>>>>>>>>>>>> partition ages are returned only when this boolean is set. > >>>>>>>>> Implicitly, > >>>>>>>>>>> this > >>>>>>>>>>>>>> means that when the consumer receives a new partition, it > will > >>>>>>>>> issue a > >>>>>>>>>>> new > >>>>>>>>>>>>>> HB request with the boolean set to receive the ages. Is my > >>>>>>>>>>> understanding > >>>>>>>>>>>>>> correct? We should perhaps clarify the flow and also explain > >>>> how it > >>>>>>>>>>> fits > >>>>>>>>>>>>>> into the existing flow (e.g. list offsets, fetch offsets, > >> etc.). > >>>>>>>>>>>>>> DJ02: It my understanding is correct, I wonder if > >>>>>>>>>>>>>> the ConsumerGroupHeartbeat API is the right place for this > >> given > >>>>>>>>> that > >>>>>>>>>>> a new > >>>>>>>>>>>>>> round trip is done anyway. Alternatively, it could simply > >>>> include > >>>>>>>>> the > >>>>>>>>>>>>>> metadata. Generally, we should be rather cautious about not > >>>>>>>>> overloading > >>>>>>>>>>>>>> the ConsumerGroupHeartbeat API with unrelated concepts. The > >> API > >>>> is > >>>>>>>>> a > >>>>>>>>>>>>>> control plane API for assigning or revoking partitions. The > >> fact > >>>>>>>>> that > >>>>>>>>>>> we > >>>>>>>>>>>>>> don't want to add it to the corresponding Streams API also > >>>> suggests > >>>>>>>>>>>>>> something is not quite right. What would we do if we want to > >>>>>>>>> support > >>>>>>>>>>>>>> Streams in the future? > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>> David > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On Wed, Apr 29, 2026 at 12:28 AM Muralidhar Basani via dev > < > >>>>>>>>>>>>>>> [email protected]> wrote: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Hi Jiunn, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thank you for this great kip. Good to know about the gap. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> mb-0 - why a new v2 version bump for RequestPartitionAges > >>>> field. > >>>>>>>>> Can a > >>>>>>>>>>>>>>> tagged field (for ex: on response, PartitionAges on > >>>>>>>>> TopicPartitions) > >>>>>>>>>>> be > >>>>>>>>>>>>>>> used here and avoid version bump? > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> mb-1 - For the new config, is there a recommended value or > a > >>>>>>>>> ConfigDef > >>>>>>>>>>>>>>> validator? Probably it should based on the > >> metadata.max.age.ms > >>>> ? > >>>>>>>>>>> Sizing > >>>>>>>>>>>>>>> instructions can be part of javadocs I guess. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> mb-2 - (minor) As there are no changes to Kafka Streams, > >> would > >>>> it > >>>>>>>>> be > >>>>>>>>>>> better > >>>>>>>>>>>>>>> to add this new config auto.offset.reset.latest.max.age to > >> the > >>>>>>>>>>>>>>> StreamsConfig block list > >>>>>>>>> (NON_CONFIGURABLE_CONSUMER_DEFAULT_CONFIGS) > >>>>>>>>>>> for a > >>>>>>>>>>>>>>> clear warning, incase users configure it? This is the most > >>>>>>>>> familiar > >>>>>>>>>>>>>>> consumer config and users might easily mistakenly configure > >>>> it. Or > >>>>>>>>>>> may be > >>>>>>>>>>>>>>> it's not worth it to add. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> mb-3 - (minor) The phrasing "the consumer falls back to > >>>> earliest" > >>>>>>>>>>> reads as > >>>>>>>>>>>>>>> if the config were being changed per-partition which isn't > >>>>>>>>> supported. > >>>>>>>>>>> May > >>>>>>>>>>>>>>> be rephrasing to something like "consumer resolves the > >> initial > >>>>>>>>>>> position to > >>>>>>>>>>>>>>> start offset for that partition" as if earliest was applied > >> to > >>>>>>>>> that > >>>>>>>>>>>>>>> partition only and auto.offset.reset config is unchanged. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>> Murali > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On Tue, Apr 28, 2026 at 2:48 PM 黃竣陽 <[email protected]> > >>>> wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Hi chia, > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> I have updated the KIP to include this change. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Best Regards, > >>>>>>>>>>>>>>>> Jiunn-Yang > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Chia-Ping Tsai <[email protected]> 於 2026年4月28日 晚上8:03 > >> 寫道: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> hi Jiunn-Yang > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> chia_0: Should we expose the partition creation time via > >> the > >>>>>>>>> Admin > >>>>>>>>>>> API? > >>>>>>>>>>>>>>>> I assume it would be valuable for users to diagnose and > >>>>>>>>> troubleshoot > >>>>>>>>>>> the > >>>>>>>>>>>>>>>> behavior of auto.offset.reset.latest.max.age > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>>> Chia-Ping > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On 2026/04/28 10:47:58 黃竣陽 wrote: > >>>>>>>>>>>>>>>>>> Hello everyone, > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> I would like to start a discussion on KIP-1327 Prevent > Hot > >>>> Data > >>>>>>>>>>> Loss > >>>>>>>>>>>>>>> on > >>>>>>>>>>>>>>>> Partition Expansion for Latest Policy > >>>>>>>>>>>>>>>>>> < > >>>>>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>> > >>>> > >> > https://urldefense.com/v3/__https://cwiki.apache.org/confluence/x/KY4mGQ__;!!Ayb5sqE7!qF4q1QzF1RRgP61D7A2xuEai1ky7fepKDKFFvpNBuePikH-ULmT87TvuuZzy5kau5E4y5zMZAmfQQiwZomM$ > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> This proposal aims to introduces > >>>>>>>>> auto.offset.reset.latest.max.age, > >>>>>>>>>>> a > >>>>>>>>>>>>>>>> consumer config that lets the > >>>>>>>>>>>>>>>>>> latest reset policy distinguish newly expanded (hot) > >>>> partitions > >>>>>>>>>>> from > >>>>>>>>>>>>>>>> long-existing (cold) ones. Partitions > >>>>>>>>>>>>>>>>>> younger than the configured threshold automatically fall > >>>> back > >>>>>>>>> to > >>>>>>>>>>>>>>>> earliest, preventing silent data loss > >>>>>>>>>>>>>>>>>> during topic expansion without forcing a full historical > >>>>>>>>> reprocess. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Best regards, > >>>>>>>>>>>>>>>>>> Jiunn-Yang > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>> > >>>>>> > >>>> > >>>> > >> > >> > >
