Hi Jun,

The most important part of this story is what data users should expect to see when using the latest or by_duration policy with expanded partitions.
Yes, the by_duration policy can minimize data loss, but it is non-deterministic: users will either read too many historical records from existing partitions or lose some records from expanded partitions.

Also, I agree that auto.offset.reset.max.age.ms is a bit hard to understand, which is why I preferred a whole new policy based entirely on group creation time (KIP-1282).

Best,
Chia-Ping

Jun Rao via dev <[email protected]> wrote on Tue, Jun 16, 2026 at 1:08 AM:

> Hi, Chia-Ping and Jiunn-Yang,
>
> Thanks for the reply. I am still trying to understand the value of the new
> configs with the KIP.
>
> The motivation of the KIP is that a user doesn't want to miss the data if
> the backlog is small. The backlog of the existing partition is easy to
> understand because it relates to retention time. The backlog for the new
> partition is a bit subtle to understand since it depends on the metadata
> refresh delay. To set auto.offset.reset.max.age.ms, the user needs to
> understand the metadata refresh delay on the consumer side and use it to
> set the config.
>
> Now, let's consider the alternative: setting the same value for the
> existing by_duration policy. The KIP lists three issues with this approach.
> 1. It computes the seek target client-side as now() - duration, which
> introduces clock skew across consumers and forces operators to choose
> overly large durations, causing unnecessary reprocessing.
> 2. The target timestamp is recomputed on each retry, so failed
> ListOffsetsRequest retries can shift the target forward and potentially
> miss records produced between attempts.
> 3. It applies uniformly to all partitions without committed offsets, and
> cannot distinguish newly expanded partitions from long-existing partitions
> newly assigned to the group, leading to unnecessary replay.
>
> Issues 1 and 2 are uncommon and can be mitigated by adding a bit of buffer
> to the metadata refresh delay. We could also consider improving the
> implementation.
> For issue 3, the metadata refresh delay is typically low (on the order of
> minutes with the classic consumer and tens of seconds with the new
> consumer). If a user is ok with reading that much backlog for new
> partitions, it seems they will be ok doing the same for existing
> partitions.
>
> So, instead of introducing a new config, could we just reuse the existing
> config with better documentation and/or implementation?
>
> Jun
>
>
> On Sat, Jun 13, 2026 at 12:19 AM 黃竣陽 <[email protected]> wrote:
>
> > Hello Jun,
> >
> > You're right that group creation time is the more intuitive answer at
> > first glance; the KIP's own motivation talks about partitions that
> > "predate the group" vs partitions "created during group runtime," which
> > directly points to a group-lifecycle classifier. I'd like to walk through
> > why we landed on partition age, and the trade-offs we considered.
> >
> > We evaluated three candidate signals:
> >
> > 1. `by_duration:5secs`
> >
> > This covers the metadata blindness window, but has issues the KIP
> > currently documents under "Why not use `by_duration`?":
> >
> > - Client-side `now() - duration` introduces clock skew across consumers.
> > - `ListOffsets` retries shift the target forward, potentially missing
> >   records produced between attempts.
> > - It applies uniformly to all partitions without committed offsets,
> >   including pre-existing partitions newly assigned to the group, causing
> >   unnecessary replay.
> >
> > 2. Group creation time as classifier
> >
> > This works cleanly when the consumer is actively running. Our concern
> > is the idle / late-rejoin case:
> >
> >   T=0: Group created.
> >   T=1..T=100: Consumer idle (down, disconnected, etc.).
> >   T=50: Partition added during the idle window.
> >   T=100: Consumer resumes.
> >
> > Under group creation time, the new partition is classified as new
> > (`50 > 0`) and reset to `earliest`, replaying everything from T=50.
> > But during `[T=1, T=100]`, base partitions also accumulated data that
> > the consumer accepts as lost; that is precisely the contract of
> > `auto.offset.reset=latest`. There is no principled reason to treat
> > the new partition differently; both contain backlog accumulated during
> > the same idle window.
> >
> > This aligns with the "backlog is backlog" principle you raised in
> > the KIP-1282 thread: a `latest` user has tolerated some backlog on
> > every other partition during the same idle period; forcing 0-backlog
> > tolerance only on new partitions would be inconsistent with that
> > tolerance.
> >
> > 3. Partition age vs threshold
> >
> > Partition age corresponds to the actual silent data loss window,
> > the gap between partition creation and the consumer's metadata
> > refresh. Within this window, data loss is genuinely silent: the
> > consumer had no opportunity to know about the partition. Outside this
> > window, missing data reflects either:
> >
> > - (a) the user's tolerated cost of running with idle consumers, or
> > - (b) an operational issue to surface via monitoring, not via reset
> >   policy.
> >
> > We did not choose partition age because it is more elegant than group
> > creation time; we chose it because its failure mode (requires a
> > threshold) is less invasive than the failure mode of group creation time
> > (overrides user-stated `latest` intent during idle periods).
> >
> > Best Regards,
> > Jiunn-Yang
> >
> > > Chia-Ping Tsai <[email protected]> wrote on Jun 13, 2026 at 11:52 AM:
> > >
> > > Hi Jun,
> > >
> > > Relying on both creation times will create an inconsistent scenario. A
> > > consumer that lost all offsets due to a long sleep will seek to the
> > > beginning for the partitions created later than the group.
> > >
> > > That is why we initially proposed KIP-1282 to fix the inconsistency
> > > using a whole new policy.
Since KIP-1282 couldn't reach a consensus, KIP-1327 > > goes > > > back to using flexible configurations to prevent users from falling > into > > > that pitfall. > > > > > > Best, Chia-Ping > > > > > > Jun Rao via dev <[email protected]> 於 2026年6月13日週六 上午6:49寫道: > > > > > >> Hi, Jiunn-Yang, > > >> > > >> Thanks for the reply and sorry for the late reply. > > >> > > >> JR1. The design of auto.offset.reset.max.age.ms still feels weird to > > me. > > >> It > > >> categorizes partitions as new or existing based on the partition > > creation > > >> time. Intuitively, the categorization should be based on the group > > creation > > >> time: all partitions existing when the group is created are existing > and > > >> all partitions created after the group creation are new partitions. > > >> > > >> Jun > > >> > > >> > > >> > > >> On Tue, Jun 9, 2026 at 8:51 AM 黃竣陽 <[email protected]> wrote: > > >> > > >>> Hi all, > > >>> > > >>> Manually bumping this thread. If there is no further > > >>> discussion, I will close the vote. > > >>> > > >>> Best Regards, > > >>> Jiunn-Yang > > >>> > > >>>> 黃竣陽 <[email protected]> 於 2026年6月1日 晚上7:16 寫道: > > >>>> > > >>>> Hello Jian, > > >>>> > > >>>> Thanks for your feedback, > > >>>> > > >>>> Agreed, partition expansion is a common operational task, not an > edge > > >>>> case. I've updated the Motivation section accordingly. > > >>>> > > >>>> Best Regards, > > >>>> Jiunn-Yang > > >>>> > > >>>>> jian fu <[email protected]> 於 2026年6月1日 下午5:49 寫道: > > >>>>> > > >>>>> Hi Jiunn-Yang: > > >>>>> > > >>>>> Thanks for the KIP. I think it would be useful to clarify that this > > >> is a > > >>>>> common scenario rather than an edge case, which further > demonstrates > > >> the > > >>>>> need for this optimization. For example: > > >>>>> A partition expansion is a common operational task in Kafka: To > > >> balance > > >>>>> resource utilization and cost, topics are typically created with a > > >>> moderate > > >>>>> default partition count. 
However, as traffic grows over time, it is > > >>> often > > >>>>> necessary to increase the number of partitions to accommodate the > > >> higher > > >>>>> workload. > > >>>>> > > >>>>> Regards > > >>>>> Jian > > >>>>> > > >>>>> 黃竣陽 <[email protected]> 于2026年5月30日周六 22:31写道: > > >>>>> > > >>>>>> Hello chia, > > >>>>>> > > >>>>>> Thanks for the comments, I have updated the KIP! > > >>>>>> > > >>>>>> Best Regards, > > >>>>>> Jiunn-Yang > > >>>>>> > > >>>>>>> Chia-Ping Tsai <[email protected]> 於 2026年5月30日 晚上8:29 寫道: > > >>>>>>> > > >>>>>>> Hi Jiunn-Yang, > > >>>>>>> > > >>>>>>> Would you mind removing the terms "hot" and "cold" when > describing > > >>>>>>> partitions in the KIP? I understand you are using them to > describe > > >> the > > >>>>>>> "freshness" or the users' need for the records, but applying > these > > >>> terms > > >>>>>> to > > >>>>>>> the partition itself feels a bit unnatural. > > >>>>>>> > > >>>>>>> After all, in this scenario, users don't really care whether a > > >>> partition > > >>>>>> is > > >>>>>>> newly expanded or not. Their only expectation is that they won't > > >>> silently > > >>>>>>> lose any live records produced to the topic during their active > > >>>>>> consumption. > > >>>>>>> > > >>>>>>> Best, Chia-Ping > > >>>>>>> > > >>>>>>> > > >>>>>>> > > >>>>>>> 黃竣陽 <[email protected]> 於 2026年5月30日週六 下午12:30寫道: > > >>>>>>> > > >>>>>>>> Hello Jun, > > >>>>>>>> > > >>>>>>>> Thanks for the feedback, I have updated the KIP motivation > > section. > > >>>>>>>> > > >>>>>>>> Best Regards, > > >>>>>>>> Jiunn-Yang > > >>>>>>>> > > >>>>>>>>> Jun Rao via dev <[email protected]> 於 2026年5月30日 凌晨1:12 寫道: > > >>>>>>>>> > > >>>>>>>>> Hi, Jiunn-Yang, > > >>>>>>>>> > > >>>>>>>>> Thanks for the reply. I think we need a stronger motivation for > > >> the > > >>>>>> KIP. > > >>>>>>>>> > > >>>>>>>>> The KIP says "The core insight is that not all partitions > without > > >> a > > >>>>>>>>> committed offset are the same. 
A newly expanded partition (hot) > > is > > >>>>>>>>> fundamentally different from a partition the consumer has never > > >> seen > > >>>>>>>>> because it predates the group (cold)." Why is the hot partition > > >>>>>>>>> fundamentally different from the cold? > > >>>>>>>>> > > >>>>>>>>> The KIP says "The existing by_duration policy is also > > insufficient > > >>>>>>>> because: > > >>>>>>>>> > > >>>>>>>>> - The calculated seek time (now() - duration) varies across > nodes > > >>> due > > >>>>>>>> to > > >>>>>>>>> clock skew. To be safe, users must set an overly large > duration, > > >>>>>>>> causing > > >>>>>>>>> unnecessary reprocessing. > > >>>>>>>>> - On network errors, the client recalculates the seek time on > > >> retry, > > >>>>>>>>> shifting the target timestamp forward and risking data loss." > > >>>>>>>>> > > >>>>>>>>> However, both of these situations are rare. If these issues > > >> persist, > > >>>>>> more > > >>>>>>>>> severe problems likely exist elsewhere. Rare situations don't > > >> need a > > >>>>>>>> common > > >>>>>>>>> solution. If users care about those rare situations, they can > > >>> implement > > >>>>>>>>> customized logic using > > >>>>>> ConsumerRebalanceListener.onPartitionsAssigned(). > > >>>>>>>>> > > >>>>>>>>> Jun > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> On Sun, May 17, 2026 at 6:50 AM 黃竣陽 <[email protected]> > wrote: > > >>>>>>>>> > > >>>>>>>>>> Hello chia, > > >>>>>>>>>> > > >>>>>>>>>> Thanks for the feedback, > > >>>>>>>>>> > > >>>>>>>>>>> If the creation time exists, the returned value should always > > be > > >>>>>>>> greater > > >>>>>>>>>> than or equal to zero, right? > > >>>>>>>>>> I have explicitly mentioned this in the KIP. 
> > >>>>>>>>>> > > >>>>>>>>>>>> New Old (MetadataResponse v0–13) positive any > > >>> field > > >>>>>>>>>> absent UnsupportedVersionException > > >>>>>>>>>> > > >>>>>>>>>> The earliest point at which we can detect the version mismatch > > is > > >>>>>> during > > >>>>>>>>>> the > > >>>>>>>>>> first metadata fetch after assignment, which occurs inside > > >> poll(). > > >>>>>>>>>> Therefore, the > > >>>>>>>>>> user would encounter an UnsupportedVersionException from > poll(). > > >>> I’ll > > >>>>>>>>>> clarify this in the KIP. > > >>>>>>>>>> > > >>>>>>>>>> Best Regards, > > >>>>>>>>>> Jiunn-Yang > > >>>>>>>>>> > > >>>>>>>>>>> Chia-Ping Tsai <[email protected]> 於 2026年5月17日 下午4:50 寫道: > > >>>>>>>>>>> > > >>>>>>>>>>> hi Jiunn > > >>>>>>>>>>> > > >>>>>>>>>>>> PartitionAgeMs (int64, default -1): The age of this > partition > > >> in > > >>>>>>>>>> milliseconds, computed server-side by the broker as > > >>>>>> broker_current_time > > >>>>>>>> - > > >>>>>>>>>> partition_creation_time. Returns -1 if the broker does not > > >> support > > >>>>>> this > > >>>>>>>>>> feature or the partition creation time is unknown. > > >>>>>>>>>>> > > >>>>>>>>>>> If the creation time exists, the returned value should always > > be > > >>>>>>>> greater > > >>>>>>>>>> than or equal to zero, right? > > >>>>>>>>>>> > > >>>>>>>>>>>> New Old (MetadataResponse v0–13) positive any > > >>> field > > >>>>>>>>>> absent UnsupportedVersionException > > >>>>>>>>>>> > > >>>>>>>>>>> Will user encounter UnsupportedVersionException when calling > > >>>>>> `poll()`? > > >>>>>>>>>>> > > >>>>>>>>>>> Best, > > >>>>>>>>>>> Chia-Ping > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> On 2026/05/16 04:30:49 黃竣陽 wrote: > > >>>>>>>>>>>> Hello Jun, chia, > > >>>>>>>>>>>> > > >>>>>>>>>>>> I've updated KIP-1327 with a design change based on the > > >>> discussion > > >>>>>>>>>>>> feedback. 
> > >>>>>>>>>>>> > > >>>>>>>>>>>> The updated design decouples the new-partition reset > behavior > > >>> from > > >>>>>>>>>>>> the base auto.offset.reset policy: > > >>>>>>>>>>>> > > >>>>>>>>>>>> - auto.offset.reset.max.age.ms now applies to all > > >>> auto.offset.reset > > >>>>>>>>>> values > > >>>>>>>>>>>> (latest, earliest, by_duration, none). > > >>>>>>>>>>>> - For new ("hot") partitions, the consumer resets to > > >>>>>>>>>> auto.offset.reset.new.partitions > > >>>>>>>>>>>> config setting > > >>>>>>>>>>>> - For existing ("cold") partitions, the base > auto.offset.reset > > >>>>>> policy > > >>>>>>>>>> continues > > >>>>>>>>>>>> to apply unchanged. > > >>>>>>>>>>>> - The new-partition reset behavior is represented by a > > separate > > >>>>>>>>>> internal config > > >>>>>>>>>>>> (auto.offset.reset.new.partitions, currently fixed to > > >> earliest). > > >>>>>> This > > >>>>>>>>>> decoupled design makes > > >>>>>>>>>>>> it straightforward to promote the behavior to a public > > >>> user-facing > > >>>>>>>>>> configuration in a future KIP. > > >>>>>>>>>>>> > > >>>>>>>>>>>> Best Regards, > > >>>>>>>>>>>> Jiunn-Yang > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>>>> Chia-Ping Tsai <[email protected]> 於 2026年5月16日 清晨7:46 > 寫道: > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> hi Jun > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> I see what you mean now. The proposal from me is listed > > below: > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> 1) Add auto.offset.reset.new.partitions with a default > value > > >> of > > >>>>>>>>>> earliest. It fixes the data loss from both by_duration and > > >> latest, > > >>> and > > >>>>>>>> it > > >>>>>>>>>> does not change the logic of auto.offset.reset=earliest. > > >>>>>>>>>>>>> 2) Mark auto.offset.reset.new.partitions as an internal > > >>>>>>>>>> configuration. auto.offset.reset.new.partitions=earliest > > already > > >>>>>>>>>> addresses the issue, and we can discuss the use cases of other > > >>> values > > >>>>>>>> in a > > >>>>>>>>>> separate KIP. 
> > >>>>>>>>>>>>> 3) Both configs, auto.offset.reset.new.partitions and > > >>>>>>>>>> auto.offset.reset.latest.max.age.ms, will be applied to all > for > > >>>>>>>>>> consistency. > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> WDYT? > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> On 2026/05/15 20:53:20 Jun Rao via dev wrote: > > >>>>>>>>>>>>>> Hi, Chia-Ping, > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> Thanks for the reply. > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> 1. In the motivation section, the KIP says "When a Kafka > > >> topic > > >>> is > > >>>>>>>>>> expanded > > >>>>>>>>>>>>>> with new partitions, consumers using the latest auto > offset > > >>> reset > > >>>>>>>>>> policy > > >>>>>>>>>>>>>> will silently miss all records produced to those > partitions > > >>> before > > >>>>>>>> the > > >>>>>>>>>>>>>> consumer discovers them.". If a user sets > > >>>>>>>>>>>>>> auto.offset.reset=by_duration=1sec, the same record loss > > >> issue > > >>>>>> could > > >>>>>>>>>> also > > >>>>>>>>>>>>>> happen, right? > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> 2. I was thinking auto.offset.reset.new.partitions will > > take > > >>> the > > >>>>>>>> same > > >>>>>>>>>>>>>> values as auto.offset.reset. So a user could set it > > >>> by_duration if > > >>>>>>>>>> needed. > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> Jun > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> On Thu, May 14, 2026 at 4:06 PM Chia-Ping Tsai < > > >>>>>> [email protected] > > >>>>>>>>> > > >>>>>>>>>> wrote: > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> hi Jun > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> Thanks for the feedback. I might be missing something > > >>> important > > >>>>>>>> from > > >>>>>>>>>> your > > >>>>>>>>>>>>>>> suggestion, so please bear with me as I try to clarify > with > > >> a > > >>> few > > >>>>>>>>>> questions: > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> 1. Is there a strong use case for extending this logic to > > >>> other > > >>>>>>>> reset > > >>>>>>>>>>>>>>> policies? 
Unlike latest, policies like earliest or > > >> by_duration > > >>>>>>>> don't > > >>>>>>>>>> seem > > >>>>>>>>>>>>>>> to suffer from the same silent data loss issue when a > > >>> partition > > >>>>>> is > > >>>>>>>>>> expanded. > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> 2. What values would we expect users to configure for > > >>>>>>>>>>>>>>> auto.offset.reset.new.partitions? If they set it to > > >> earliest > > >>> or > > >>>>>>>>>> latest, > > >>>>>>>>>>>>>>> we might run into the exact same edge cases. For example, > > >> if a > > >>>>>>>>>> consumer is > > >>>>>>>>>>>>>>> offline for a while and a new partition is created during > > >> that > > >>>>>>>>>> downtime, > > >>>>>>>>>>>>>>> the user might actually want to skip to latest when > > >> resuming, > > >>>>>>>> rather > > >>>>>>>>>> than > > >>>>>>>>>>>>>>> reading from earliest just because the partition is > > >>> technically > > >>>>>>>>>> "new" to > > >>>>>>>>>>>>>>> the group. > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> This is exactly why we opted for introducing a max.age > > >>> threshold. > > >>>>>>>> It > > >>>>>>>>>> gives > > >>>>>>>>>>>>>>> users a time-bound way to define what is genuinely > > "hot/new" > > >>> and > > >>>>>>>>>> what is > > >>>>>>>>>>>>>>> just an old partition they haven't seen yet. > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> Best, > > >>>>>>>>>>>>>>> Chia-Ping > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> On 2026/05/14 20:48:09 Jun Rao via dev wrote: > > >>>>>>>>>>>>>>>> Hi, Jiunn-Yang, > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> Thanks for the KIP. > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> I find auto.offset.reset.latest.max.age a bit weird. It > > >> only > > >>>>>>>>>> applies when > > >>>>>>>>>>>>>>>> auto.offset.reset is latest. However, it seems that the > > >>>>>> motivation > > >>>>>>>>>>>>>>> equally > > >>>>>>>>>>>>>>>> applies when auto.offset.reset is set to other values > like > > >>>>>>>>>> by_duration. 
> > >>>>>>>>>>>>>>> The > > >>>>>>>>>>>>>>>> intention is that we want to have a separate way to > > control > > >>>>>> newly > > >>>>>>>>>> created > > >>>>>>>>>>>>>>>> partitions vs existing partitions when the group starts. > > >>> Have we > > >>>>>>>>>>>>>>> considered > > >>>>>>>>>>>>>>>> adding a new config like auto.offset.reset.new > > .partitions? > > >>> If > > >>>>>>>> this > > >>>>>>>>>> new > > >>>>>>>>>>>>>>>> config is not set, the offset reset policy defaults to > the > > >>>>>> policy > > >>>>>>>>>> used > > >>>>>>>>>>>>>>> for > > >>>>>>>>>>>>>>>> existing partitions. The user could set it explicitly to > > >>>>>> customize > > >>>>>>>>>> the > > >>>>>>>>>>>>>>>> behavior for new partitions. > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> Jun > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> On Thu, May 7, 2026 at 5:07 AM 黃竣陽 <[email protected]> > > >>> wrote: > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> Hi all, > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> I’d like to manually bump this thread. > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> Best Regards, > > >>>>>>>>>>>>>>>>> Jiunn-Yang > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> 黃竣陽 <[email protected]> 於 2026年5月1日 晚上10:37 寫道: > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> Hello all, > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> Thanks for the feedback. > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> DJ01/DJ02: > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> MetadataResponse bumps from v13 to v14. The > > >>> PartitionMetadata > > >>>>>>>>>> struct > > >>>>>>>>>>>>>>>>> gains a new > > >>>>>>>>>>>>>>>>>> field PartitionAgeMs (int64, default -1), computed > > >>> server-side > > >>>>>>>> by > > >>>>>>>>>> the > > >>>>>>>>>>>>>>>>> broker as > > >>>>>>>>>>>>>>>>>> broker_current_time - partition_creation_time. > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> Also add the consumer heartbeat flow. 
when > > >>> MembershipManager > > >>>>>>>>>> detects > > >>>>>>>>>>>>>>> a > > >>>>>>>>>>>>>>>>> newly assigned > > >>>>>>>>>>>>>>>>>> partition, it explicitly invalidates the metadata for > > the > > >>>>>>>> affected > > >>>>>>>>>>>>>>> topic > > >>>>>>>>>>>>>>>>> and forces a fresh MetadataRequest > > >>>>>>>>>>>>>>>>>> before making the offset reset decision, even if the > > >> topic > > >>> ID > > >>>>>> is > > >>>>>>>>>>>>>>> already > > >>>>>>>>>>>>>>>>> in the cache. > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> MB0: > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> The consumer learns the broker's maximum supported > > >>>>>>>>>> MetadataResponse > > >>>>>>>>>>>>>>>>> version via the > > >>>>>>>>>>>>>>>>>> ApiVersions negotiation at connection time. If the > > >>> negotiated > > >>>>>>>>>>>>>>> version is > > >>>>>>>>>>>>>>>>> unsupported, the consumer > > >>>>>>>>>>>>>>>>>> knows the broker does not support PartitionAgeMs at > all > > >> and > > >>>>>> can > > >>>>>>>>>>>>>>> throw an > > >>>>>>>>>>>>>>>>> UnsupportedVersionException > > >>>>>>>>>>>>>>>>>> immediately, rather than silently falling back to > latest > > >>> and > > >>>>>>>>>> risking > > >>>>>>>>>>>>>>>>> data loss without any operator-visible signal. > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> MB1/MB2/MB3: > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> I have addressed these changes in the KIP. > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> Best Regards, > > >>>>>>>>>>>>>>>>>> Jiunn-Yang > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> Chia-Ping Tsai <[email protected]> 於 2026年4月29日 > > 下午4:04 > > >>> 寫道: > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> hi David > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> I agree with the direction of moving the 'age' > > >> resolution > > >>>>>> from > > >>>>>>>>>> the > > >>>>>>>>>>>>>>>>> Heartbeat API to the Metadata API to keep the control > > >> plane > > >>>>>>>> clean. 
> > >>>>>>>>>> The > > >>>>>>>>>>>>>>> main > > >>>>>>>>>>>>>>>>> trade-off, as we noted before, is introducing > > inter-broker > > >>>>>> clock > > >>>>>>>>>> skew. > > >>>>>>>>>>>>>>> The > > >>>>>>>>>>>>>>>>> Group Coordinator approach provided a single source of > > >> truth > > >>>>>> for > > >>>>>>>>>> time. > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> However, realistically, this time skew should be > > >>> negligible. > > >>>>>>>>>> Given > > >>>>>>>>>>>>>>> that > > >>>>>>>>>>>>>>>>> the max.age threshold will likely be configured in > > minutes > > >>> or > > >>>>>>>>>> hours, a > > >>>>>>>>>>>>>>>>> typical NTP skew (in milliseconds) between brokers > won't > > >>> impact > > >>>>>>>> the > > >>>>>>>>>>>>>>>>> fallback decision. > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> Best, > > >>>>>>>>>>>>>>>>>>> Chia-Ping > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>> David Jacot via dev <[email protected]> 於 > > >> 2026年4月29日 > > >>>>>>>> 下午3:29 > > >>>>>>>>>> 寫道: > > >>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>> Hi all, > > >>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>> Thanks for the KIP! > > >>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>> Sorry, I haven't really followed the previous > > >>> conversation > > >>>>>>>> but I > > >>>>>>>>>>>>>>> took a > > >>>>>>>>>>>>>>>>>>>> quick look at this one. > > >>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>> DJ01: I don't clearly understand the flow with the > > >>>>>>>>>>>>>>>>> ConsumerGroupHeartbeat > > >>>>>>>>>>>>>>>>>>>> API after reading the KIP. There is a new boolean; > the > > >>> KIP > > >>>>>>>>>> states > > >>>>>>>>>>>>>>> that > > >>>>>>>>>>>>>>>>>>>> partition ages are returned only when this boolean > is > > >>> set. 
> > >>>>>>>>>>>>>>> Implicitly, > > >>>>>>>>>>>>>>>>> this > > >>>>>>>>>>>>>>>>>>>> means that when the consumer receives a new > partition, > > >> it > > >>>>>> will > > >>>>>>>>>>>>>>> issue a > > >>>>>>>>>>>>>>>>> new > > >>>>>>>>>>>>>>>>>>>> HB request with the boolean set to receive the ages. > > Is > > >>> my > > >>>>>>>>>>>>>>>>> understanding > > >>>>>>>>>>>>>>>>>>>> correct? We should perhaps clarify the flow and also > > >>> explain > > >>>>>>>>>> how it > > >>>>>>>>>>>>>>>>> fits > > >>>>>>>>>>>>>>>>>>>> into the existing flow (e.g. list offsets, fetch > > >> offsets, > > >>>>>>>> etc.). > > >>>>>>>>>>>>>>>>>>>> DJ02: It my understanding is correct, I wonder if > > >>>>>>>>>>>>>>>>>>>> the ConsumerGroupHeartbeat API is the right place > for > > >>> this > > >>>>>>>> given > > >>>>>>>>>>>>>>> that > > >>>>>>>>>>>>>>>>> a new > > >>>>>>>>>>>>>>>>>>>> round trip is done anyway. Alternatively, it could > > >> simply > > >>>>>>>>>> include > > >>>>>>>>>>>>>>> the > > >>>>>>>>>>>>>>>>>>>> metadata. Generally, we should be rather cautious > > about > > >>> not > > >>>>>>>>>>>>>>> overloading > > >>>>>>>>>>>>>>>>>>>> the ConsumerGroupHeartbeat API with unrelated > > concepts. > > >>> The > > >>>>>>>> API > > >>>>>>>>>> is > > >>>>>>>>>>>>>>> a > > >>>>>>>>>>>>>>>>>>>> control plane API for assigning or revoking > > partitions. > > >>> The > > >>>>>>>> fact > > >>>>>>>>>>>>>>> that > > >>>>>>>>>>>>>>>>> we > > >>>>>>>>>>>>>>>>>>>> don't want to add it to the corresponding Streams > API > > >>> also > > >>>>>>>>>> suggests > > >>>>>>>>>>>>>>>>>>>> something is not quite right. What would we do if we > > >>> want to > > >>>>>>>>>>>>>>> support > > >>>>>>>>>>>>>>>>>>>> Streams in the future? 
> > >>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>> Best, > > >>>>>>>>>>>>>>>>>>>> David > > >>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> On Wed, Apr 29, 2026 at 12:28 AM Muralidhar Basani > > via > > >>> dev > > >>>>>> < > > >>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: > > >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> Hi Jiunn, > > >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> Thank you for this great kip. Good to know about > the > > >>> gap. > > >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> mb-0 - why a new v2 version bump for > > >>> RequestPartitionAges > > >>>>>>>>>> field. > > >>>>>>>>>>>>>>> Can a > > >>>>>>>>>>>>>>>>>>>>> tagged field (for ex: on response, PartitionAges on > > >>>>>>>>>>>>>>> TopicPartitions) > > >>>>>>>>>>>>>>>>> be > > >>>>>>>>>>>>>>>>>>>>> used here and avoid version bump? > > >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> mb-1 - For the new config, is there a recommended > > >> value > > >>> or > > >>>>>> a > > >>>>>>>>>>>>>>> ConfigDef > > >>>>>>>>>>>>>>>>>>>>> validator? Probably it should based on the > > >>>>>>>> metadata.max.age.ms > > >>>>>>>>>> ? > > >>>>>>>>>>>>>>>>> Sizing > > >>>>>>>>>>>>>>>>>>>>> instructions can be part of javadocs I guess. > > >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> mb-2 - (minor) As there are no changes to Kafka > > >> Streams, > > >>>>>>>> would > > >>>>>>>>>> it > > >>>>>>>>>>>>>>> be > > >>>>>>>>>>>>>>>>> better > > >>>>>>>>>>>>>>>>>>>>> to add this new config > > >> auto.offset.reset.latest.max.age > > >>> to > > >>>>>>>> the > > >>>>>>>>>>>>>>>>>>>>> StreamsConfig block list > > >>>>>>>>>>>>>>> (NON_CONFIGURABLE_CONSUMER_DEFAULT_CONFIGS) > > >>>>>>>>>>>>>>>>> for a > > >>>>>>>>>>>>>>>>>>>>> clear warning, incase users configure it? This is > the > > >>> most > > >>>>>>>>>>>>>>> familiar > > >>>>>>>>>>>>>>>>>>>>> consumer config and users might easily mistakenly > > >>> configure > > >>>>>>>>>> it. Or > > >>>>>>>>>>>>>>>>> may be > > >>>>>>>>>>>>>>>>>>>>> it's not worth it to add. 
> > >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> mb-3 - (minor) The phrasing "the consumer falls > back > > >> to > > >>>>>>>>>> earliest" > > >>>>>>>>>>>>>>>>> reads as > > >>>>>>>>>>>>>>>>>>>>> if the config were being changed per-partition > which > > >>> isn't > > >>>>>>>>>>>>>>> supported. > > >>>>>>>>>>>>>>>>> May > > >>>>>>>>>>>>>>>>>>>>> be rephrasing to something like "consumer resolves > > the > > >>>>>>>> initial > > >>>>>>>>>>>>>>>>> position to > > >>>>>>>>>>>>>>>>>>>>> start offset for that partition" as if earliest was > > >>> applied > > >>>>>>>> to > > >>>>>>>>>>>>>>> that > > >>>>>>>>>>>>>>>>>>>>> partition only and auto.offset.reset config is > > >>> unchanged. > > >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> Thanks, > > >>>>>>>>>>>>>>>>>>>>> Murali > > >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 28, 2026 at 2:48 PM 黃竣陽 < > > >>> [email protected]> > > >>>>>>>>>> wrote: > > >>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>> Hi chia, > > >>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>> I have updated the KIP to include this change. > > >>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>> Best Regards, > > >>>>>>>>>>>>>>>>>>>>>> Jiunn-Yang > > >>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>> Chia-Ping Tsai <[email protected]> 於 > 2026年4月28日 > > >>> 晚上8:03 > > >>>>>>>> 寫道: > > >>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>> hi Jiunn-Yang > > >>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>> chia_0: Should we expose the partition creation > > time > > >>> via > > >>>>>>>> the > > >>>>>>>>>>>>>>> Admin > > >>>>>>>>>>>>>>>>> API? 
> > >>>>>>>>>>>>>>>>>>>>>> I assume it would be valuable for users to > diagnose > > >> and > > >>>>>>>>>>>>>>> troubleshoot > > >>>>>>>>>>>>>>>>> the > > >>>>>>>>>>>>>>>>>>>>>> behavior of auto.offset.reset.latest.max.age > > >>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>> Best, > > >>>>>>>>>>>>>>>>>>>>>>> Chia-Ping > > >>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>> On 2026/04/28 10:47:58 黃竣陽 wrote: > > >>>>>>>>>>>>>>>>>>>>>>>> Hello everyone, > > >>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>> I would like to start a discussion on KIP-1327 > > >>> Prevent > > >>>>>> Hot > > >>>>>>>>>> Data > > >>>>>>>>>>>>>>>>> Loss > > >>>>>>>>>>>>>>>>>>>>> on > > >>>>>>>>>>>>>>>>>>>>>> Partition Expansion for Latest Policy > > >>>>>>>>>>>>>>>>>>>>>>>> < > > >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>> > > >>>>>> > > >>> > > >> > > > https://urldefense.com/v3/__https://cwiki.apache.org/confluence/x/KY4mGQ__;!!Ayb5sqE7!qF4q1QzF1RRgP61D7A2xuEai1ky7fepKDKFFvpNBuePikH-ULmT87TvuuZzy5kau5E4y5zMZAmfQQiwZomM$ > > >>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>> This proposal aims to introduces > > >>>>>>>>>>>>>>> auto.offset.reset.latest.max.age, > > >>>>>>>>>>>>>>>>> a > > >>>>>>>>>>>>>>>>>>>>>> consumer config that lets the > > >>>>>>>>>>>>>>>>>>>>>>>> latest reset policy distinguish newly expanded > > >> (hot) > > >>>>>>>>>> partitions > > >>>>>>>>>>>>>>>>> from > > >>>>>>>>>>>>>>>>>>>>>> long-existing (cold) ones. Partitions > > >>>>>>>>>>>>>>>>>>>>>>>> younger than the configured threshold > > automatically > > >>> fall > > >>>>>>>>>> back > > >>>>>>>>>>>>>>> to > > >>>>>>>>>>>>>>>>>>>>>> earliest, preventing silent data loss > > >>>>>>>>>>>>>>>>>>>>>>>> during topic expansion without forcing a full > > >>> historical > > >>>>>>>>>>>>>>> reprocess. 
> > >>>>>>>>>>>>>>>>>>>>>>>> Best regards,
> > >>>>>>>>>>>>>>>>>>>>>>>> Jiunn-Yang
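
For readers following the classifier debate, the idle / late-rejoin timeline from the thread (group created at T=0, partition added at T=50, consumer resumes at T=100) can be sketched as below. This is only an illustration of the two candidate classifiers as described in the discussion, not the Kafka consumer implementation: the function names, the abstract time units, and the threshold of 5 are all assumptions made for the example.

```python
# Illustrative sketch only: hypothetical helpers, not Kafka client code.
# Time is in the abstract units used by the thread's T=0..T=100 timeline.

def reset_by_group_creation_time(partition_created_at, group_created_at,
                                 base_policy="latest"):
    """Treat a partition as new if it was created after the group."""
    return "earliest" if partition_created_at > group_created_at else base_policy


def reset_by_partition_age(partition_age, max_age, base_policy="latest"):
    """Treat a partition as new only if it is younger than the threshold."""
    return "earliest" if partition_age < max_age else base_policy


GROUP_CREATED = 0       # T=0: group created
PARTITION_ADDED = 50    # T=50: partition added while the consumer is idle
CONSUMER_RESUMES = 100  # T=100: consumer resumes
MAX_AGE = 5             # assumed threshold, chosen only for this example

# Idle / late-rejoin case: the partition is 50 units old at discovery.
idle_by_group = reset_by_group_creation_time(PARTITION_ADDED, GROUP_CREATED)
idle_by_age = reset_by_partition_age(CONSUMER_RESUMES - PARTITION_ADDED, MAX_AGE)

# Active consumer case: the partition is discovered 2 units after creation.
active_by_age = reset_by_partition_age(2, MAX_AGE)

print("idle, group creation time:", idle_by_group)
print("idle, partition age:      ", idle_by_age)
print("active, partition age:    ", active_by_age)
```

Under these assumptions, the group-creation classifier replays the new partition from T=50 after the idle period, while the partition-age classifier falls back to the base `latest` policy there and only picks `earliest` when the partition is discovered within the threshold.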
