Hi, Chia-Ping and Jiunn-Yang,

Thanks for the reply. I am still trying to understand the value of the new
configs with the KIP.

The motivation of the KIP is that a user doesn't want to miss the data if
the backlog is small. The backlog of the existing partition is easy to
understand because it relates to retention time. The backlog for the new
partition is a bit subtle to understand since it depends on the metadata
refresh delay. To set auto.offset.reset.max.age.ms, the user needs to
understand the metadata refresh delay on the consumer side and use it to
set the config.
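
To make the sizing concern concrete, here is a minimal Python sketch of the partition-age check as it is described in this thread: a partition without a committed offset is treated as new when its broker-reported age is below the threshold. The function name, the 5-minute refresh delay, the 1-minute buffer, and the handling of an unknown age (-1) are illustrative assumptions, not the KIP's actual implementation.

```python
def choose_reset_policy(partition_age_ms: int, max_age_ms: int,
                        base_policy: str = "latest") -> str:
    """Pick a reset policy for a partition that has no committed offset."""
    # Assumption: a negative age means "unknown"; fall back to the base policy.
    if partition_age_ms < 0:
        return base_policy
    # Partitions younger than the threshold are treated as new.
    if partition_age_ms < max_age_ms:
        return "earliest"
    return base_policy


# Hypothetical sizing: observed metadata refresh delay plus a small buffer.
metadata_refresh_delay_ms = 5 * 60 * 1000
max_age_ms = metadata_refresh_delay_ms + 60 * 1000

young = choose_reset_policy(30_000, max_age_ms)      # created 30 seconds ago
old = choose_reset_policy(86_400_000, max_age_ms)    # created one day ago
print(young, old)
```

The point of the sketch is that the threshold only makes sense relative to the refresh delay, which is the value the user has to know in order to set it.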

Now, let's consider the alternative: setting the same value for the
existing by_duration policy. The KIP lists three issues with this approach.
1. It computes the seek target client-side as now() - duration, which
introduces clock skew across consumers and forces operators to choose
overly large durations, causing unnecessary reprocessing.
2. The target timestamp is recomputed on each retry, so failed
ListOffsetsRequest retries can shift the target forward and potentially
miss records produced between attempts.
3. It applies uniformly to all partitions without committed offsets, and
cannot distinguish newly expanded partitions from long-existing partitions
newly assigned to the group, leading to unnecessary replay.
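
Issue 2 can be illustrated with a small Python sketch. It only models the arithmetic of recomputing now() - duration on each attempt; the attempt times and function names are hypothetical, and this is not the consumer's actual code.

```python
def targets_recomputed(attempt_times_ms, duration_ms):
    """Seek target per attempt when now() - duration is recomputed every time."""
    return [now_ms - duration_ms for now_ms in attempt_times_ms]


def targets_pinned(attempt_times_ms, duration_ms):
    """Seek target per attempt when it is computed once and reused on retries."""
    first_target = attempt_times_ms[0] - duration_ms
    return [first_target for _ in attempt_times_ms]


# Hypothetical client clock readings for three ListOffsets attempts.
attempts = [100_000, 103_000, 109_000]
duration = 5_000

recomputed = targets_recomputed(attempts, duration)
pinned = targets_pinned(attempts, duration)

# If only the last attempt succeeds, this much extra data is skipped
# compared with reusing the first target.
skipped_window_ms = recomputed[-1] - pinned[-1]
print(recomputed, pinned, skipped_window_ms)
```

In this example the target moves forward by the total time spent retrying, which is the gap a larger duration would have to absorb.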

Issues 1 and 2 are uncommon and can be mitigated by adding a bit of buffer to
the metadata refresh delay. We could also consider improving the
implementation. For issue 3, the metadata refresh delay is typically low
(on the order of minutes with the classic consumer and tens of seconds with
the new consumer). If a user is ok with reading that much backlog for new
partitions, it seems they will be ok doing the same for existing partitions.
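
As a rough illustration of the buffer idea, the Python sketch below sizes a by_duration value as the refresh delay plus allowances for clock skew and retries, then checks whether a record produced at partition creation is still covered. All numbers and helper names are assumptions chosen for the example.

```python
def buffered_duration_ms(refresh_delay_ms, clock_skew_ms, retry_window_ms):
    """Duration that covers the refresh delay plus skew and retry allowances."""
    return refresh_delay_ms + clock_skew_ms + retry_window_ms


def covers(record_ts_ms, client_now_ms, duration_ms):
    """True if a record at record_ts_ms is at or after the seek target."""
    return record_ts_ms >= client_now_ms - duration_ms


# Hypothetical scenario: partition created at t=0, discovered after 5 minutes,
# client clock 2 s ahead, and 9 s spent on retries before the seek succeeds.
refresh_delay = 300_000
client_now = refresh_delay + 2_000 + 9_000
first_record_ts = 0

plain = covers(first_record_ts, client_now, refresh_delay)
buffered = covers(first_record_ts, client_now,
                  buffered_duration_ms(refresh_delay, 2_000, 10_000))
print(plain, buffered)
```

With these numbers the unbuffered duration misses the first record while the buffered one reaches it, at the cost of replaying slightly more data on existing partitions.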

So, instead of introducing a new config, could we just reuse the existing
config with better documentation and/or implementation?

Jun


On Sat, Jun 13, 2026 at 12:19 AM 黃竣陽 <[email protected]> wrote:

> Hello Jun,
>
> You're right that group creation time is the more intuitive answer at
> first glance;
> the KIP's own motivation talks about partitions that "predate the group"
> vs partitions
> "created during group runtime," which directly points to a group-lifecycle
> classifier.
> I'd like to walk through why we landed on partition age, and the
> trade-offs we considered.
>
> We evaluated three candidate signals:
>
> 1. `by_duration:5secs`
>
> This covers the metadata blindness window, but has issues the KIP
> currently documents
> under "Why not use `by_duration`?":
>
> - Client-side `now() - duration` introduces clock skew across consumers.
> - `ListOffsets` retries shift the target forward, potentially missing
> records produced between
> attempts.
> - It applies uniformly to all partitions without committed offsets,
> including pre-existing partitions
> newly assigned to the group, causing unnecessary replay.
>
> 2. Group creation time as classifier
>
> This works cleanly when the consumer is actively running. Our concern
> is the idle / late-rejoin case:
>
> T=0:         Group created.
> T=1..T=100:  Consumer idle (down, disconnected, etc.).
> T=50:        Partition added during the idle window.
> T=100:       Consumer resumes.
>
> Under group creation time, the new partition is classified as new
> (`50 > 0`) and reset to `earliest`, replaying everything from T=50.
> But during `[T=1, T=100]`, base partitions also accumulated data that
> the consumer accepts as lost — that is precisely the contract of
> `auto.offset.reset=latest`. There is no principled reason to treat
> the new partition differently; both contain backlog accumulated during
> the same idle window.
>
> This aligns with the "backlog is backlog" principle you raised in
> the KIP-1282 thread: a `latest` user has tolerated some backlog on
> every other partition during the same idle period; forcing 0-backlog
> tolerance only on new partitions would be inconsistent with that
> tolerance.
>
> 3. Partition age vs threshold
>
> Partition age corresponds to the actual silent data loss window,
> the gap between partition creation and the consumer’s metadata
> refresh. Within this window, data loss is genuinely silent: the
> consumer had no opportunity to know about the partition. Outside this
> window, missing data reflects either:
>
> - (a) the user’s tolerated cost of running with idle consumers, or
> - (b) an operational issue to surface via monitoring, not via reset policy.
>
> We did not choose partition age because it is more elegant than group
> creation time — we chose it because its failure mode (requires a
> threshold) is
> less invasive than the failure mode of group creation time (overrides
> user-stated
> `latest` intent during idle periods).
>
> Best Regards,
> Jiunn-Yang
>
> > On Jun 13, 2026 at 11:52 AM Chia-Ping Tsai <[email protected]> wrote:
> >
> > Hi Jun,
> >
> > Relying on both creation times will create an inconsistent scenario. A
> > consumer that lost all offsets due to a long sleep will seek to the
> > beginning for the partitions created later than the group.
> >
> > That is why we initially proposed KIP-1282 to fix the inconsistency
> using a
> > whole new policy. Since KIP-1282 couldn't reach a consensus, KIP-1327
> goes
> > back to using flexible configurations to prevent users from falling into
> > that pitfall.
> >
> > Best, Chia-Ping
> >
> > On Sat, Jun 13, 2026 at 6:49 AM Jun Rao via dev <[email protected]> wrote:
> >
> >> Hi, Jiunn-Yang,
> >>
> >> Thanks for the reply and sorry for the late reply.
> >>
> >> JR1. The design of auto.offset.reset.max.age.ms still feels weird to
> me.
> >> It
> >> categorizes partitions as new or existing based on the partition
> creation
> >> time. Intuitively, the categorization should be based on the group
> creation
> >> time: all partitions existing when the group is created are existing and
> >> all partitions created after the group creation are new partitions.
> >>
> >> Jun
> >>
> >>
> >>
> >> On Tue, Jun 9, 2026 at 8:51 AM 黃竣陽 <[email protected]> wrote:
> >>
> >>> Hi all,
> >>>
> >>> Manually bumping this thread. If there is no further
> >>> discussion, I will close the vote.
> >>>
> >>> Best Regards,
> >>> Jiunn-Yang
> >>>
> >>>> On Jun 1, 2026 at 7:16 PM 黃竣陽 <[email protected]> wrote:
> >>>>
> >>>> Hello Jian,
> >>>>
> >>>> Thanks for your feedback,
> >>>>
> >>>> Agreed, partition expansion is a common operational task, not an edge
> >>>> case. I've updated the Motivation section accordingly.
> >>>>
> >>>> Best Regards,
> >>>> Jiunn-Yang
> >>>>
> >>>>> On Jun 1, 2026 at 5:49 PM jian fu <[email protected]> wrote:
> >>>>>
> >>>>> Hi Jiunn-Yang:
> >>>>>
> >>>>> Thanks for the KIP. I think it would be useful to clarify that this
> >> is a
> >>>>> common scenario rather than an edge case, which further demonstrates
> >> the
> >>>>> need for this optimization. For example:
> >>>>> A partition expansion is a common operational task in Kafka: To
> >> balance
> >>>>> resource utilization and cost, topics are typically created with a
> >>> moderate
> >>>>> default partition count. However, as traffic grows over time, it is
> >>> often
> >>>>> necessary to increase the number of partitions to accommodate the
> >> higher
> >>>>> workload.
> >>>>>
> >>>>> Regards
> >>>>> Jian
> >>>>>
> >>>>> On Sat, May 30, 2026 at 22:31 黃竣陽 <[email protected]> wrote:
> >>>>>
> >>>>>> Hello chia,
> >>>>>>
> >>>>>> Thanks for the comments, I have updated the KIP!
> >>>>>>
> >>>>>> Best Regards,
> >>>>>> Jiunn-Yang
> >>>>>>
> >>>>>>> On May 30, 2026 at 8:29 PM Chia-Ping Tsai <[email protected]> wrote:
> >>>>>>>
> >>>>>>> Hi Jiunn-Yang,
> >>>>>>>
> >>>>>>> Would you mind removing the terms "hot" and "cold" when describing
> >>>>>>> partitions in the KIP? I understand you are using them to describe
> >> the
> >>>>>>> "freshness" or the users' need for the records, but applying these
> >>> terms
> >>>>>> to
> >>>>>>> the partition itself feels a bit unnatural.
> >>>>>>>
> >>>>>>> After all, in this scenario, users don't really care whether a
> >>> partition
> >>>>>> is
> >>>>>>> newly expanded or not. Their only expectation is that they won't
> >>> silently
> >>>>>>> lose any live records produced to the topic during their active
> >>>>>> consumption.
> >>>>>>>
> >>>>>>> Best, Chia-Ping
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Sat, May 30, 2026 at 12:30 PM 黃竣陽 <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> Hello Jun,
> >>>>>>>>
> >>>>>>>> Thanks for the feedback, I have updated the KIP motivation
> section.
> >>>>>>>>
> >>>>>>>> Best Regards,
> >>>>>>>> Jiunn-Yang
> >>>>>>>>
> >>>>>>>>> On May 30, 2026 at 1:12 AM Jun Rao via dev <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi, Jiunn-Yang,
> >>>>>>>>>
> >>>>>>>>> Thanks for the reply. I think we need a stronger motivation for
> >> the
> >>>>>> KIP.
> >>>>>>>>>
> >>>>>>>>> The KIP says "The core insight is that not all partitions without
> >> a
> >>>>>>>>> committed offset are the same. A newly expanded partition (hot)
> is
> >>>>>>>>> fundamentally different from a partition the consumer has never
> >> seen
> >>>>>>>>> because it predates the group (cold)." Why is the hot partition
> >>>>>>>>> fundamentally different from the cold?
> >>>>>>>>>
> >>>>>>>>> The KIP says "The existing by_duration policy is also
> insufficient
> >>>>>>>> because:
> >>>>>>>>>
> >>>>>>>>> - The calculated seek time (now() - duration) varies across nodes
> >>> due
> >>>>>>>> to
> >>>>>>>>> clock skew. To be safe, users must set an overly large duration,
> >>>>>>>> causing
> >>>>>>>>> unnecessary reprocessing.
> >>>>>>>>> - On network errors, the client recalculates the seek time on
> >> retry,
> >>>>>>>>> shifting the target timestamp forward and risking data loss."
> >>>>>>>>>
> >>>>>>>>> However, both of these situations are rare. If these issues
> >> persist,
> >>>>>> more
> >>>>>>>>> severe problems likely exist elsewhere. Rare situations don't
> >> need a
> >>>>>>>> common
> >>>>>>>>> solution. If users care about those rare situations, they can
> >>> implement
> >>>>>>>>> customized logic using
> >>>>>> ConsumerRebalanceListener.onPartitionsAssigned().
> >>>>>>>>>
> >>>>>>>>> Jun
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Sun, May 17, 2026 at 6:50 AM 黃竣陽 <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hello chia,
> >>>>>>>>>>
> >>>>>>>>>> Thanks for the feedback,
> >>>>>>>>>>
> >>>>>>>>>>> If the creation time exists, the returned value should always
> be
> >>>>>>>> greater
> >>>>>>>>>> than or equal to zero, right?
> >>>>>>>>>> I have explicitly mentioned this in the KIP.
> >>>>>>>>>>
> >>>>>>>>>>>> New  Old (MetadataResponse v0–13)    positive        any
> >>> field
> >>>>>>>>>> absent    UnsupportedVersionException
> >>>>>>>>>>
> >>>>>>>>>> The earliest point at which we can detect the version mismatch
> is
> >>>>>> during
> >>>>>>>>>> the
> >>>>>>>>>> first metadata fetch after assignment, which occurs inside
> >> poll().
> >>>>>>>>>> Therefore, the
> >>>>>>>>>> user would encounter an UnsupportedVersionException from poll().
> >>> I’ll
> >>>>>>>>>> clarify this in the KIP.
> >>>>>>>>>>
> >>>>>>>>>> Best Regards,
> >>>>>>>>>> Jiunn-Yang
> >>>>>>>>>>
> >>>>>>>>>>> On May 17, 2026 at 4:50 PM Chia-Ping Tsai <[email protected]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> hi Jiunn
> >>>>>>>>>>>
> >>>>>>>>>>>> PartitionAgeMs (int64, default -1): The age of this partition
> >> in
> >>>>>>>>>> milliseconds, computed server-side by the broker as
> >>>>>> broker_current_time
> >>>>>>>> -
> >>>>>>>>>> partition_creation_time. Returns -1 if the broker does not
> >> support
> >>>>>> this
> >>>>>>>>>> feature or the partition creation time is unknown.
> >>>>>>>>>>>
> >>>>>>>>>>> If the creation time exists, the returned value should always
> be
> >>>>>>>> greater
> >>>>>>>>>> than or equal to zero, right?
> >>>>>>>>>>>
> >>>>>>>>>>>> New  Old (MetadataResponse v0–13)    positive        any
> >>> field
> >>>>>>>>>> absent    UnsupportedVersionException
> >>>>>>>>>>>
> >>>>>>>>>>> Will the user encounter UnsupportedVersionException when calling
> >>>>>> `poll()`?
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>> Chia-Ping
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On 2026/05/16 04:30:49 黃竣陽 wrote:
> >>>>>>>>>>>> Hello Jun, chia,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I've updated KIP-1327 with a design change based on the
> >>> discussion
> >>>>>>>>>>>> feedback.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The updated design decouples the new-partition reset behavior
> >>> from
> >>>>>>>>>>>> the base auto.offset.reset policy:
> >>>>>>>>>>>>
> >>>>>>>>>>>> - auto.offset.reset.max.age.ms now applies to all
> >>> auto.offset.reset
> >>>>>>>>>> values
> >>>>>>>>>>>> (latest, earliest, by_duration, none).
> >>>>>>>>>>>> - For new ("hot") partitions, the consumer resets to
> >>>>>>>>>> auto.offset.reset.new.partitions
> >>>>>>>>>>>> config setting
> >>>>>>>>>>>> - For existing ("cold") partitions, the base auto.offset.reset
> >>>>>> policy
> >>>>>>>>>> continues
> >>>>>>>>>>>> to apply unchanged.
> >>>>>>>>>>>> - The new-partition reset behavior is represented by a
> separate
> >>>>>>>>>> internal config
> >>>>>>>>>>>> (auto.offset.reset.new.partitions, currently fixed to
> >> earliest).
> >>>>>> This
> >>>>>>>>>> decoupled design makes
> >>>>>>>>>>>> it straightforward to promote the behavior to a public
> >>> user-facing
> >>>>>>>>>> configuration in a future KIP.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best Regards,
> >>>>>>>>>>>> Jiunn-Yang
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On May 16, 2026 at 7:46 AM Chia-Ping Tsai <[email protected]> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> hi Jun
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I see what you mean now. The proposal from me is listed
> below:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 1) Add auto.offset.reset.new.partitions with a default value
> >> of
> >>>>>>>>>> earliest. It fixes the data loss from both by_duration and
> >> latest,
> >>> and
> >>>>>>>> it
> >>>>>>>>>> does not change the logic of auto.offset.reset=earliest.
> >>>>>>>>>>>>> 2) Mark auto.offset.reset.new.partitions as an internal
> >>>>>>>>>> configuration. auto.offset.reset.new.partitions=earliest
> already
> >>>>>>>>>> addresses the issue, and we can discuss the use cases of other
> >>> values
> >>>>>>>> in a
> >>>>>>>>>> separate KIP.
> >>>>>>>>>>>>> 3) Both configs, auto.offset.reset.new.partitions and
> >>>>>>>>>> auto.offset.reset.latest.max.age.ms, will be applied to all for
> >>>>>>>>>> consistency.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> WDYT?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 2026/05/15 20:53:20 Jun Rao via dev wrote:
> >>>>>>>>>>>>>> Hi, Chia-Ping,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks for the reply.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 1. In the motivation section, the KIP says "When a Kafka
> >> topic
> >>> is
> >>>>>>>>>> expanded
> >>>>>>>>>>>>>> with new partitions, consumers using the latest auto offset
> >>> reset
> >>>>>>>>>> policy
> >>>>>>>>>>>>>> will silently miss all records produced to those partitions
> >>> before
> >>>>>>>> the
> >>>>>>>>>>>>>> consumer discovers them.". If a user sets
> >>>>>>>>>>>>>> auto.offset.reset=by_duration=1sec, the same record loss
> >> issue
> >>>>>> could
> >>>>>>>>>> also
> >>>>>>>>>>>>>> happen, right?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 2. I was thinking auto.offset.reset.new.partitions will
> take
> >>> the
> >>>>>>>> same
> >>>>>>>>>>>>>> values as auto.offset.reset. So a user could set it
> >>> by_duration if
> >>>>>>>>>> needed.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Jun
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Thu, May 14, 2026 at 4:06 PM Chia-Ping Tsai <
> >>>>>> [email protected]
> >>>>>>>>>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> hi Jun
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks for the feedback. I might be missing something
> >>> important
> >>>>>>>> from
> >>>>>>>>>> your
> >>>>>>>>>>>>>>> suggestion, so please bear with me as I try to clarify with
> >> a
> >>> few
> >>>>>>>>>> questions:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 1. Is there a strong use case for extending this logic to
> >>> other
> >>>>>>>> reset
> >>>>>>>>>>>>>>> policies? Unlike latest, policies like earliest or
> >> by_duration
> >>>>>>>> don't
> >>>>>>>>>> seem
> >>>>>>>>>>>>>>> to suffer from the same silent data loss issue when a
> >>> partition
> >>>>>> is
> >>>>>>>>>> expanded.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 2. What values would we expect users to configure for
> >>>>>>>>>>>>>>> auto.offset.reset.new.partitions? If they set it to
> >> earliest
> >>> or
> >>>>>>>>>> latest,
> >>>>>>>>>>>>>>> we might run into the exact same edge cases. For example,
> >> if a
> >>>>>>>>>> consumer is
> >>>>>>>>>>>>>>> offline for a while and a new partition is created during
> >> that
> >>>>>>>>>> downtime,
> >>>>>>>>>>>>>>> the user might actually want to skip to latest when
> >> resuming,
> >>>>>>>> rather
> >>>>>>>>>> than
> >>>>>>>>>>>>>>> reading from earliest just because the partition is
> >>> technically
> >>>>>>>>>> "new" to
> >>>>>>>>>>>>>>> the group.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> This is exactly why we opted for introducing a max.age
> >>> threshold.
> >>>>>>>> It
> >>>>>>>>>> gives
> >>>>>>>>>>>>>>> users a time-bound way to define what is genuinely
> "hot/new"
> >>> and
> >>>>>>>>>> what is
> >>>>>>>>>>>>>>> just an old partition they haven't seen yet.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>> Chia-Ping
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On 2026/05/14 20:48:09 Jun Rao via dev wrote:
> >>>>>>>>>>>>>>>> Hi, Jiunn-Yang,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks for the KIP.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I find auto.offset.reset.latest.max.age a bit weird. It
> >> only
> >>>>>>>>>> applies when
> >>>>>>>>>>>>>>>> auto.offset.reset is latest. However, it seems that the
> >>>>>> motivation
> >>>>>>>>>>>>>>> equally
> >>>>>>>>>>>>>>>> applies when auto.offset.reset is set to other values like
> >>>>>>>>>> by_duration.
> >>>>>>>>>>>>>>> The
> >>>>>>>>>>>>>>>> intention is that we want to have a separate way to
> control
> >>>>>> newly
> >>>>>>>>>> created
> >>>>>>>>>>>>>>>> partitions vs existing partitions when the group starts.
> >>> Have we
> >>>>>>>>>>>>>>> considered
> >>>>>>>>>>>>>>>> adding a new config like auto.offset.reset.new.partitions? If
> >>>>>>>> this
> >>>>>>>>>> new
> >>>>>>>>>>>>>>>> config is not set, the offset reset policy defaults to the
> >>>>>> policy
> >>>>>>>>>> used
> >>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>> existing partitions. The user could set it explicitly to
> >>>>>> customize
> >>>>>>>>>> the
> >>>>>>>>>>>>>>>> behavior for new partitions.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Jun
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Thu, May 7, 2026 at 5:07 AM 黃竣陽 <[email protected]>
> >>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I’d like to manually bump this thread.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Best Regards,
> >>>>>>>>>>>>>>>>> Jiunn-Yang
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On May 1, 2026 at 10:37 PM 黃竣陽 <[email protected]> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Hello all,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Thanks for the feedback.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> DJ01/DJ02:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> MetadataResponse bumps from v13 to v14. The
> >>> PartitionMetadata
> >>>>>>>>>> struct
> >>>>>>>>>>>>>>>>> gains a new
> >>>>>>>>>>>>>>>>>> field PartitionAgeMs (int64, default -1), computed
> >>> server-side
> >>>>>>>> by
> >>>>>>>>>> the
> >>>>>>>>>>>>>>>>> broker as
> >>>>>>>>>>>>>>>>>> broker_current_time - partition_creation_time.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I also added the consumer heartbeat flow: when
> >>> MembershipManager
> >>>>>>>>>> detects
> >>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>> newly assigned
> >>>>>>>>>>>>>>>>>> partition, it explicitly invalidates the metadata for
> the
> >>>>>>>> affected
> >>>>>>>>>>>>>>> topic
> >>>>>>>>>>>>>>>>> and forces a fresh MetadataRequest
> >>>>>>>>>>>>>>>>>> before making the offset reset decision, even if the
> >> topic
> >>> ID
> >>>>>> is
> >>>>>>>>>>>>>>> already
> >>>>>>>>>>>>>>>>> in the cache.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> MB0:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> The consumer learns the broker's maximum supported
> >>>>>>>>>> MetadataResponse
> >>>>>>>>>>>>>>>>> version via the
> >>>>>>>>>>>>>>>>>> ApiVersions negotiation at connection time. If the
> >>> negotiated
> >>>>>>>>>>>>>>> version is
> >>>>>>>>>>>>>>>>> unsupported, the consumer
> >>>>>>>>>>>>>>>>>> knows the broker does not support PartitionAgeMs at all
> >> and
> >>>>>> can
> >>>>>>>>>>>>>>> throw an
> >>>>>>>>>>>>>>>>> UnsupportedVersionException
> >>>>>>>>>>>>>>>>>> immediately, rather than silently falling back to latest
> >>> and
> >>>>>>>>>> risking
> >>>>>>>>>>>>>>>>> data loss without any operator-visible signal.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> MB1/MB2/MB3:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I have addressed these changes in the KIP.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Best Regards,
> >>>>>>>>>>>>>>>>>> Jiunn-Yang
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Apr 29, 2026 at 4:04 PM Chia-Ping Tsai <[email protected]> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> hi David
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I agree with the direction of moving the 'age'
> >> resolution
> >>>>>> from
> >>>>>>>>>> the
> >>>>>>>>>>>>>>>>> Heartbeat API to the Metadata API to keep the control
> >> plane
> >>>>>>>> clean.
> >>>>>>>>>> The
> >>>>>>>>>>>>>>> main
> >>>>>>>>>>>>>>>>> trade-off, as we noted before, is introducing
> inter-broker
> >>>>>> clock
> >>>>>>>>>> skew.
> >>>>>>>>>>>>>>> The
> >>>>>>>>>>>>>>>>> Group Coordinator approach provided a single source of
> >> truth
> >>>>>> for
> >>>>>>>>>> time.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> However, realistically, this time skew should be
> >>> negligible.
> >>>>>>>>>> Given
> >>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>> the max.age threshold will likely be configured in
> minutes
> >>> or
> >>>>>>>>>> hours, a
> >>>>>>>>>>>>>>>>> typical NTP skew (in milliseconds) between brokers won't
> >>> impact
> >>>>>>>> the
> >>>>>>>>>>>>>>>>> fallback decision.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>> Chia-Ping
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Apr 29, 2026 at 3:29 PM David Jacot via dev <[email protected]> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Thanks for the KIP!
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Sorry, I haven't really followed the previous
> >>> conversation
> >>>>>>>> but I
> >>>>>>>>>>>>>>> took a
> >>>>>>>>>>>>>>>>>>>> quick look at this one.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> DJ01: I don't clearly understand the flow with the
> >>>>>>>>>>>>>>>>> ConsumerGroupHeartbeat
> >>>>>>>>>>>>>>>>>>>> API after reading the KIP. There is a new boolean; the
> >>> KIP
> >>>>>>>>>> states
> >>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>> partition ages are returned only when this boolean is
> >>> set.
> >>>>>>>>>>>>>>> Implicitly,
> >>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>>> means that when the consumer receives a new partition,
> >> it
> >>>>>> will
> >>>>>>>>>>>>>>> issue a
> >>>>>>>>>>>>>>>>> new
> >>>>>>>>>>>>>>>>>>>> HB request with the boolean set to receive the ages.
> Is
> >>> my
> >>>>>>>>>>>>>>>>> understanding
> >>>>>>>>>>>>>>>>>>>> correct? We should perhaps clarify the flow and also
> >>> explain
> >>>>>>>>>> how it
> >>>>>>>>>>>>>>>>> fits
> >>>>>>>>>>>>>>>>>>>> into the existing flow (e.g. list offsets, fetch
> >> offsets,
> >>>>>>>> etc.).
> >>>>>>>>>>>>>>>>>>>> DJ02: If my understanding is correct, I wonder if
> >>>>>>>>>>>>>>>>>>>> the ConsumerGroupHeartbeat API is the right place for
> >>> this
> >>>>>>>> given
> >>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>> a new
> >>>>>>>>>>>>>>>>>>>> round trip is done anyway. Alternatively, it could
> >> simply
> >>>>>>>>>> include
> >>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>> metadata. Generally, we should be rather cautious
> about
> >>> not
> >>>>>>>>>>>>>>> overloading
> >>>>>>>>>>>>>>>>>>>> the ConsumerGroupHeartbeat API with unrelated
> concepts.
> >>> The
> >>>>>>>> API
> >>>>>>>>>> is
> >>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>>> control plane API for assigning or revoking
> partitions.
> >>> The
> >>>>>>>> fact
> >>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>>>> don't want to add it to the corresponding Streams API
> >>> also
> >>>>>>>>>> suggests
> >>>>>>>>>>>>>>>>>>>> something is not quite right. What would we do if we
> >>> want to
> >>>>>>>>>>>>>>> support
> >>>>>>>>>>>>>>>>>>>> Streams in the future?
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>> David
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Wed, Apr 29, 2026 at 12:28 AM Muralidhar Basani
> via
> >>> dev
> >>>>>> <
> >>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Hi Jiunn,
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Thank you for this great kip. Good to know about the
> >>> gap.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> mb-0 - why a new v2 version bump for
> >>> RequestPartitionAges
> >>>>>>>>>> field.
> >>>>>>>>>>>>>>> Can a
> >>>>>>>>>>>>>>>>>>>>> tagged field (for ex: on response, PartitionAges on
> >>>>>>>>>>>>>>> TopicPartitions)
> >>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>> used here and avoid version bump?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> mb-1 - For the new config, is there a recommended
> >> value
> >>> or
> >>>>>> a
> >>>>>>>>>>>>>>> ConfigDef
> >>>>>>>>>>>>>>>>>>>>> validator? Probably it should be based on the
> >>>>>>>> metadata.max.age.ms
> >>>>>>>>>> ?
> >>>>>>>>>>>>>>>>> Sizing
> >>>>>>>>>>>>>>>>>>>>> instructions can be part of javadocs I guess.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> mb-2 - (minor) As there are no changes to Kafka
> >> Streams,
> >>>>>>>> would
> >>>>>>>>>> it
> >>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>> better
> >>>>>>>>>>>>>>>>>>>>> to add this new config
> >> auto.offset.reset.latest.max.age
> >>> to
> >>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> StreamsConfig block list
> >>>>>>>>>>>>>>> (NON_CONFIGURABLE_CONSUMER_DEFAULT_CONFIGS)
> >>>>>>>>>>>>>>>>> for a
> >>>>>>>>>>>>>>>>>>>>> clear warning, in case users configure it? This is the
> >>> most
> >>>>>>>>>>>>>>> familiar
> >>>>>>>>>>>>>>>>>>>>> consumer config and users might easily mistakenly
> >>> configure
> >>>>>>>>>> it. Or
> >>>>>>>>>>>>>>>>> may be
> >>>>>>>>>>>>>>>>>>>>> it's not worth it to add.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> mb-3 - (minor) The phrasing "the consumer falls back
> >> to
> >>>>>>>>>> earliest"
> >>>>>>>>>>>>>>>>> reads as
> >>>>>>>>>>>>>>>>>>>>> if the config were being changed per-partition which
> >>> isn't
> >>>>>>>>>>>>>>> supported.
> >>>>>>>>>>>>>>>>> May
> >>>>>>>>>>>>>>>>>>>>> be rephrasing to something like "consumer resolves
> the
> >>>>>>>> initial
> >>>>>>>>>>>>>>>>> position to
> >>>>>>>>>>>>>>>>>>>>> start offset for that partition" as if earliest was
> >>> applied
> >>>>>>>> to
> >>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>> partition only and auto.offset.reset config is
> >>> unchanged.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>>>> Murali
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 28, 2026 at 2:48 PM 黃竣陽 <
> >>> [email protected]>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Hi chia,
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I have updated the KIP to include this change.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Best Regards,
> >>>>>>>>>>>>>>>>>>>>>> Jiunn-Yang
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Apr 28, 2026 at 8:03 PM Chia-Ping Tsai <[email protected]> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> hi Jiunn-Yang
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> chia_0: Should we expose the partition creation
> time
> >>> via
> >>>>>>>> the
> >>>>>>>>>>>>>>> Admin
> >>>>>>>>>>>>>>>>> API?
> >>>>>>>>>>>>>>>>>>>>>> I assume it would be valuable for users to diagnose
> >> and
> >>>>>>>>>>>>>>> troubleshoot
> >>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>> behavior of auto.offset.reset.latest.max.age
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>> Chia-Ping
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On 2026/04/28 10:47:58 黃竣陽 wrote:
> >>>>>>>>>>>>>>>>>>>>>>>> Hello everyone,
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> I would like to start a discussion on KIP-1327
> >>> Prevent
> >>>>>> Hot
> >>>>>>>>>> Data
> >>>>>>>>>>>>>>>>> Loss
> >>>>>>>>>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>>>>>> Partition Expansion for Latest Policy
> >>>>>>>>>>>>>>>>>>>>>>>> <
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>
> >>
> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/x/KY4mGQ__;!!Ayb5sqE7!qF4q1QzF1RRgP61D7A2xuEai1ky7fepKDKFFvpNBuePikH-ULmT87TvuuZzy5kau5E4y5zMZAmfQQiwZomM$
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> This proposal aims to introduce
> >>>>>>>>>>>>>>> auto.offset.reset.latest.max.age,
> >>>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>>>>> consumer config that lets the
> >>>>>>>>>>>>>>>>>>>>>>>> latest reset policy distinguish newly expanded
> >> (hot)
> >>>>>>>>>> partitions
> >>>>>>>>>>>>>>>>> from
> >>>>>>>>>>>>>>>>>>>>>> long-existing (cold) ones. Partitions
> >>>>>>>>>>>>>>>>>>>>>>>> younger than the configured threshold
> automatically
> >>> fall
> >>>>>>>>>> back
> >>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>> earliest, preventing silent data loss
> >>>>>>>>>>>>>>>>>>>>>>>> during topic expansion without forcing a full
> >>> historical
> >>>>>>>>>>>>>>> reprocess.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>>>>>>>>> Jiunn-Yang
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>>
> >>>
> >>
>
>
