Hi PoAn,
Thanks for your comment.

PY00: I agree. I've made the changes to the KIP.

Thanks,
Andrew

On 2026/01/15 10:16:18 PoAn Yang wrote:
> Hi Andrew,
> 
> Thanks for the KIP. I have a question about broker configuration.
> 
> PY00: Would you consider mentioning that the update mode for
> errors.deadletterqueue.topic.name.prefix and
> errors.deadletterqueue.auto.create.topics.enable is cluster-wide?
> Clarifying that these values must be consistent across the cluster (or
> updated dynamically as a cluster default) would help prevent
> inconsistent values among brokers.
> 
> Thanks,
> PoAn
> 
> > On Jan 8, 2026, at 6:18 PM, Andrew Schofield <[email protected]> wrote:
> > 
> > Hi Shekhar,
> > Thanks for your comment.
> > 
> > If the leader of the DLQ topic-partition changes as we are trying to
> > write to it, then the code will need to cope with this.
> > 
> > If the leader of the share-partition changes, we do not need special
> > processing. If the transition to ARCHIVED is affected by a
> > share-partition leadership change, the new leader will be responsible
> > for the state transition. For example, if a consumer has rejected a
> > record, a leadership change will cause the rejection to fail, and the
> > record will be delivered again. This new delivery attempt will be
> > performed by the new leader, and if this delivery attempt results in a
> > rejection, the new leader will be responsible for initiating the DLQ
> > write.
> > 
> > Hope this makes sense,
> > Andrew
> > 
> > On 2026/01/03 15:02:31 Shekhar Prasad Rajak via dev wrote:
> >> Hi,
> >> If the leader changes during a DLQ write, or a share-partition leader
> >> changes, the partition is marked FENCED and the in-memory cache state
> >> is lost. I think we need to add those cases as well.
> >> Ref 
> >> https://github.com/apache/kafka/blob/trunk/core/src/main/java/kafka/server/share/SharePartitionManager.java#L857
> >> 
> >> 
> >> 
> >> Regards, Shekhar
> >> 
> >> 
> >> 
> >>    On Monday 29 December 2025 at 11:53:20 pm GMT+5:30, Andrew Schofield 
> >> <[email protected]> wrote:  
> >> 
> >> Hi Abhinav,
> >> Thanks for your comments.
> >> 
> >> AD01: Even if we were to allow the client to write to the DLQ topic,
> >> it would not be sufficient for situations in which the problem is one
> >> that the client cannot handle. So, my view is that it's preferable to
> >> use the same mechanism for all DLQ topic writes, regardless of
> >> whether the consumer initiated the process by rejecting a
> >> record or not.
> >> 
> >> AD02: I have added a metric for counting failed DLQ topic produce
> >> requests per group. The KIP does say that the broker logs an
> >> error when it fails to produce to the DLQ topic.
> >> 
> >> Thanks,
> >> Andrew
> >> 
> >> On 2025/12/16 10:38:39 Abhinav Dixit via dev wrote:
> >>> Hi Andrew,
> >>> Thanks for this KIP. I have a couple of questions -
> >>> 
> >>> AD01: From an implementation perspective, why can't we create/write
> >>> records to the DLQ topic from the client? Why do we want to do it from
> >>> the broker? As far as I understand, archiving the record on the share
> >>> partition and writing records to DLQ are independent? As you've
> >>> mentioned in the KIP, "It is possible in rare situations that more than
> >>> one DLQ record could be written for a particular undeliverable record",
> >>> won't we minimize these scenarios (by eliminating the dependency on
> >>> persister write state result) by writing records to the DLQ from the
> >>> client?
> >>> 
> >>> AD02: I agree with AM01 that we should emit a metric which can report the
> >>> count of failures of writing records to DLQ topic which an application
> >>> developer can monitor. If we are logging an error, maybe we should log the
> >>> count of such failures periodically?
> >>> 
> >>> Regards,
> >>> Abhinav Dixit
> >>> 
> >>> On Fri, Dec 12, 2025 at 3:08 AM Apoorv Mittal <[email protected]>
> >>> wrote:
> >>> 
> >>>> Hi Andrew,
> >>>> Thanks for the much-needed enhancement for Share Groups. Some questions:
> >>>> 
> >>>> AM1: The KIP states that in case of some failure "the broker will log
> >>>> an error". How will an application developer utilize this information
> >>>> and know about any such occurrences? Should we emit a metric which can
> >>>> report the count of such failures which an application developer can
> >>>> monitor?
> >>>> 
> >>>> AM2: Today records can go to the Archived state either when they
> >>>> exceed the delivery limit or when they are explicitly rejected by the
> >>>> client. I am expecting that records will be written to the DLQ topic
> >>>> only in the former case, i.e. when they exceed the delivery limit;
> >>>> that's what the KIP explains. If so, can't there be failure handling
> >>>> in the client which, on serialization or other issues, wants to reject
> >>>> the message explicitly so that it is placed on the DLQ? Should we have
> >>>> a config which governs this behaviour, i.e. if enabled, any record
> >>>> explicitly rejected by the client will also go to the DLQ?
> >>>> 
> >>>> AM3: I read your response on the thread related to the tricky part of
> >>>> ACLs for DLQ topics and I have a question in a similar area. The KIP
> >>>> defines a config "errors.deadletterqueue.auto.create.topics.enable"
> >>>> which, if enabled, lets the broker create the topic automatically
> >>>> using the other DLQ topic params. If a new DLQ topic is created, what
> >>>> basic permissions should be applied so that the application developer
> >>>> can access it? Should we provide the capability to create DLQ topics
> >>>> automatically, or should we restrict that and require the topic to be
> >>>> created by the application owner? With the latter, we know the
> >>>> application owner already has access to the DLQ topic.
> >>>> 
> >>>> AM4: For "errors.deadletterqueue.topic.name.prefix", I am expecting
> >>>> that this applies well to auto-created DLQ topics. But how do we
> >>>> enforce the prefix behaviour when the application developer provides
> >>>> the DLQ topic name in the group configuration? Will there be a check
> >>>> against the broker's expected prefix when setting the group
> >>>> configuration "errors.deadletterqueue.topic.name"?
> >>>> 
> >>>> Regards,
> >>>> Apoorv Mittal
> >>>> 
> >>>> 
> >>>> On Wed, Dec 10, 2025 at 5:59 PM Federico Valeri <[email protected]>
> >>>> wrote:
> >>>> 
> >>>>> Hi Andrew, a few comments/questions from me:
> >>>>> 
> >>>>> FV00: The KIP says "copying of the original record data into the DLQ
> >>>>> is controlled by two configurations", but I only see the client side
> >>>>> configuration in the latest revision.
> >>>>> 
> >>>>> FV01: The KIP says: "When an undeliverable record transitions to the
> >>>>> Archived state for such a group, a record is written onto the DLQ
> >>>>> topic". Later on it mentions a new "Archiving" state. Can you clarify
> >>>>> the state transition when sending a record to a DLQ?
> >>>>> 
> >>>>> FV02: Is the new state required to ensure that the DLQ record is
> >>>>> eventually written in case of the Share Coordinator failover?
> >>>>> 
> >>>>> Thanks,
> >>>>> Fede
> >>>>> 
> >>>>> 
> >>>>> On Tue, Dec 2, 2025 at 7:19 PM Andrew Schofield <[email protected]>
> >>>>> wrote:
> >>>>>> 
> >>>>>> Hi,
> >>>>>> I'd like to bump this discussion thread for adding DLQs to share
> >>>>>> groups.
> >>>>>> 
> >>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1191%3A+Dead-letter+queues+for+share+groups
> >>>>>> 
> >>>>>> Thanks,
> >>>>>> Andrew
> >>>>>> 
> >>>>>> On 2025/10/16 19:02:48 Andrew Schofield wrote:
> >>>>>>> Hi Chia-Ping,
> >>>>>>> Apologies for not responding to your comments. I was having email
> >>>>>>> problems and I’ve only just noticed the unanswered comments. Also,
> >>>>>>> this is not a direct reply.
> >>>>>>> 
> >>>>>>>>> chia00: How can we specify the number of partitions and the
> >>>>>>>   replication factor when
> >>>>>>>   `errors.deadletterqueue.auto.create.topics.enable` is set to true?
> >>>>>>> 
> >>>>>>> Personally, I prefer to make people create their DLQ topics
> >>>>>>> manually, but I take the point. In order to give full flexibility,
> >>>>>>> the list of configs you need is quite long, including min.isr and
> >>>>>>> compression. For consistency with Kafka Connect sink connectors, I
> >>>>>>> could add `errors.deadletterqueue.topic.replication.factor` but
> >>>>>>> that's the only additional config provided by Kafka Connect. Is
> >>>>>>> that worthwhile? I suggest not.
> >>>>>>> 
> >>>>>>> The DLQ topic config in this KIP is broker-level config, while it's
> >>>>>>> connector-level config for Kafka Connect. So, my preference is to
> >>>>>>> just have one broker-level config for auto-creation on/off, and
> >>>>>>> auto-create with the cluster's topic defaults. If anything more
> >>>>>>> specific is required, the administrator can create the DLQ topic
> >>>>>>> themselves with their preferences. Let me know what you think.
> >>>>>>> 
> >>>>>>>>> chia01: Should the error stack trace be included in the message
> >>>>>>>   headers, similar to what's done in KIP-298?
> >>>>>>> 
> >>>>>>> In KIP-298, the code deciding to write a message to the DLQ is
> >>>>>>> running in the Kafka Connect task and an exception is readily
> >>>>>>> available. In this KIP, the code writing to the DLQ is running in
> >>>>>>> the broker and it doesn't have any detail about why the record is
> >>>>>>> being DLQed. I think that actually the __dlq.errors.exception.*
> >>>>>>> headers are not feasible without allowing the application to
> >>>>>>> provide additional error context. That might be helpful one day,
> >>>>>>> but that's extending this KIP more than I intend. I have removed
> >>>>>>> these headers from the KIP.
> >>>>>>> 
> >>>>>>>>> chia02: Why does `errors.deadletterqueue.copy.record.enable` have
> >>>>>>>   different default values at the broker level and group level?
> >>>>>>> 
> >>>>>>> I want the group administrator to be able to choose whether to copy
> >>>>>>> the payloads. I was also thinking that it would be a good idea if
> >>>>>>> the cluster administrator could prevent this across the cluster,
> >>>>>>> but I've changed my mind and I've removed it.
> >>>>>>> 
> >>>>>>> Maybe a better idea would simply be to have a broker config
> >>>>>>> `group.share.errors.deadletterqueue.enable` to turn DLQ on/off. The
> >>>>>>> other broker configs in this KIP do not start with `group.share.`
> >>>>>>> because they're intended for other DLQ uses by the broker in
> >>>>>>> future.
> >>>>>>> 
> >>>>>>> Note that although share.version=2 is required to enable DLQ, this
> >>>>>>> isn't a suitable long-term switch because we might have
> >>>>>>> share.version > 2 due to another future enhancement.
> >>>>>>> 
> >>>>>>>>> chia03: Does the broker log an error for every message if the DLQ
> >>>>>>>   topic fails to be created?
> >>>>>>> 
> >>>>>>> No, that seems excessive and likely to flood the logs. I would
> >>>>>>> implement something like no more than one log per minute, per
> >>>>>>> share-partition. That would be annoying enough to fix without
> >>>>>>> being catastrophically verbose.
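
The throttling idea above ("no more than one log per minute, per share-partition") could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `DlqLogThrottle`, the string key format, and passing the clock value in as a parameter are all hypothetical and are not taken from the KIP or the Kafka code base.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper: allows at most one log line per interval for each
 * share-partition key. Not part of Kafka; names are illustrative only.
 */
final class DlqLogThrottle {
    private final long minIntervalMs;
    // Last time (in ms) an error was logged, keyed by share-partition.
    private final Map<String, Long> lastLogged = new ConcurrentHashMap<>();

    DlqLogThrottle(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
    }

    /** Returns true if an error for this share-partition may be logged at nowMs. */
    boolean shouldLog(String sharePartitionKey, long nowMs) {
        boolean[] allowed = {false};
        // compute() is atomic per key on ConcurrentHashMap, so two threads
        // cannot both be granted permission within the same interval.
        lastLogged.compute(sharePartitionKey, (key, last) -> {
            if (last == null || nowMs - last >= minIntervalMs) {
                allowed[0] = true;
                return nowMs;
            }
            return last;
        });
        return allowed[0];
    }
}
```

A caller would check `shouldLog(key, System.currentTimeMillis())` before writing the error, so a failing DLQ topic produces one line per minute for each share-partition rather than one per record.
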
> >>>>>>> 
> >>>>>>> Of course, if the group config `errors.deadletterqueue.topic.name`
> >>>>>>> has a value which does not satisfy the broker config
> >>>>>>> `errors.deadletterqueue.topic.name.prefix`, it will be considered a
> >>>>>>> config error and the DLQ will not be used.
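
As a minimal sketch of the prefix rule just described, assuming that "satisfy" means the group's DLQ topic name simply starts with the broker's prefix (the thread does not spell out the exact matching rule), the check might look like this. The class and method names are hypothetical; only the two config names and the "dlq." default come from the thread.

```java
/**
 * Hypothetical check, not Kafka code: decides whether a group's configured
 * DLQ topic name is acceptable under the broker's prefix setting.
 */
final class DlqTopicNameCheck {
    /**
     * @param groupDlqTopicName value of the group config errors.deadletterqueue.topic.name
     * @param brokerPrefix      value of the broker config errors.deadletterqueue.topic.name.prefix
     * @return true if the DLQ may be used; false means a config error (or no DLQ configured)
     */
    static boolean isUsableDlqTopic(String groupDlqTopicName, String brokerPrefix) {
        if (groupDlqTopicName == null || groupDlqTopicName.isEmpty()) {
            return false; // no DLQ topic configured for the group
        }
        // Assumption: a plain startsWith comparison against the prefix.
        return groupDlqTopicName.startsWith(brokerPrefix);
    }
}
```

With the default prefix "dlq.", a group configured with "dlq.orders" would pass, while "orders.dlq" would be treated as a config error and the DLQ would not be used.
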
> >>>>>>> 
> >>>>>>>>> chia04: Have you considered adding metrics for the DLQ?
> >>>>>>> 
> >>>>>>> Yes, that is a good idea. I've added some metrics to the KIP.
> >>>>>>> Please take a look.
> >>>>>>> 
> >>>>>>> 
> >>>>>>> Thanks,
> >>>>>>> Andrew
> >>>>>>> 
> >>>>>>>> On 4 Aug 2025, at 11:30, Andrew Schofield <
> >>>>> [email protected]> wrote:
> >>>>>>>> 
> >>>>>>>> Hi,
> >>>>>>>> Thanks for your comments on the KIP and sorry for the delay in
> >>>>>>>> responding.
> >>>>>>>> 
> >>>>>>>> D01: Authorisation is the area of this KIP that I think is most
> >>>>>>>> tricky. The reason that I didn't implement specific ACLs for DLQs
> >>>>>>>> is that I was not convinced they would help. So, if you have a
> >>>>>>>> specific idea in mind, please let me know. This is the area that
> >>>>>>>> I'm least comfortable with in the KIP.
> >>>>>>>> 
> >>>>>>>> I suppose maybe to set the DLQ name for a group, you could need a
> >>>>>>>> higher level of authorisation than just ALTER_CONFIGS on the
> >>>>>>>> GROUP. But what I settled with in the KIP was that DLQ topics all
> >>>>>>>> start with the same prefix, defaulting to "dlq.", and that the
> >>>>>>>> topics do not automatically create.
> >>>>>>>> 
> >>>>>>>> D02: I can see that. I've added a config which I've called
> >>>>>>>> errors.deadletterqueue.auto.create.topics.enable just to have a
> >>>>>>>> consistent prefix on all of the config names. Let me know what you
> >>>>>>>> think.
> >>>>>>>> 
> >>>>>>>> D03: I've added some text about failure scenarios when attempting
> >>>>>>>> to write records to the DLQ.
> >>>>>>>> 
> >>>>>>>> Thanks,
> >>>>>>>> Andrew
> >>>>>>>> ________________________________________
> >>>>>>>> From: isding_l <[email protected]>
> >>>>>>>> Sent: 16 July 2025 04:18
> >>>>>>>> To: dev <[email protected]>
> >>>>>>>> Subject: Re: [DISCUSS]: KIP-1191: Dead-letter queues for share groups
> >>>>>>>> 
> >>>>>>>> Hi Andrew,
> >>>>>>>> Thanks for the nice KIP. This KIP design for introducing
> >>>>>>>> dead-letter queues (DLQs) for Share Groups is generally clear and
> >>>>>>>> reasonable, addressing the key pain points of handling "poison
> >>>>>>>> messages".
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> D01: Should we consider implementing independent ACL configurations
> >>>>>>>> for DLQs? This would enable separate management of DLQ topic
> >>>>>>>> read/write permissions from source topics, preventing privilege
> >>>>>>>> escalation attacks via "poison message" + DLQ mechanisms.
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> D02: While disabling automatic DLQ topic creation is justifiable
> >>>>>>>> for security, it creates operational overhead in automated
> >>>>>>>> deployments. Can we introduce a configuration parameter
> >>>>>>>> auto.create.dlq.topics.enable to govern this behavior?
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> D03: How should we handle failure scenarios when brokers attempt
> >>>>>>>> to write records to the DLQ?
> >>>>>>>> ---- Replied Message ----
> >>>>>>>> | From | Andrew Schofield<[email protected]> |
> >>>>>>>> | Date | 07/08/2025 17:54 |
> >>>>>>>> | To | [email protected]<[email protected]> |
> >>>>>>>> | Subject | [DISCUSS]: KIP-1191: Dead-letter queues for share groups |
> >>>>>>>> Hi,
> >>>>>>>> I'd like to start discussion on KIP-1191 which adds dead-letter
> >>>>>>>> queue support for share groups. Records which cannot be processed
> >>>>>>>> by consumers in a share group can be automatically copied onto
> >>>>>>>> another topic for a closer look.
> >>>>>>>> 
> >>>>>>>> KIP:
> >>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1191%3A+Dead-letter+queues+for+share+groups
> >>>>>>>> 
> >>>>>>>> Thanks,
> >>>>>>>> Andrew
> >>>>>>> 
> >>>>>>> 
> >>>>> 
> >>>> 
> >>> 
> >> 
> 
> 
