Hi Jason,

This looks like a very useful tool, thanks for writing it up.

Given that it's possible for replica producer states to diverge from each
other, it would be very useful if DescribeProducers(Request,Response) and the
tooling could query all partition replicas for their producers. One way I can
see this being used immediately is in Kafka's system tests, especially the
ones that inject failures. At the end of the test we could query all replicas
and make sure that their states have not diverged. I can also see it being
useful when debugging production clusters.

Would it be worth returning transactional.id.expiration.ms in the
DescribeProducersResponse?

Cheers,

Lucas



On Wed, Aug 26, 2020 at 12:12 PM Ron Dagostino <rndg...@gmail.com> wrote:

> Yes, that definitely sounds reasonable.  Thanks, Jason!
>
> Ron
>
> On Wed, Aug 26, 2020 at 3:03 PM Jason Gustafson <ja...@confluent.io> wrote:
>
> > Hey Ron,
> >
> > We do not typically backport new APIs to older versions. I think we can,
> > however, make the --abort command compatible with older versions. It would
> > require a user to do some analysis on their own to identify a hanging
> > transaction, but then they can use the tool from a new release to recover.
> > For example, users could detect a hanging transaction through the existing
> > "LastStableOffsetLag" metric and then collect the needed information from a
> > dump of the log (or producer snapshot). It's more work, but at least it's
> > possible. Does that sound fair?
> >
> > Thanks,
> > Jason
> >
> > On Wed, Aug 26, 2020 at 11:51 AM Ron Dagostino <rndg...@gmail.com> wrote:
> >
> > > Hi Jason. Thanks for the excellently written KIP.
> > >
> > > Will the implementation be backported to prior Kafka versions? I ask
> > > because if it is not backported and similar functionality is not
> > > otherwise made available for older versions, then the only recourse
> > > (aside from deleting and recreating the topic, as you pointed out) may be
> > > to upgrade to 2.7 (or whatever version ends up getting this
> > > functionality). Such an upgrade may not be desirable, especially if the
> > > number of intermediate versions is considerable. I understand the mantra
> > > of "never fall too many versions behind", but in reality that isn't
> > > always followed. Even if the version is relatively recent, an upgrade may
> > > still not be possible for some time, and a quicker resolution may be
> > > necessary.
> > >
> > > Ron
> > >
> > > On Wed, Aug 26, 2020 at 2:33 PM Jason Gustafson <ja...@confluent.io> wrote:
> > >
> > > > Hi All,
> > > >
> > > > I've added a proposal to handle the problem of hanging transactions:
> > > >
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-664%3A+Provide+tooling+to+detect+and+abort+hanging+transactions
> > > >
> > > > In theory, this should never happen. In practice, we have hit one bug
> > > > where it was possible, and there are few good options today to recover.
> > > > Take a look and let me know what you think.
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > >
> >
>
