Hey Lucas,

Thanks for the comments. Responses below:

> Given that it's possible for replica producer states to diverge from each
> other, it would be very useful if DescribeProducers(Request,Response) and
> tooling is able to query all partition replicas for their producers

Yes, it makes sense to me to let DescribeProducers work on both followers
and leaders. In fact, I'm encouraged that there are use cases for this work
other than detecting hanging transactions. That was indeed the hope, but I
didn't have anything specific in mind. I will update the proposal.
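
To make the system test use case concrete, here is a rough sketch of the kind of divergence check a test could run once it has collected producer state from every replica. It is only an illustration, not part of the proposal: the function name and the input shape (replica id mapped to producer id mapped to a (producer epoch, last sequence) pair) are assumptions and do not reflect the actual DescribeProducers response schema.

```python
from typing import Dict, List, Tuple

# Hypothetical shape, for illustration only:
# replica (broker) id -> {producer_id: (producer_epoch, last_sequence)}
ReplicaStates = Dict[int, Dict[int, Tuple[int, int]]]


def find_divergent_producers(states: ReplicaStates) -> List[int]:
    """Return the sorted producer ids whose state is not identical on every replica."""
    all_producers = set()
    for producers in states.values():
        all_producers.update(producers)

    divergent = []
    for pid in sorted(all_producers):
        # A producer missing from a replica shows up as None and counts as divergence.
        seen = {producers.get(pid) for producers in states.values()}
        if len(seen) > 1:
            divergent.append(pid)
    return divergent


if __name__ == "__main__":
    collected = {
        1: {100: (0, 5), 101: (1, 9)},
        2: {100: (0, 5), 101: (1, 8)},
    }
    print(find_divergent_producers(collected))
```

A check like this would presumably only be meaningful after producers have stopped and the followers have caught up, since a follower that is still replicating can lag the leader without anything being wrong.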

> Would it be worth returning transactional.id.expiration.ms in the
> DescribeProducersResponse?

That's an interesting thought as well. Are you trying to avoid the need to
specify it through the command line? The tool could also query the value
with DescribeConfigs I suppose.

Thanks,
Jason

On Thu, Aug 27, 2020 at 10:48 AM Lucas Bradstreet <lu...@confluent.io>
wrote:

> Hi Jason,
>
> This looks like a very useful tool, thanks for writing it up.
>
> Given that it's possible for replica producer states to diverge from each
> other, it would be very useful if DescribeProducers(Request,Response) and
> tooling is able to query all partition replicas for their producers. One
> way I can see this being used immediately is in kafka's system tests,
> especially the ones that inject failures. At the end of the test we can
> query all replicas and make sure that their states have not diverged. I can
> also see it being useful when debugging production clusters too.
>
> Would it be worth returning transactional.id.expiration.ms in the
> DescribeProducersResponse?
>
> Cheers,
>
> Lucas
>
>
>
> On Wed, Aug 26, 2020 at 12:12 PM Ron Dagostino <rndg...@gmail.com> wrote:
>
> > Yes, that definitely sounds reasonable.  Thanks, Jason!
> >
> > Ron
> >
> > On Wed, Aug 26, 2020 at 3:03 PM Jason Gustafson <ja...@confluent.io>
> > wrote:
> >
> > > Hey Ron,
> > >
> > > We do not typically backport new APIs to older versions. I think we can
> > > however make the --abort command compatible with older versions. It would
> > > require a user to do some analysis on their own to identify a hanging
> > > transaction, but then they can use the tool from a new release to recover.
> > > For example, users could detect a hanging transaction through the existing
> > > "LastStableOffsetLag" metric and then collect the needed information from a
> > > dump of the log (or producer snapshot). It's more work, but at least it's
> > > possible. Does that sound fair?
> > >
> > > Thanks,
> > > Jason
> > >
> > > On Wed, Aug 26, 2020 at 11:51 AM Ron Dagostino <rndg...@gmail.com>
> > > wrote:
> > >
> > > > Hi Jason.  Thanks for the excellently-written KIP.
> > > >
> > > > Will the implementation be backported to prior Kafka versions?  The reason
> > > > I ask is because if it is not backported and similar functionality is not
> > > > otherwise made available for older versions, then the only recourse (aside
> > > > from deleting and recreating the topic as you pointed out) may be to
> > > > upgrade to 2.7 (or whatever version ends up getting this functionality).
> > > > Such an upgrade may not be desirable, especially if the number of
> > > > intermediate versions is considerable. I understand the mantra of "never
> > > > fall too many versions behind" but the reality of it is that it isn't
> > > > always the case.  Even if the version is relatively recent, an upgrade may
> > > > still not be possible for some time, and a quicker resolution may be
> > > > necessary.
> > > >
> > > > Ron
> > > >
> > > > On Wed, Aug 26, 2020 at 2:33 PM Jason Gustafson <ja...@confluent.io>
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I've added a proposal to handle the problem of hanging transactions:
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-664%3A+Provide+tooling+to+detect+and+abort+hanging+transactions
> > > > >
> > > > > In theory, this should never happen. In practice, we have hit one bug
> > > > > where it was possible and there are few good options today to recover.
> > > > > Take a look and let me know what you think.
> > > > >
> > > > > Thanks,
> > > > > Jason
> > > > >
> > > >
> > >
> >
>
