>> Would it be worth returning transactional.id.expiration.ms in the DescribeProducersResponse?
> That's an interesting thought as well. Are you trying to avoid the need to
> specify it through the command line? The tool could also query the value
> with DescribeConfigs I suppose.

Basically. I'm not sure how useful this will be in practice, though it might
help when debugging.

Lucas

On Thu, Aug 27, 2020 at 11:00 AM Jason Gustafson <ja...@confluent.io> wrote:

> Hey Lucas,
>
> Thanks for the comments. Responses below:
>
> > Given that it's possible for replica producer states to diverge from each
> > other, it would be very useful if DescribeProducers(Request,Response) and
> > tooling is able to query all partition replicas for their producers
>
> Yes, it makes sense to me to let DescribeProducers work on both followers
> and leaders. In fact, I'm encouraged that there are use cases for this work
> other than detecting hanging transactions. That was indeed the hope, but I
> didn't have anything specific in mind. I will update the proposal.
>
> > Would it be worth returning transactional.id.expiration.ms in the
> > DescribeProducersResponse?
>
> That's an interesting thought as well. Are you trying to avoid the need to
> specify it through the command line? The tool could also query the value
> with DescribeConfigs I suppose.
>
> Thanks,
> Jason
>
> On Thu, Aug 27, 2020 at 10:48 AM Lucas Bradstreet <lu...@confluent.io>
> wrote:
>
> > Hi Jason,
> >
> > This looks like a very useful tool, thanks for writing it up.
> >
> > Given that it's possible for replica producer states to diverge from each
> > other, it would be very useful if DescribeProducers(Request,Response) and
> > tooling is able to query all partition replicas for their producers. One
> > way I can see this being used immediately is in kafka's system tests,
> > especially the ones that inject failures. At the end of the test we can
> > query all replicas and make sure that their states have not diverged. I can
> > also see it being useful when debugging production clusters too.
> >
> > Would it be worth returning transactional.id.expiration.ms in the
> > DescribeProducersResponse?
> >
> > Cheers,
> >
> > Lucas
> >
> > On Wed, Aug 26, 2020 at 12:12 PM Ron Dagostino <rndg...@gmail.com> wrote:
> >
> > > Yes, that definitely sounds reasonable. Thanks, Jason!
> > >
> > > Ron
> > >
> > > On Wed, Aug 26, 2020 at 3:03 PM Jason Gustafson <ja...@confluent.io>
> > > wrote:
> > >
> > > > Hey Ron,
> > > >
> > > > We do not typically backport new APIs to older versions. I think we can
> > > > however make the --abort command compatible with older versions. It would
> > > > require a user to do some analysis on their own to identify a hanging
> > > > transaction, but then they can use the tool from a new release to recover.
> > > > For example, users could detect a hanging transaction through the existing
> > > > "LastStableOffsetLag" metric and then collect the needed information from a
> > > > dump of the log (or producer snapshot). It's more work, but at least it's
> > > > possible. Does that sound fair?
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > > > On Wed, Aug 26, 2020 at 11:51 AM Ron Dagostino <rndg...@gmail.com> wrote:
> > > >
> > > > > Hi Jason. Thanks for the excellently-written KIP.
> > > > >
> > > > > Will the implementation be backported to prior Kafka versions? The reason
> > > > > I ask is because if it is not backported and similar functionality is not
> > > > > otherwise made available for older versions, then the only recourse (aside
> > > > > from deleting and recreating the topic as you pointed out) may be to
> > > > > upgrade to 2.7 (or whatever version ends up getting this functionality).
> > > > > Such an upgrade may not be desirable, especially if the number of
> > > > > intermediate versions is considerable. I understand the mantra of "never
> > > > > fall too many versions behind" but the reality of it is that it isn't
> > > > > always the case. Even if the version is relatively recent, an upgrade may
> > > > > still not be possible for some time, and a quicker resolution may be
> > > > > necessary.
> > > > >
> > > > > Ron
> > > > >
> > > > > On Wed, Aug 26, 2020 at 2:33 PM Jason Gustafson <ja...@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I've added a proposal to handle the problem of hanging transactions:
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-664%3A+Provide+tooling+to+detect+and+abort+hanging+transactions
> > > > > > .
> > > > > > In theory, this should never happen. In practice, we have hit one bug where
> > > > > > it was possible and there are few good options today to recover. Take a
> > > > > > look and let me know what you think.
> > > > > >
> > > > > > Thanks,
> > > > > > Jason