[ https://issues.apache.org/jira/browse/KAFKA-559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13703007#comment-13703007 ]
Joel Koshy commented on KAFKA-559: ---------------------------------- Thanks for the patch. Overall, looks good. Couple of comments, mostly minor in no particular order: * I think dry-run does not need any further qualifier such as withOptionalArg, describedAs, ofType - it's just a flag. * For --since, I prefer the seconds since epoch over some fixed input format which then brings in ambiguity such as timezone 24h vs 12h, etc. A better alternative would be to accept date strings and use the DateFormat class with lenient parsing turned on or something like that. --before may be more intuitive than --since. * Can use CommandLineUtils.checkRequiredArgs * deleteBy matching - prefer to use case match and thereby avoid the explicit check on valid values. Also the message on invalid value of deleteBy should inform what the valid values are. * Right now you support the following modes: delete stale topics across all groups, delete stale topics in a specific group. I think it would be useful to make deleteBy optional - if unspecified, it scans all groups and gets rid of stale groups. * line 75: warn ("msg", e) * line 101: should provide a reason for aborting * line 110: doesn't gropudirs have an offset path? if not maybe we should add it * Logging should include last mtime as that may be useful information reported by the dry-run * No need to add a wrapper shell script for the tool. * make all of the methods except main private. * The return statements can be dropped - i.e., just write the return value. * Several vars can be vals instead. * removeBrokerPartitionpairs: I don't think you would want to do a partial delete under a topic directory. You can check that all the partition offset paths are <= since and if so, just delete the topic path. With that the method would be better named something like deleteUnconsumedTopicsFromGroup? * Finally, you are probably aware that there are a bunch of race conditions - e.g., checkIfLiveConsumers is a helpful check to have but not guaranteed to be correct as some consumers may creep in while the tool is running. However, I think it is reasonable for a tool like this to ignore that since a "since" value way back would mean the probability of that occuring is very low. Similar note for deleteGroupIfNoTopicExists. > Garbage collect old consumer metadata entries > --------------------------------------------- > > Key: KAFKA-559 > URL: https://issues.apache.org/jira/browse/KAFKA-559 > Project: Kafka > Issue Type: New Feature > Reporter: Jay Kreps > Assignee: Tejas Patil > Labels: project > Attachments: KAFKA-559.v1.patch, KAFKA-559.v2.patch > > > Many use cases involve tranient consumers. These consumers create entries > under their consumer group in zk and maintain offsets there as well. There is > currently no way to delete these entries. It would be good to have a tool > that did something like > bin/delete-obsolete-consumer-groups.sh [--topic t1] --since [date] > --zookeeper [zk_connect] > This would scan through consumer group entries and delete any that had no > offset update since the given date. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira