We hit an error in some custom monitoring code for our Kafka cluster. The root cause was that ZooKeeper was storing consumer group offsets for some partitions, but those partitions didn't actually exist on the brokers.
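For context, the check is roughly the following. This is only a sketch, not our actual monitoring code: it uses kazoo and kafka-python purely for illustration, assumes the old ZooKeeper-based offset layout under /consumers, and the host strings are placeholders.

```python
# Sketch: flag consumer group offsets stored in ZooKeeper that point at
# partitions the brokers don't report in their metadata.
from kazoo.client import KazooClient
from kafka import KafkaConsumer

ZK_HOSTS = "zk1:2181"        # placeholder
BROKERS = "broker1:9092"     # placeholder

zk = KazooClient(hosts=ZK_HOSTS)
zk.start()
consumer = KafkaConsumer(bootstrap_servers=BROKERS)

for group in zk.get_children("/consumers"):
    offsets_path = "/consumers/%s/offsets" % group
    if not zk.exists(offsets_path):
        continue
    for topic in zk.get_children(offsets_path):
        # Partitions the brokers actually know about for this topic.
        live = consumer.partitions_for_topic(topic) or set()
        for partition in zk.get_children("%s/%s" % (offsets_path, topic)):
            if int(partition) not in live:
                print("orphaned offset: group=%s topic=%s partition=%s"
                      % (group, topic, partition))

consumer.close()
zk.stop()
```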
Apparently, at some point in the past, some colleagues needed to reset a cluster that was stuck on corrupted data, so they wiped the log files on disk for some topics but didn't wipe the corresponding consumer offsets. In an ideal world this situation should never happen; in the real world, things like this do.

A couple of questions:

1) This is pretty easy to clean up through the ZooKeeper CLI (see the sketch at the bottom of this mail), but how would we clean it up if we were instead storing offsets in Kafka?

2) From an operational perspective, I'm sure we're not the only ones to hit this, so I think there should be a simple command/script to clean this up that is a) packaged with Kafka, and b) documented. Does this currently exist?

3) I also think it would be nice if Kafka automatically checked for this error case and logged a warning. I wouldn't want automatic cleanup, because if this situation occurs something is screwy, and I'd want to minimize what's changing while I debug. Is this a reasonable request?

Cheers,
Jeff
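For reference, the manual cleanup mentioned in question 1 is roughly the following: the scripted equivalent of deleting the stale znodes one by one in the ZooKeeper CLI. Again, only a sketch; it uses kazoo, assumes the old ZooKeeper-based offset layout under /consumers, and the group/topic names and host string are placeholders.

```python
# Sketch: delete the offset znodes for a topic whose partitions no longer
# exist on the brokers, for a single consumer group. Equivalent to deleting
# the same paths by hand in the ZooKeeper CLI.
from kazoo.client import KazooClient

ZK_HOSTS = "zk1:2181"    # placeholder
GROUP = "my-group"       # placeholder
TOPIC = "my-topic"       # placeholder

zk = KazooClient(hosts=ZK_HOSTS)
zk.start()

path = "/consumers/%s/offsets/%s" % (GROUP, TOPIC)
if zk.exists(path):
    # Recursively remove this group's offset znodes for the topic;
    # nothing else under /consumers is touched.
    zk.delete(path, recursive=True)

zk.stop()
```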