We hit an error in some custom monitoring code for our Kafka cluster. The
root cause was that ZooKeeper was storing consumer group offsets for some
partitions that didn't actually exist on the brokers.

Apparently, some colleagues once needed to reset a cluster that was stuck
because of corrupted data. They wiped out the data log files on disk for
some topics, but didn't wipe the consumer offsets.

In an ideal world this situation would never happen. However, things like
this do happen in the real world.

A couple of questions:
1) This is pretty easy to clean up through the ZooKeeper CLI, but how would
you clean it up if we were instead storing offsets in Kafka?

2) From an operational perspective, I'm sure we're not the only ones to hit
this, so I think there should be a simple command or script to clean it up
that is a) packaged with Kafka and b) documented. Does one currently
exist?

3) I also think it'd be nice if Kafka automatically checked for this error
case and logged a warning. I wouldn't want automatic cleanup, because if
this situation occurs, something is screwy and I'd want to minimize what's
changing while I try to debug. Is this a reasonable request?
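For illustration, the check in 3) boils down to comparing the partitions that
have committed offsets against the partitions that actually exist, and
warning (not deleting) on any mismatch. This is only a rough sketch under
stated assumptions: how the committed offsets are read (from ZooKeeper or
from Kafka's offset store) and how the existing partitions are fetched from
the brokers is left out, and the function name `find_orphaned_offsets` and
its input shapes are hypothetical, not part of any Kafka API.

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("offset-audit")


def find_orphaned_offsets(committed, existing_partitions):
    """Hypothetical helper: report offsets stored for partitions that don't exist.

    committed: dict mapping (group, topic, partition) -> committed offset
        (assumed to be gathered elsewhere, e.g. from ZooKeeper or Kafka).
    existing_partitions: dict mapping topic -> set of partition ids that
        currently exist on the brokers (also assumed to be gathered elsewhere).

    Only logs a warning per orphaned entry; it never deletes anything.
    """
    orphaned = []
    for (group, topic, partition), offset in sorted(committed.items()):
        # A missing topic yields an empty set, so every partition is orphaned.
        if partition not in existing_partitions.get(topic, set()):
            orphaned.append((group, topic, partition))
            log.warning(
                "group %s has offset %d for missing partition %s-%d",
                group, offset, topic, partition,
            )
    return orphaned


if __name__ == "__main__":
    committed = {
        ("billing", "orders", 0): 1200,
        ("billing", "orders", 5): 87,      # partition 5 was never recreated
        ("audit", "old-topic", 0): 42,     # topic data was wiped entirely
    }
    existing = {"orders": {0, 1, 2}}
    print(find_orphaned_offsets(committed, existing))
    # → [('audit', 'old-topic', 0), ('billing', 'orders', 5)]
```

Keeping it read-only matches the preference above: something is already
wrong when this fires, so the check only reports and leaves cleanup to a
human.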

Cheers,
Jeff
