Hi all! I recently had a problem where one of my two brokers would not reboot due to a hardware failure. The broker was down for almost a week before the required part came in and our datacenter tech could fix it. During that time, the live broker was able to handle all messages for all topics and partitions (which is awesome!). The first broker is now back and is trying to catch up on the messages it missed during the outage. The lower volume topics are all caught up, but I have one high volume topic (around 40K msgs/sec) that is taking much longer. I just took a few samples of Replica-MaxLag to see how long it would take to catch up. Currently, it is behind by about 12.5 million messages, and the lag is shrinking at a rate of about 1600 msgs/sec. At that rate, it’ll take around 9 days before the replica is caught up to the leader.
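For reference, here is a rough sketch of how a catch-up estimate like this can be computed from lag samples. It assumes the rate at which the lag shrinks stays constant, and the function names are made up for illustration (they are not part of any Kafka tooling). Plugging in the figures exactly as quoted above gives a bit over two hours rather than days; a 9-day estimate at 1600 msgs/sec would correspond to a lag closer to 1.25 billion messages, so the lag figure may be worth double-checking.

```python
def catch_up_seconds(lag_messages, net_catch_up_rate):
    """Seconds until the lag reaches zero, assuming a constant net rate.

    net_catch_up_rate is how fast the lag shrinks (msgs/sec), i.e. the
    difference between two Replica-MaxLag samples divided by the time
    between them.
    """
    if net_catch_up_rate <= 0:
        raise ValueError("replica is not gaining on the leader")
    return lag_messages / net_catch_up_rate


def lag_for_duration(days, net_catch_up_rate):
    """Lag (in messages) that would take `days` to clear at the given rate."""
    return days * 86_400 * net_catch_up_rate


# Figures as quoted in this post.
seconds = catch_up_seconds(12_500_000, 1_600)
print(f"{seconds / 3600:.1f} hours")  # 2.2 hours

# Lag implied by a 9-day catch-up at the same rate.
print(lag_for_duration(9, 1_600))  # 1244160000
```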
Is there any way to speed this up? Alternatively, I don’t actually care about this topic’s history. It is a new topic, and I know that it doesn’t have any consumers yet. I’d be fine with instructing both brokers to drop the old logs and just start from the head of the log. I could do this by manually deleting the topic (both the Kafka data files and its entries in ZooKeeper), but to do so properly with 0.8.0 I think I’d have to shut down the whole cluster, correct? I’d rather not do this, as another topic does have a consumer and I don’t want to lose messages for it. Thanks! -Andrew Otto