Hello James,

We received this exact same error this past Tuesday (we are on 0.8.2). To answer at least one of your bullet points: this is a valid scenario. We had the same questions, and I'm starting to think this is a bug. Thank you for the reproduction steps!
I looked over the Release Notes to see if maybe there were some fixes in newer versions -- this bug fix looked the most related:
https://issues.apache.org/jira/browse/KAFKA-2143

Thank you,
Tony

On Thu, Feb 25, 2016 at 3:46 PM, James Cheng <jch...@tivo.com> wrote:

> Hi,
>
> I ran into a scenario where one of my brokers would continually shutdown,
> with the error message:
> [2016-02-25 00:29:39,236] FATAL [ReplicaFetcherThread-0-1], Halting
> because log truncation is not allowed for topic test, Current leader 1's
> latest offset 0 is less than replica 2's latest offset 151
> (kafka.server.ReplicaFetcherThread)
>
> I managed to reproduce it with the following scenario:
> 1. Start broker1, with unclean.leader.election.enable=false
> 2. Start broker2, with unclean.leader.election.enable=false
>
> 3. Create topic, single partition, with replication-factor 2.
> 4. Write data to the topic.
>
> 5. At this point, both brokers are in the ISR. Broker1 is the partition
>    leader.
>
> 6. Ctrl-Z on broker2. (Simulates a GC pause or a slow network) Broker2
>    gets dropped out of ISR. Broker1 is still the leader. I can still write
>    data to the partition.
>
> 7. Shutdown Broker1. Hard or controlled, doesn't matter.
>
> 8. rm -rf the log directory of broker1. (This simulates a disk replacement
>    or full hardware replacement)
>
> 9. Resume broker2. It attempts to connect to broker1, but doesn't succeed
>    because broker1 is down. At this point, the partition is offline. Can't
>    write to it.
>
> 10. Resume broker1. Broker1 resumes leadership of the topic. Broker2
>     attempts to join ISR, and immediately halts with the error message:
> [2016-02-25 00:29:39,236] FATAL [ReplicaFetcherThread-0-1], Halting
> because log truncation is not allowed for topic test, Current leader 1's
> latest offset 0 is less than replica 2's latest offset 151
> (kafka.server.ReplicaFetcherThread)
>
> I am able to recover by setting unclean.leader.election.enable=true on my
> brokers.
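For what it's worth, the check behind that FATAL message can be modeled in a few lines. This is only a Python sketch of the behavior implied by the log line and by the source comment James quotes, not Kafka's actual Scala code. `Replica`, `follower_start_offset`, and `LogTruncationNotAllowed` are made-up names, and raising an exception stands in for the broker halting.

```python
from dataclasses import dataclass


class LogTruncationNotAllowed(Exception):
    """Hypothetical stand-in for the broker halting itself."""


@dataclass
class Replica:
    broker_id: int
    log_end_offset: int


def follower_start_offset(leader, follower, unclean_election_enabled):
    """Return the offset the follower would resume fetching from.

    Assumed model: a follower that is ahead of its leader must truncate,
    and truncation is refused when unclean leader election is disabled.
    """
    if leader.log_end_offset < follower.log_end_offset:
        if not unclean_election_enabled:
            raise LogTruncationNotAllowed(
                f"Current leader {leader.broker_id}'s latest offset "
                f"{leader.log_end_offset} is less than replica "
                f"{follower.broker_id}'s latest offset {follower.log_end_offset}"
            )
        # Unclean election allowed: drop everything past the leader's end.
        return leader.log_end_offset
    # Follower is not ahead of the leader, so nothing needs truncating.
    return follower.log_end_offset
```

With the numbers from step 10, `follower_start_offset(Replica(1, 0), Replica(2, 151), False)` raises, while passing `True` returns 0, i.e. broker2 would discard its 151 messages. That would match the recovery James describes after flipping the setting.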
>
> I'm trying to understand a couple things:
> * Is my scenario a valid supported one, or is it along the lines of "don't
>   ever do that"?
> * In step 10, why is broker1 allowed to resume leadership even though it
>   has no data?
> * In step 10, why is it necessary to stop the entire broker due to one
>   partition that is in this state? Wouldn't it be possible for the broker to
>   continue to serve traffic for all the other topics, and just mark this one
>   as unavailable?
> * Would it make sense to allow an operator to manually specify which
>   broker they want to become the new master? This would give me more control
>   over how much data loss I am willing to handle. In this case, I would want
>   broker2 to become the new master. Or, is that possible and I just don't
>   know how to do it?
> * Would it be possible to make unclean.leader.election.enable to be a
>   per-topic configuration? This would let me control how much data loss I am
>   willing to handle.
>
> Btw, the comment in the source code for that error message indicates:
>
> https://github.com/apache/kafka/blob/01aeea7c7bca34f1edce40116b7721335938b13b/core/src/main/scala/kafka/server/ReplicaFetcherThread.scala#L164-L166
>
>   // Prior to truncating the follower's log, ensure that doing so is
>   // not disallowed by the configuration for unclean leader election.
>   // This situation could only happen if the unclean election
>   // configuration for a topic changes while a replica is down. Otherwise,
>   // we should never encounter this situation since a non-ISR leader
>   // cannot be elected if disallowed by the broker configuration.
>
> But I don't believe that happened. I never changed the configuration. But
> I did venture into "unclean leader election" territory, so I'm not sure if
> the comment still applies.
>
> Thanks,
> -James
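On the step 10 question (why broker1 can resume leadership with no data), here is a minimal sketch, assuming the election looks only at ISR membership and liveness, which is how I read the quoted source comment ("a non-ISR leader cannot be elected if disallowed by the broker configuration"). `elect_leader` and its parameters are hypothetical names, and this is a simplified model, not Kafka's controller code.

```python
def elect_leader(assigned, isr, live, unclean_election_enabled):
    """Pick a leader for one partition, or None if it must stay offline.

    assigned: replica broker ids in assignment order
    isr:      set of broker ids currently in the in-sync replica set
    live:     set of broker ids that are currently up
    """
    # Prefer any live replica that is still in the ISR.
    live_isr = [b for b in assigned if b in isr and b in live]
    if live_isr:
        return live_isr[0]
    # No live ISR member: only an unclean election can pick someone else.
    if unclean_election_enabled:
        live_assigned = [b for b in assigned if b in live]
        if live_assigned:
            return live_assigned[0]
    return None
```

Under this model, after step 6 the ISR is `{1}`. In step 9 only broker2 is live, so `elect_leader([1, 2], {1}, {2}, False)` returns `None` and the partition is offline. In step 10 broker1 is live again and still listed in the ISR, so it is elected. Nothing in the model inspects how much data a replica actually holds, which would be consistent with broker1 regaining leadership despite its wiped log directory.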