Unfortunately I do not have any specific logs from these events, but I will
try to describe them as accurately as possible to give an idea of the
problem I saw.

The odd behavior manifested itself when I bounced all of the kafka processes
on each of the servers in a 12-node cluster. A few weeks prior I had done a
partition reassignment to add four new kafka brokers to the cluster. This
cluster has 4 topics, each with 350 partitions, a retention policy of 6
hours, and a replication factor of 1. Originally I ran a migration across
all of the topics and partitions to add the 4 new nodes using the partition
reassignment tool (roughly the sequence sketched after the error examples
below). This seemed to cause a lot of network congestion, and according to
the logs some of the nodes were having trouble talking to each other. The
congestion lasted for the duration of the migration and began to ease toward
the end. After the migration I confirmed that data was being stored on and
served from the new brokers. Today I bounced the kafka process on each of
the brokers to pick up a change made to the log4j properties. After bouncing
the first process I started seeing some strange errors on the four newer
broker nodes that looked like:

kafka.common.NotAssignedReplicaException: Leader 10 failed to record follower 
7's position 0 for partition [topic-1,185] since the replica 7 is not 
recognized to be one of the assigned replicas 10 for partition [topic-2,185]

and on the older kafka brokers the errors looked like:

[2014-12-01 17:06:04,268] ERROR [ReplicaFetcherThread-0-12], Error for 
partition [topic-1,175] to broker 12:class kafka.common.UnknownException 
(kafka.server.ReplicaFetcherThread)
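
For reference, the reassignment a few weeks earlier was driven with the
stock partition reassignment tool, roughly like the sketch below. The
zookeeper connect string, json file names, and broker ids are placeholders
from memory, not the exact values I used:

  # generate a candidate plan that includes the new brokers
  bin/kafka-reassign-partitions.sh --zookeeper zk1:2181 \
    --topics-to-move-json-file topics-to-move.json \
    --broker-list "9,10,11,12" --generate
  # execute the plan produced above
  bin/kafka-reassign-partitions.sh --zookeeper zk1:2181 \
    --reassignment-json-file reassignment.json --execute
  # check progress/completion of the reassignment
  bin/kafka-reassign-partitions.sh --zookeeper zk1:2181 \
    --reassignment-json-file reassignment.json --verify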

I proceeded to bounce the rest of the kafka processes, and after that the
errors seemed to stop. It wasn't until a few hours later that I noticed the
amount of data stored on the 4 new kafka brokers had dropped off
significantly. When I ran a describe for the topics mentioned in the errors
(the command is at the end of this mail) it was clear that the partition
assignments had been reverted to their state prior to the original migration
that added the 4 new brokers. I am unsure why bouncing the kafka process
would cause the state in zookeeper to get overwritten, given that it seemed
to have been working for the last few weeks until the process was restarted.
My hunch is that the controller keeps some state about the world
pre-reassignment and removes that state after it detects that the
reassignment completed successfully. In this case the network congestion on
the brokers caused the controller not to get notified when all the
reassignments were completed, and it therefore kept the pre-reassignment
state around. When the process was bounced it read this state back from
zookeeper and reverted the existing assignment to the pre-reassignment
state. Has this behavior been observed before? Does this sound like a
logical understanding of what happened in this case?
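
For what it's worth, the check I ran was just a describe from the topics
tool, something like the following (topic name and zookeeper string are
placeholders):

  # show the current replica assignment for the topic
  bin/kafka-topics.sh --zookeeper zk1:2181 --describe --topic topic-1

and none of the 4 new broker ids showed up in the Replicas column anymore.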
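
If my hunch about leftover controller state is right, I assume the places to
look would be the reassignment znode and the per-topic assignment in
zookeeper, e.g. interactively from zookeeper-shell.sh (I did not capture
these at the time, so I cannot say what they contained):

  bin/zookeeper-shell.sh zk1:2181
  # pending reassignment state the controller acts on
  get /admin/reassign_partitions
  # replica assignment currently recorded for the topic
  get /brokers/topics/topic-1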

-- 
Andrew Jorgensen
@ajorgensen
