I am using Kafka 0.8.
Yes, I did run --verify, but I got some weird output from it that I had never
seen before, which looked something like:

Status of partition reassignment:
ERROR: Assigned replicas (5,2) don't match the list of replicas for 
reassignment (5) for partition [topic-1,248]
ERROR: Assigned replicas (7,3) don't match the list of replicas for 
reassignment (7) for partition [topic-2,228]

There were a large number of these, but they seem to be only for topic-1 and
topic-2. In this case I was migrating around 4 or 5 topics. These two are also
the ones that got reverted when I bounced all the processes yesterday.
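
(For reference, the verify step was run roughly like this; the zookeeper host
and json file name are placeholders for the real values:)

# verify uses the same json file that was passed to --execute;
# zk host and file name below are placeholders
bin/kafka-reassign-partitions.sh --zookeeper zk1.example.com:2181 \
  --reassignment-json-file expand-cluster-reassignment.json --verify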

Here are some more logs I found from that day that may help piece together
what might have happened:

[2014-11-19 16:56:52,938] ERROR [KafkaApi-1] Error when processing fetch request for partition [topic-2,317] offset 408324093 from follower with correlation id 2458 (kafka.server.KafkaApis)
kafka.common.OffsetOutOfRangeException: Request for offset 408324093 but we only have log segments in the range 409018400 to 425346400.
at kafka.log.Log.read(Log.scala:380)
at kafka.server.KafkaApis.kafka$server$KafkaApis$$readMessageSet(KafkaApis.scala:530)
at kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$readMessageSets$1.apply(KafkaApis.scala:476)
at kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$readMessageSets$1.apply(KafkaApis.scala:471)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:17
at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:347)
at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:347)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
at scala.collection.immutable.HashMap.map(HashMap.scala:3
at kafka.server.KafkaApis.kafka$server$KafkaApis$$readMessageSets(KafkaApis.scala:471)
at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:437)
at kafka.server.KafkaApis.handle(KafkaApis.scala:186)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:42)
at java.lang.Thread.run(Thread.java:724)
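
(If it is useful, the offset range the broker currently has for that partition
can be checked with something like the following; the broker address below is a
placeholder:)

# --time -2 prints the earliest available offset, -1 the latest;
# broker address is a placeholder
bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
  --broker-list broker1.example.com:9092 --topic topic-2 --partitions 317 --time -2
bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
  --broker-list broker1.example.com:9092 --topic topic-2 --partitions 317 --time -1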

-----

[2014-11-19 16:24:37,959] ERROR Conditional update of path /brokers/topics/topic-2/partitions/248/state with data {"controller_epoch":15,"leader":2,"version":1,"leader_epoch":1,"isr":[2,5]} and expected version 1 failed due to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /brokers/topics/topic-2/partitions/248/state (kafka.utils.ZkUtils$)
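
(For what it is worth, the current contents of that state znode can be dumped
with something like the following; the zookeeper host is a placeholder:)

# prints the leader/isr/epoch json currently stored for that partition;
# zk host is a placeholder
bin/zookeeper-shell.sh zk1.example.com:2181 get /brokers/topics/topic-2/partitions/248/state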



-- 
Andrew Jorgensen
@ajorgensen

On December 2, 2014 at 5:28:07 PM, Jun Rao (jun...@gmail.com) wrote:

Did you run the --verify option
(http://kafka.apache.org/documentation.html#basic_ops_restarting) to check
if the reassignment process completes? Also, what version of Kafka are you
using?

Thanks,  

Jun  

On Mon, Dec 1, 2014 at 7:16 PM, Andrew Jorgensen <  
ajorgen...@twitter.com.invalid> wrote:  

> I unfortunately do not have any specific logs from these events but I will  
> try to describe the events as accurately as possible to give an idea of
> the problem I saw.  
>  
> The odd behavior manifested itself when I bounced all of the kafka  
> processes on each of the servers in a 12 node cluster. A few weeks prior I  
> did a partition reassignment to add four new kafka brokers to the cluster.  
> This cluster has 4 topics, each with 350 partitions, a retention
> policy of 6 hours, and a replication factor of 1. Originally I attempted to
> run a migration on all of the topics and partitions adding the 4 new nodes  
> using the partition reassignment tool. This seemed to cause a lot of  
> network congestion and according to the logs some of the nodes were having  
> trouble talking to each other. The network congestion lasted for the  
> duration of the migration and began to get better toward the end. After the  
> migration I confirmed that data was being stored and served from the new  
> brokers. Today I bounced each of the kafka processes on each of the brokers  
> to pick up a change made to the log4j properties. After bouncing one
> process I started seeing some strange errors on the four newer broker
> nodes that looked like:  
>  
> kafka.common.NotAssignedReplicaException: Leader 10 failed to record  
> follower 7's position 0 for partition [topic-1,185] since the replica 7 is  
> not recognized to be one of the assigned replicas 10 for partition  
> [topic-2,185]  
>  
> and on the older kafka brokers the errors looked like:  
>  
> [2014-12-01 17:06:04,268] ERROR [ReplicaFetcherThread-0-12], Error for  
> partition [topic-1,175] to broker 12:class kafka.common.UnknownException  
> (kafka.server.ReplicaFetcherThread)  
>  
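> (For context, the original migration was kicked off with the reassignment tool
> roughly like this; the zookeeper host, json file names, and broker id list are
> placeholders for the real values:)
>
> # hypothetical sketch of the expansion steps: generate a candidate assignment
> # that includes the new broker ids, then apply it with --execute
> bin/kafka-reassign-partitions.sh --zookeeper zk1.example.com:2181 \
>   --topics-to-move-json-file topics-to-move.json \
>   --broker-list "1,2,3,4,5,6,7,8,9,10,11,12" --generate
> bin/kafka-reassign-partitions.sh --zookeeper zk1.example.com:2181 \
>   --reassignment-json-file expand-cluster-reassignment.json --execute
>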
> I proceeded to bounce the rest of the kafka processes and after bouncing  
> the rest, the errors seemed to stop. It wasn’t until a few hours later that I
> noticed that the amount of data stored on the 4 new kafka brokers had  
> dropped off significantly. When I ran a describe for the topics in the  
> errors it was clear that the assigned partitions had been reverted to a  
> state prior to the original migration to add the 4 new brokers. I am unsure  
> of why bouncing the kafka process would cause the state in zookeeper to get  
> overwritten given that it had seemed to have been working for the last few  
> weeks until the process was restarted. My hunch is that the controller  
> keeps some state about the world pre-reassignment and removes that state  
> after it detects that the reassignment happened successfully. In this case  
> the network congestion on each of the brokers caused the controller not to  
> get notified when all the reassignments were completed and thus kept the  
> pre-reassignment state around. When the process was bounced it read from
> zookeeper to get this state and reverted the existing assignment to the
> pre-reassignment state. Has this behavior been observed before? Does this
> sound like a logical understanding of what happened in this case?  
>  
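> (If it helps narrow this down, I believe the in-flight reassignment is tracked
> under the /admin/reassign_partitions znode, so something like the following
> should show whether the controller still thinks one is pending; the zookeeper
> host is a placeholder:)
>
> # if this znode still exists, the controller still considers a reassignment
> # to be in progress
> bin/zookeeper-shell.sh zk1.example.com:2181 get /admin/reassign_partitions
>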
> --  
> Andrew Jorgensen  
> @ajorgensen  
