Hello All,

After performing an upgrade of our Kafka 0.8.0 to Kafka 0.8.1, we are receiving 
a failure in the preferred replica election process. I am not sure if this is a 
known issue or not. This is a two node Kafka cluster in our QA environment 
(replica 2) with a total of 2100+ partitions over 20 topics.


Upon the first run, we get the following output:

Performing Rebalancing Process...
2014-04-14 17:04:22 DEBUG zkclient.ZkConnection - Creating new ZookKeeper 
instance to connect to tvip-m1-mw-zookeeper.dish.com:2181.
2014-04-14 17:04:22 INFO zkclient.ZkEventThread - Starting ZkClient event 
thread.
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client 
environment:zookeeper.version=3.3.3-1203054, built on 11/17/2011 05:47 GMT
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client 
environment:host.name=tm1mwdpl05
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client 
environment:java.version=1.7.0_51
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client 
environment:java.vendor=Oracle Corporation
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client 
environment:java.home=/apps/des/jdk1.7.0_51/jre
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client 
environment:java.class.path=:/apps/des/kafka/bin/../core/build/dependant-libs-2.8.0/*.jar:/apps/des/kafka/bin/../perf/build/libs//kafka-perf_2.8.0*.jar:/apps/des/kafka/bin/../clients/build/libs//kafka-clients*.jar:/apps/des/kafka/bin/../examples/build/libs//kafka-examples*.jar:/apps/des/kafka/bin/../contrib/hadoop-consumer/build/libs//kafka-hadoop-consumer*.jar:/apps/des/kafka/bin/../contrib/hadoop-producer/build/libs//kafka-hadoop-producer*.jar:/apps/des/kafka/bin/../libs/jopt-simple-3.2.jar:/apps/des/kafka/bin/../libs/kafka_2.10-0.8.1.jar:/apps/des/kafka/bin/../libs/log4j-1.2.15.jar:/apps/des/kafka/bin/../libs/metrics-annotation-2.2.0.jar:/apps/des/kafka/bin/../libs/metrics-core-2.2.0.jar:/apps/des/kafka/bin/../libs/scala-library-2.10.1.jar:/apps/des/kafka/bin/../libs/slf4j-api-1.7.2.jar:/apps/des/kafka/bin/../libs/snappy-java-1.0.5.jar:/apps/des/kafka/bin/../libs/zkclient-0.3.jar:/apps/des/kafka/bin/../libs/zookeeper-3.3.4.jar:/apps/des/kafka/bin/../core/build/libs/kafka_2.8.0*.jar
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client 
environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client 
environment:java.io.tmpdir=/tmp
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client 
environment:java.compiler=<NA>
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client environment:os.name=Linux
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client environment:os.arch=amd64
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client 
environment:os.version=2.6.32-220.el6.x86_64
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client 
environment:user.name=desadmin
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client 
environment:user.home=/apps/des
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client 
environment:user.dir=/apps/des/bin
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Initiating client connection, 
connectString=tvip-m1-mw-zookeeper.dish.com:2181 sessionTimeout=30000 
watcher=org.I0Itec.zkclient.ZkClient@77a63a20
2014-04-14 17:04:22 DEBUG zookeeper.ClientCnxn - 
zookeeper.disableAutoWatchReset is false
2014-04-14 17:04:22 DEBUG zkclient.ZkClient - Awaiting connection to Zookeeper 
server
2014-04-14 17:04:22 DEBUG zkclient.ZkClient - Waiting for keeper state 
SyncConnected
2014-04-14 17:04:22 INFO zookeeper.ClientCnxn - Opening socket connection to 
server tvip-m1-mw-zookeeper.dish.com/10.18.60.25:2181
2014-04-14 17:04:22 INFO zookeeper.ClientCnxn - Socket connection established 
to tvip-m1-mw-zookeeper.dish.com/10.18.60.25:2181, initiating session
2014-04-14 17:04:22 DEBUG zookeeper.ClientCnxn - Session establishment request 
sent on tvip-m1-mw-zookeeper.dish.com/10.18.60.25:2181
2014-04-14 17:04:22 INFO zookeeper.ClientCnxn - Session establishment complete 
on server tvip-m1-mw-zookeeper.dish.com/10.18.60.25:2181, sessionid = 
0x2453d08efcac215, negotiated timeout = 30000
2014-04-14 17:04:22 DEBUG zkclient.ZkClient - Received event: WatchedEvent 
state:SyncConnected type:None path:null
2014-04-14 17:04:22 INFO zkclient.ZkClient - zookeeper state changed 
(SyncConnected)
2014-04-14 17:04:22 DEBUG zkclient.ZkClient - Leaving process event
2014-04-14 17:04:22 DEBUG zkclient.ZkClient - State is SyncConnected
2014-04-14 17:04:22 DEBUG zookeeper.ClientCnxn - Reading reply 
sessionid:0x2453d08efcac215, packet:: clientPath:null serverPath:null 
finished:false header:: 1,8  replyHeader:: 1,73101951651,0  request:: 
'/brokers/topics,F  response:: 
v{'bill-generated,'bob_test_topic,'rain-burn-in,'account-adjustment-issued,'account-access,'customer-cues,'submit-agreement,'logtest_v3,'account-created,'rain-load-test,'outbound-call-attempted,'account-info-updated,'rain-enhanced-consumer,'threshold-exceeded,'account-activated,'rain-enhanced,'outbound-communications}
2014-04-14 17:04:22 DEBUG zookeeper.ClientCnxn - Reading reply 
sessionid:0x2453d08efcac215, packet:: clientPath:null serverPath:null 
finished:false header:: 2,8  replyHeader:: 2,73101951651,0  request:: 
'/brokers/topics/bill-generated/partitions,F  response:: 
v{'35,'36,'33,'34,'39,'37,'38,'43,'42,'41,'40,'22,'23,'24,'25,'26,'27,'28,'29,'3,'2,'1,'0,'30,'7,'6,'32,'5,'4,'31,'9,'8,'19,'17,'18,'15,'16,'13,'14,'11,'12,'21,'20,'109,'108,'107,'106,'105,'104,'103,'99,'102,'101,'100,'98,'97,'96,'95,'94,'93,'92,'91,'90,'10,'88,'89,'116,'79,'117,'114,'78,'77,'115,'112,'113,'110,'111,'118,'119,'82,'83,'80,'81,'86,'87,'84,'85,'67,'125,'126,'66,'69,'127,'68,'128,'121,'122,'123,'124,'129,'70,'71,'120,'72,'73,'74,'75,'76,'59,'58,'57,'56,'55,'64,'65,'62,'63,'60,'61,'130,'49,'48,'45,'44,'47,'46,'51,'52,'53,'54,'50}
2014-04-14 17:04:22 DEBUG zookeeper.ClientCnxn - Reading reply 
sessionid:0x2453d08efcac215, packet:: clientPath:null serverPath:null 
finished:false header:: 3,8  replyHeader:: 3,73101951651,0  request:: 
'/brokers/topics/bob_test_topic/partitions,F  response:: v{'3,'2,'1,'0,'6,'5,'4}



2014-04-14 17:05:11 DEBUG zookeeper.ClientCnxn - Reading reply 
sessionid:0x2453d08efcac215, packet:: clientPath:null serverPath:null 
finished:false header::
2122,1  replyHeader:: 2122,73101958104,0  request:: 
'/admin/preferred_replica_election,#7b2276657273696f6e

*big list of hex values*

6974696f6e223a35317d2c7b22746f706963223a2262696c6c2d67656e657261
746564222c22706172746974696f6e223a34377d2c7b22746f706963223a226c6f67746573745f7633222c22706172746974696f6e223a377d2c7b22746f706963223a227261696e2d656e68616e6
365642d636f6e73756d6572222c22706172746974696f6e223a357d5d7d,v{s{31,s{'world,'anyone}}},0
  response:: '/admin/preferred_replica_election

2014-04-14 17:05:11 INFO admin.PreferredReplicaLeaderElectionCommand$ - Created 
preferred replica election path with {"version":1,"partitions":[{"topic":"acc
ount-info-updated","partition":36},{"topic":"outbound-communications","partition":39},{"topic":"account-access","partition":107},{"topic":"account-created","
partition":8},{"topic":"rain-burn-in","partition":46}

*huge list of topics and partitions*

[submit-agreement,31])
2014-04-14 17:05:11 DEBUG zkclient.ZkClient - Closing ZkClient...
2014-04-14 17:05:11 INFO zkclient.ZkEventThread - Terminate ZkClient event 
thread.
2014-04-14 17:05:11 DEBUG zkclient.ZkConnection - Closing ZooKeeper connected 
to tvip-m1-mw-zookeeper.dish.com:2181
2014-04-14 17:05:11 DEBUG zookeeper.ZooKeeper - Closing session: 
0x2453d08efcac215
2014-04-14 17:05:11 DEBUG zookeeper.ClientCnxn - Closing client for session: 
0x2453d08efcac215
2014-04-14 17:05:11 DEBUG zookeeper.ClientCnxn - Reading reply 
sessionid:0x2453d08efcac215, packet:: clientPath:null serverPath:null 
finished:false header:: 2123,-11  replyHeader:: 2123,73101958105,0  request:: 
null response:: null
2014-04-14 17:05:11 DEBUG zookeeper.ClientCnxn - Disconnecting client for 
session: 0x2453d08efcac215
2014-04-14 17:05:11 INFO zookeeper.ZooKeeper - Session: 0x2453d08efcac215 closed
2014-04-14 17:05:11 DEBUG zkclient.ZkClient - Closing ZkClient...done
2014-04-14 17:05:11 DEBUG zookeeper.ClientCnxn - An exception was thrown while 
closing send thread for session 0x2453d08efcac215 : Unable to read additional 
data from server sessionid 0x2453d08efcac215, likely server has closed socket
2014-04-14 17:05:11 INFO zookeeper.ClientCnxn - EventThread shut down


If I perform a topic list, I will see the partitions balanced between the 
brokers. However if I check in zookeeper, I see the following admin node still 
existing:

[zk: tm1-zookeeper101:2181(CONNECTED) 0] get /admin/preferred_replica_election

{"version":1,"partitions":[{"topic":"account-info-updated","partition":36},

*large list of topic and partitions*

,{"topic":"rain-enhanced-consumer","partition":5}]}
cZxid = 0x1105375bd8
ctime = Mon Apr 14 17:05:11 MDT 2014
mZxid = 0x1105375bd8
mtime = Mon Apr 14 17:05:11 MDT 2014
pZxid = 0x1105375bd8
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 94578
numChildren = 0

It appears that node was not removed after the election occurred. It looks like 
this is taking over 30 seconds. Perhaps this process is taking over 30 seconds 
as defined in the zookeeper session timeout?

If I re-run the election (with the admin node still in place), I get the 
following error:


Performing Rebalancing Process...
2014-04-14 17:18:13 DEBUG zkclient.ZkConnection - Creating new ZookKeeper 
instance to connect to tvip-m1-mw-zookeeper.dish.com:2181.
2014-04-14 17:18:13 INFO zkclient.ZkEventThread - Starting ZkClient event 
thread.
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client 
environment:zookeeper.version=3.3.3-1203054, built on 11/17/2011 05:47 GMT
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client 
environment:host.name=tm1mwdpl05
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client 
environment:java.version=1.7.0_51
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client 
environment:java.vendor=Oracle Corporation
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client 
environment:java.home=/apps/des/jdk1.7.0_51/jre
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client 
environment:java.class.path=:/apps/des/kafka/bin/../core/build/dependant-libs-2.8.0/*.jar:/apps/des/kafka/bin/../perf/build/libs//kafka-perf_2.8.0*.jar:/apps/des/kafka/bin/../clients/build/libs//kafka-clients*.jar:/apps/des/kafka/bin/../examples/build/libs//kafka-examples*.jar:/apps/des/kafka/bin/../contrib/hadoop-consumer/build/libs//kafka-hadoop-consumer*.jar:/apps/des/kafka/bin/../contrib/hadoop-producer/build/libs//kafka-hadoop-producer*.jar:/apps/des/kafka/bin/../libs/jopt-simple-3.2.jar:/apps/des/kafka/bin/../libs/kafka_2.10-0.8.1.jar:/apps/des/kafka/bin/../libs/log4j-1.2.15.jar:/apps/des/kafka/bin/../libs/metrics-annotation-2.2.0.jar:/apps/des/kafka/bin/../libs/metrics-core-2.2.0.jar:/apps/des/kafka/bin/../libs/scala-library-2.10.1.jar:/apps/des/kafka/bin/../libs/slf4j-api-1.7.2.jar:/apps/des/kafka/bin/../libs/snappy-java-1.0.5.jar:/apps/des/kafka/bin/../libs/zkclient-0.3.jar:/apps/des/kafka/bin/../libs/zookeeper-3.3.4.jar:/apps/des/kafka/bin/../core/build/libs/kafka_2.8.0*.jar
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client 
environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client 
environment:java.io.tmpdir=/tmp
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client 
environment:java.compiler=<NA>
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client environment:os.name=Linux
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client environment:os.arch=amd64
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client 
environment:os.version=2.6.32-220.el6.x86_64
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client 
environment:user.name=desadmin
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client 
environment:user.home=/apps/des
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client 
environment:user.dir=/apps/des/bin
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Initiating client connection, 
connectString=tvip-m1-mw-zookeeper.dish.com:2181 sessionTimeout=30000 
watcher=org.I0Itec.zkclient.ZkClient@73250065
2014-04-14 17:18:13 DEBUG zookeeper.ClientCnxn - 
zookeeper.disableAutoWatchReset is false
2014-04-14 17:18:13 DEBUG zkclient.ZkClient - Awaiting connection to Zookeeper 
server
2014-04-14 17:18:13 DEBUG zkclient.ZkClient - Waiting for keeper state 
SyncConnected
2014-04-14 17:18:13 INFO zookeeper.ClientCnxn - Opening socket connection to 
server tvip-m1-mw-zookeeper.dish.com/10.18.60.25:2181
2014-04-14 17:18:13 INFO zookeeper.ClientCnxn - Socket connection established 
to tvip-m1-mw-zookeeper.dish.com/10.18.60.25:2181, initiating session
2014-04-14 17:18:13 DEBUG zookeeper.ClientCnxn - Session establishment request 
sent on tvip-m1-mw-zookeeper.dish.com/10.18.60.25:2181
2014-04-14 17:18:13 INFO zookeeper.ClientCnxn - Session establishment complete 
on server tvip-m1-mw-zookeeper.dish.com/10.18.60.25:2181, sessionid = 
0x1453d08efa2c2b7, negotiated timeout = 30000
2014-04-14 17:18:13 DEBUG zkclient.ZkClient - Received event: WatchedEvent 
state:SyncConnected type:None path:null
2014-04-14 17:18:13 INFO zkclient.ZkClient - zookeeper state changed 
(SyncConnected)
2014-04-14 17:18:13 DEBUG zkclient.ZkClient - Leaving process event
2014-04-14 17:18:13 DEBUG zkclient.ZkClient - State is SyncConnected
2014-04-14 17:18:13 DEBUG zookeeper.ClientCnxn - Reading reply 
sessionid:0x1453d08efa2c2b7, packet:: clientPath:null serverPath:null 
finished:false header:: 1,8  replyHeader:: 1,73102061236,0  request:: 
'/brokers/topics,F  response:: 
v{'bill-generated,'bob_test_topic,'rain-burn-in,'account-adjustment-issued,'account-access,'customer-cues,'submit-agreement,'logtest_v3,'account-created,'rain-load-test,'outbound-call-attempted,'account-info-updated,'threshold-exceeded,'rain-enhanced-consumer,'account-activated,'outbound-communications,'rain-enhanced}
2014-04-14 17:18:13 DEBUG zookeeper.ClientCnxn - Reading reply 
sessionid:0x1453d08efa2c2b7, packet:: clientPath:null serverPath:null 
finished:false header:: 2,8  replyHeader:: 2,73102061261,0  request:: 
'/brokers/topics/bill-generated/partitions,F  response:: 
v{'35,'36,'33,'34,'39,'37,'38,'43,'42,'41,'40,'22,'23,'24,'25,'26,'27,'28,'29,'3,'2,'1,'0,'7,'30,'6,'5,'32,'31,'4,'9,'8,'19,'17,'18,'15,'16,'13,'14,'11,'12,'21,'20,'109,'108,'107,'106,'105,'104,'103,'99,'102,'101,'100,'98,'97,'96,'95,'94,'93,'92,'91,'90,'10,'88,'89,'116,'79,'117,'114,'78,'77,'115,'112,'113,'110,'111,'118,'119,'82,'83,'80,'81,'86,'87,'84,'85,'67,'125,'126,'66,'69,'127,'68,'128,'121,'122,'123,'124,'129,'70,'120,'71,'72,'73,'74,'75,'76,'59,'58,'57,'56,'55,'64,'65,'62,'63,'60,'130,'61,'49,'48,'45,'44,'47,'46,'51,'52,'53,'54,'50}

*large list of logging statements*


Failed to start preferred replica election
kafka.common.AdminCommandFailedException: Admin command failed
        at 
kafka.admin.PreferredReplicaLeaderElectionCommand.moveLeaderToPreferredReplica(PreferredReplicaLeaderElectionCommand.scala:115)
        at 
kafka.admin.PreferredReplicaLeaderElectionCommand$.main(PreferredReplicaLeaderElectionCommand.scala:60)
        at 
kafka.admin.PreferredReplicaLeaderElectionCommand.main(PreferredReplicaLeaderElectionCommand.scala)
Caused by: kafka.admin.AdminOperationException: Preferred replica leader 
election currently in progress for Set([account-adjustment-issued,77],

*large list of topics and partitions*


[submit-agreement,31]). Aborting operation
        at 
kafka.admin.PreferredReplicaLeaderElectionCommand$.writePreferredReplicaElectionData(PreferredReplicaLeaderElectionCommand.scala:101)
        at 
kafka.admin.PreferredReplicaLeaderElectionCommand.moveLeaderToPreferredReplica(PreferredReplicaLeaderElectionCommand.scala:113)
        ... 2 more

2014-04-14 17:19:00 DEBUG zkclient.ZkClient - Closing ZkClient...
2014-04-14 17:19:00 INFO zkclient.ZkEventThread - Terminate ZkClient event 
thread.
2014-04-14 17:19:00 DEBUG zkclient.ZkConnection - Closing ZooKeeper connected 
to tvip-m1-mw-zookeeper.dish.com:2181
2014-04-14 17:19:00 DEBUG zookeeper.ZooKeeper - Closing session: 
0x1453d08efa2c2b7
2014-04-14 17:19:00 DEBUG zookeeper.ClientCnxn - Closing client for session: 
0x1453d08efa2c2b7
2014-04-14 17:19:00 DEBUG zookeeper.ClientCnxn - Reading reply 
sessionid:0x1453d08efa2c2b7, packet:: clientPath:null serverPath:null 
finished:false header:: 2124,-11  replyHeader:: 2124,73102067456,0  request:: 
null response:: null
2014-04-14 17:19:00 DEBUG zookeeper.ClientCnxn - Disconnecting client for 
session: 0x1453d08efa2c2b7
2014-04-14 17:19:00 INFO zookeeper.ZooKeeper - Session: 0x1453d08efa2c2b7 closed
2014-04-14 17:19:00 DEBUG zookeeper.ClientCnxn - An exception was thrown while 
closing send thread for session 0x1453d08efa2c2b7 : Unable to read additional 
data from server sessionid 0x1453d08efa2c2b7, likely server has closed socket
2014-04-14 17:19:00 INFO zookeeper.ClientCnxn - EventThread shut down
2014-04-14 17:19:00 DEBUG zkclient.ZkClient - Closing ZkClient...done


If I manually remove the /admin/preferred_replica_election node, the java 
exception goes away.

A bit of other information:

At worst ParNew GC time: 25-36 ms (total time per 15 second interval, 1-2  GCs 
per 15 second interval) per Broker during this operation. No CMS GCing 
occurring.


Any ideas?

Thanks

/bob

Reply via email to