Ch3eck, check.  5 by 5!

> -----Original Message-----
> From: Jun Rao [mailto:jun...@gmail.com]
> Sent: Tuesday, April 15, 2014 10:06 PM
> To: users@kafka.apache.org
> Subject: Re: Kafka upgrade 0.8.0 to 0.8.1 - kafka-preferred-replica-election
> failure
> 
> Ok, we take a look and patch it in 0.8.1.1.
> 
> Thanks,
> 
> Jun
> 
> 
> On Tue, Apr 15, 2014 at 9:00 AM, Bello, Bob <bob.be...@dish.com> wrote:
> 
> > I performed another test. I build a single Kafka 0.8.1 Broker with a
> > single ZK instance. Brand new, no topics.
> >
> > Upon start up of the Kafka broker, the zookeeper /admin node only
> > contains "/admin/delete_topics".
> >
> > Even without creating a topic, I run a perfered replica election, and
> > it's successful. After the run, I check the node for
> > /admin/preferred_replica_election and it exists.
> >
> > [zk: tm1mwwm001:2181(CONNECTED) 0] get
> > /admin/preferred_replica_election {"version":1,"partitions":[]} cZxid
> > = 0x19 ctime = Tue Apr 15 09:53:22 MDT 2014 mZxid = 0x19 mtime = Tue
> > Apr 15 09:53:22 MDT 2014 pZxid = 0x19 cversion = 0 dataVersion = 0
> > aclVersion = 0 ephemeralOwner = 0x0 dataLength = 29 numChildren = 0
> >
> > If I run the election again, I get the following error (the same error
> > as my original post).
> >
> >
> > Failed to start preferred replica election
> > kafka.common.AdminCommandFailedException: Admin command failed
> >         at
> >
> kafka.admin.PreferredReplicaLeaderElectionCommand.moveLeaderToPrefer
> redReplica(PreferredReplicaLeaderElectionCommand.scala:115)
> >         at
> >
> kafka.admin.PreferredReplicaLeaderElectionCommand$.main(PreferredRepl
> icaLeaderElectionCommand.scala:60)
> >         at
> >
> kafka.admin.PreferredReplicaLeaderElectionCommand.main(PreferredReplic
> > aLeaderElectionCommand.scala) Caused by:
> > kafka.admin.AdminOperationException: Preferred replica leader election
> > currently in progress for Set(). Aborting operation
> >         at
> >
> kafka.admin.PreferredReplicaLeaderElectionCommand$.writePreferredRepli
> caElectionData(PreferredReplicaLeaderElectionCommand.scala:101)
> >         at
> >
> kafka.admin.PreferredReplicaLeaderElectionCommand.moveLeaderToPrefer
> redReplica(PreferredReplicaLeaderElectionCommand.scala:113)
> >         ... 2 more
> >
> >
> > The zookeeper log shows the following:
> >
> > 2014-04-15 09:53:44 INFO server.PrepRequestProcessor - Got user-level
> > KeeperException when processing sessionid:0x1456615e7770006
> > type:create
> > cxid:0x2 zxid:0x1e txntype:-1 reqpath:n/a Error
> > Path:/admin/preferred_replica_election Error:KeeperErrorCode =
> > NodeExists for /admin/preferred_replica_election
> >
> > It looks like the preferred election process is not removing the node
> > after completion.
> >
> >
> >
> > Bob Bello
> > Middleware Applications Administrator
> > Direct: 720-514-6605
> > email: bob.be...@dish.com
> >
> >
> > -----Original Message-----
> > From: Bello, Bob
> > Sent: Tuesday, April 15, 2014 8:47 AM
> > To: users@kafka.apache.org
> > Cc: Bello, Bob
> > Subject: RE: Kafka upgrade 0.8.0 to 0.8.1 -
> > kafka-preferred-replica-election failure
> >
> > Yes. I checked that I was fully replicated. All topic/partitions
> > leaders were on broker #2.
> >
> > Using the "--under-replicated-partitions" option in the listtopic
> > command returned no results.
> >
> >
> > Bob Bello
> > Middleware Applications Administrator
> > Direct: 720-514-6605
> > email: bob.be...@dish.com
> >
> > -----Original Message-----
> > From: Jun Rao [mailto:jun...@gmail.com]
> > Sent: Monday, April 14, 2014 9:11 PM
> > To: users@kafka.apache.org
> > Subject: Re: Kafka upgrade 0.8.0 to 0.8.1 -
> > kafka-preferred-replica-election failure
> >
> > Did you make sure there were no under replicated partitions before
> > issuing the preferred leader election?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Mon, Apr 14, 2014 at 4:34 PM, Bello, Bob <bob.be...@dish.com> wrote:
> >
> > > Hello All,
> > >
> > > After performing an upgrade of our Kafka 0.8.0 to Kafka 0.8.1, we
> > > are receiving a failure in the preferred replica election process. I
> > > am not sure if this is a known issue or not. This is a two node
> > > Kafka cluster in our QA environment (replica 2) with a total of
> > > 2100+ partitions over 20 topics.
> > >
> > >
> > > Upon the first run, we get the following output:
> > >
> > > Performing Rebalancing Process...
> > > 2014-04-14 17:04:22 DEBUG zkclient.ZkConnection - Creating new
> > > ZookKeeper instance to connect to tvip-m1-mw-
> zookeeper.dish.com:2181.
> > > 2014-04-14 17:04:22 INFO zkclient.ZkEventThread - Starting ZkClient
> > > event thread.
> > > 2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client
> > > environment:zookeeper.version=3.3.3-1203054, built on 11/17/2011
> > > 05:47
> > GMT
> > > 2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client environment:
> > > host.name=tm1mwdpl05
> > > 2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client
> > > environment:java.version=1.7.0_51
> > > 2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client
> > > environment:java.vendor=Oracle Corporation
> > > 2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client
> > > environment:java.home=/apps/des/jdk1.7.0_51/jre
> > > 2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client
> > >
> > environment:java.class.path=:/apps/des/kafka/bin/../core/build/dependa
> > nt-libs-2.8.0/*.jar:/apps/des/kafka/bin/../perf/build/libs//kafka-perf
> > _2.8.0*.jar:/apps/des/kafka/bin/../clients/build/libs//kafka-clients*.
> > jar:/apps/des/kafka/bin/../examples/build/libs//kafka-examples*.jar:/a
> > pps/des/kafka/bin/../contrib/hadoop-consumer/build/libs//kafka-
> hadoop-
> > consumer*.jar:/apps/des/kafka/bin/../contrib/hadoop-producer/build/lib
> > s//kafka-hadoop-producer*.jar:/apps/des/kafka/bin/../libs/jopt-simple-
> > 3.2.jar:/apps/des/kafka/bin/../libs/kafka_2.10-0.8.1.jar:/apps/des/kaf
> > ka/bin/../libs/log4j-1.2.15.jar:/apps/des/kafka/bin/../libs/metrics-an
> > notation-2.2.0.jar:/apps/des/kafka/bin/../libs/metrics-core-2.2.0.jar:
> > /apps/des/kafka/bin/../libs/scala-library-2.10.1.jar:/apps/des/kafka/b
> > in/../libs/slf4j-api-1.7.2.jar:/apps/des/kafka/bin/../libs/snappy-java
> > -1.0.5.jar:/apps/des/kafka/bin/../libs/zkclient-0.3.jar:/apps/des/kafk
> > a/bin/../libs/zookeeper-3.3.4.jar:/apps/des/kafka/bin/../core/build/li
> > bs/kafka_2.8.0*.jar
> > > 2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client
> > >
> > environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:
> > /lib64:/lib:/usr/lib
> > > 2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client
> > > environment:java.io.tmpdir=/tmp
> > > 2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client
> > > environment:java.compiler=<NA>
> > > 2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client environment:
> > os.name
> > > =Linux
> > > 2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client
> > > environment:os.arch=amd64
> > > 2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client
> > > environment:os.version=2.6.32-220.el6.x86_64
> > > 2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client environment:
> > > user.name=desadmin
> > > 2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client
> > > environment:user.home=/apps/des
> > > 2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client
> > > environment:user.dir=/apps/des/bin
> > > 2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Initiating client
> > > connection,
> > > connectString=tvip-m1-mw-
> zookeeper.dish.com:2181sessionTimeout=30000
> > watcher=org.I0Itec.zkclient.ZkClient@77a63a20
> > > 2014-04-14 17:04:22 DEBUG zookeeper.ClientCnxn -
> > > zookeeper.disableAutoWatchReset is false
> > > 2014-04-14 17:04:22 DEBUG zkclient.ZkClient - Awaiting connection to
> > > Zookeeper server
> > > 2014-04-14 17:04:22 DEBUG zkclient.ZkClient - Waiting for keeper
> > > state SyncConnected
> > > 2014-04-14 17:04:22 INFO zookeeper.ClientCnxn - Opening socket
> > > connection to server tvip-m1-mw-zookeeper.dish.com/10.18.60.25:2181
> > > 2014-04-14 17:04:22 INFO zookeeper.ClientCnxn - Socket connection
> > > established to tvip-m1-mw-zookeeper.dish.com/10.18.60.25:2181,
> > initiating
> > > session
> > > 2014-04-14 17:04:22 DEBUG zookeeper.ClientCnxn - Session
> > > establishment request sent on
> > > tvip-m1-mw-zookeeper.dish.com/10.18.60.25:2181
> > > 2014-04-14 17:04:22 INFO zookeeper.ClientCnxn - Session
> > > establishment complete on server
> > > tvip-m1-mw-zookeeper.dish.com/10.18.60.25:2181,
> > > sessionid = 0x2453d08efcac215, negotiated timeout = 30000
> > > 2014-04-14 17:04:22 DEBUG zkclient.ZkClient - Received event:
> > WatchedEvent
> > > state:SyncConnected type:None path:null
> > > 2014-04-14 17:04:22 INFO zkclient.ZkClient - zookeeper state changed
> > > (SyncConnected)
> > > 2014-04-14 17:04:22 DEBUG zkclient.ZkClient - Leaving process event
> > > 2014-04-14 17:04:22 DEBUG zkclient.ZkClient - State is SyncConnected
> > > 2014-04-14 17:04:22 DEBUG zookeeper.ClientCnxn - Reading reply
> > > sessionid:0x2453d08efcac215, packet:: clientPath:null
> > > serverPath:null finished:false header:: 1,8  replyHeader:: 1,73101951651,0
> request::
> > > '/brokers/topics,F  response::
> > >
> > v{'bill-generated,'bob_test_topic,'rain-burn-in,'account-adjustment-is
> > sued,'account-access,'customer-cues,'submit-agreement,'logtest_v3,'acc
> > ount-created,'rain-load-test,'outbound-call-attempted,'account-info-up
> > dated,'rain-enhanced-consumer,'threshold-exceeded,'account-activated,'
> > rain-enhanced,'outbound-communications}
> > > 2014-04-14 17:04:22 DEBUG zookeeper.ClientCnxn - Reading reply
> > > sessionid:0x2453d08efcac215, packet:: clientPath:null
> > > serverPath:null finished:false header:: 2,8  replyHeader:: 2,73101951651,0
> request::
> > > '/brokers/topics/bill-generated/partitions,F  response::
> > >
> > v{'35,'36,'33,'34,'39,'37,'38,'43,'42,'41,'40,'22,'23,'24,'25,'26,'27,
> > '28,'29,'3,'2,'1,'0,'30,'7,'6,'32,'5,'4,'31,'9,'8,'19,'17,'18,'15,'16,
> > '13,'14,'11,'12,'21,'20,'109,'108,'107,'106,'105,'104,'103,'99,'102,'1
> > 01,'100,'98,'97,'96,'95,'94,'93,'92,'91,'90,'10,'88,'89,'116,'79,'117,
> > '114,'78,'77,'115,'112,'113,'110,'111,'118,'119,'82,'83,'80,'81,'86,'8
> > 7,'84,'85,'67,'125,'126,'66,'69,'127,'68,'128,'121,'122,'123,'124,'129
> > ,'70,'71,'120,'72,'73,'74,'75,'76,'59,'58,'57,'56,'55,'64,'65,'62,'63,
> > '60,'61,'130,'49,'48,'45,'44,'47,'46,'51,'52,'53,'54,'50}
> > > 2014-04-14 17:04:22 DEBUG zookeeper.ClientCnxn - Reading reply
> > > sessionid:0x2453d08efcac215, packet:: clientPath:null
> > > serverPath:null finished:false header:: 3,8  replyHeader:: 3,73101951651,0
> request::
> > > '/brokers/topics/bob_test_topic/partitions,F  response::
> > > v{'3,'2,'1,'0,'6,'5,'4}
> > >
> > >
> > >
> > > 2014-04-14 17:05:11 DEBUG zookeeper.ClientCnxn - Reading reply
> > > sessionid:0x2453d08efcac215, packet:: clientPath:null
> > > serverPath:null finished:false header::
> > > 2122,1  replyHeader:: 2122,73101958104,0  request::
> > > '/admin/preferred_replica_election,#7b2276657273696f6e
> > >
> > > *big list of hex values*
> > >
> > > 6974696f6e223a35317d2c7b22746f706963223a2262696c6c2d67656e657261
> > >
> > >
> >
> 746564222c22706172746974696f6e223a34377d2c7b22746f706963223a226c6f67
> 74
> >
> 6573745f7633222c22706172746974696f6e223a377d2c7b22746f706963223a227
> 261
> > 696e2d656e68616e6
> > >
> >
> 365642d636f6e73756d6572222c22706172746974696f6e223a357d5d7d,v{s{31,s
> {'
> > world,'anyone}}},0
> > >  response:: '/admin/preferred_replica_election
> > >
> > > 2014-04-14 17:05:11 INFO
> > > admin.PreferredReplicaLeaderElectionCommand$ - Created preferred
> > > replica election path with {"version":1,"partitions":[{"topic":"acc
> > >
> > >
> > ount-info-updated","partition":36},{"topic":"outbound-
> communications","partition":39},{"topic":"account-
> access","partition":107},{"topic":"account-created","
> > > partition":8},{"topic":"rain-burn-in","partition":46}
> > >
> > > *huge list of topics and partitions*
> > >
> > > [submit-agreement,31])
> > > 2014-04-14 17:05:11 DEBUG zkclient.ZkClient - Closing ZkClient...
> > > 2014-04-14 17:05:11 INFO zkclient.ZkEventThread - Terminate ZkClient
> > event
> > > thread.
> > > 2014-04-14 17:05:11 DEBUG zkclient.ZkConnection - Closing ZooKeeper
> > > connected to tvip-m1-mw-zookeeper.dish.com:2181
> > > 2014-04-14 17:05:11 DEBUG zookeeper.ZooKeeper - Closing session:
> > > 0x2453d08efcac215
> > > 2014-04-14 17:05:11 DEBUG zookeeper.ClientCnxn - Closing client for
> > > session: 0x2453d08efcac215
> > > 2014-04-14 17:05:11 DEBUG zookeeper.ClientCnxn - Reading reply
> > > sessionid:0x2453d08efcac215, packet:: clientPath:null
> > > serverPath:null finished:false header:: 2123,-11  replyHeader::
> > > 2123,73101958105,0
> > >  request:: null response:: null
> > > 2014-04-14 17:05:11 DEBUG zookeeper.ClientCnxn - Disconnecting
> > > client for
> > > session: 0x2453d08efcac215
> > > 2014-04-14 17:05:11 INFO zookeeper.ZooKeeper - Session:
> > > 0x2453d08efcac215 closed
> > > 2014-04-14 17:05:11 DEBUG zkclient.ZkClient - Closing
> > > ZkClient...done
> > > 2014-04-14 17:05:11 DEBUG zookeeper.ClientCnxn - An exception was
> > > thrown while closing send thread for session 0x2453d08efcac215 :
> > > Unable to read additional data from server sessionid
> > > 0x2453d08efcac215, likely server
> > has
> > > closed socket
> > > 2014-04-14 17:05:11 INFO zookeeper.ClientCnxn - EventThread shut
> > > down
> > >
> > >
> > > If I perform a topic list, I will see the partitions balanced
> > > between the brokers. However if I check in zookeeper, I see the
> > > following admin node still existing:
> > >
> > > [zk: tm1-zookeeper101:2181(CONNECTED) 0] get
> > > /admin/preferred_replica_election
> > >
> > >
> > {"version":1,"partitions":[{"topic":"account-info-updated","partition"
> > :36},
> > >
> > > *large list of topic and partitions*
> > >
> > > ,{"topic":"rain-enhanced-consumer","partition":5}]}
> > > cZxid = 0x1105375bd8
> > > ctime = Mon Apr 14 17:05:11 MDT 2014 mZxid = 0x1105375bd8 mtime =
> > > Mon Apr 14 17:05:11 MDT 2014 pZxid = 0x1105375bd8 cversion = 0
> > > dataVersion = 0 aclVersion = 0 ephemeralOwner = 0x0 dataLength =
> > > 94578 numChildren = 0
> > >
> > > It appears that node was not removed after the election occurred. It
> > looks
> > > like this is taking over 30 seconds. Perhaps this process is taking
> > > over
> > 30
> > > seconds as defined in the zookeeper session timeout?
> > >
> > > If I re-run the election (with the admin node still in place), I get
> > > the following error:
> > >
> > >
> > > Performing Rebalancing Process...
> > > 2014-04-14 17:18:13 DEBUG zkclient.ZkConnection - Creating new
> > > ZookKeeper instance to connect to tvip-m1-mw-
> zookeeper.dish.com:2181.
> > > 2014-04-14 17:18:13 INFO zkclient.ZkEventThread - Starting ZkClient
> > > event thread.
> > > 2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client
> > > environment:zookeeper.version=3.3.3-1203054, built on 11/17/2011
> > > 05:47
> > GMT
> > > 2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client environment:
> > > host.name=tm1mwdpl05
> > > 2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client
> > > environment:java.version=1.7.0_51
> > > 2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client
> > > environment:java.vendor=Oracle Corporation
> > > 2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client
> > > environment:java.home=/apps/des/jdk1.7.0_51/jre
> > > 2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client
> > >
> > environment:java.class.path=:/apps/des/kafka/bin/../core/build/dependa
> > nt-libs-2.8.0/*.jar:/apps/des/kafka/bin/../perf/build/libs//kafka-perf
> > _2.8.0*.jar:/apps/des/kafka/bin/../clients/build/libs//kafka-clients*.
> > jar:/apps/des/kafka/bin/../examples/build/libs//kafka-examples*.jar:/a
> > pps/des/kafka/bin/../contrib/hadoop-consumer/build/libs//kafka-
> hadoop-
> > consumer*.jar:/apps/des/kafka/bin/../contrib/hadoop-producer/build/lib
> > s//kafka-hadoop-producer*.jar:/apps/des/kafka/bin/../libs/jopt-simple-
> > 3.2.jar:/apps/des/kafka/bin/../libs/kafka_2.10-0.8.1.jar:/apps/des/kaf
> > ka/bin/../libs/log4j-1.2.15.jar:/apps/des/kafka/bin/../libs/metrics-an
> > notation-2.2.0.jar:/apps/des/kafka/bin/../libs/metrics-core-2.2.0.jar:
> > /apps/des/kafka/bin/../libs/scala-library-2.10.1.jar:/apps/des/kafka/b
> > in/../libs/slf4j-api-1.7.2.jar:/apps/des/kafka/bin/../libs/snappy-java
> > -1.0.5.jar:/apps/des/kafka/bin/../libs/zkclient-0.3.jar:/apps/des/kafk
> > a/bin/../libs/zookeeper-3.3.4.jar:/apps/des/kafka/bin/../core/build/li
> > bs/kafka_2.8.0*.jar
> > > 2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client
> > >
> > environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:
> > /lib64:/lib:/usr/lib
> > > 2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client
> > > environment:java.io.tmpdir=/tmp
> > > 2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client
> > > environment:java.compiler=<NA>
> > > 2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client environment:
> > os.name
> > > =Linux
> > > 2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client
> > > environment:os.arch=amd64
> > > 2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client
> > > environment:os.version=2.6.32-220.el6.x86_64
> > > 2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client environment:
> > > user.name=desadmin
> > > 2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client
> > > environment:user.home=/apps/des
> > > 2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client
> > > environment:user.dir=/apps/des/bin
> > > 2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Initiating client
> > > connection,
> > > connectString=tvip-m1-mw-
> zookeeper.dish.com:2181sessionTimeout=30000
> > watcher=org.I0Itec.zkclient.ZkClient@73250065
> > > 2014-04-14 17:18:13 DEBUG zookeeper.ClientCnxn -
> > > zookeeper.disableAutoWatchReset is false
> > > 2014-04-14 17:18:13 DEBUG zkclient.ZkClient - Awaiting connection to
> > > Zookeeper server
> > > 2014-04-14 17:18:13 DEBUG zkclient.ZkClient - Waiting for keeper
> > > state SyncConnected
> > > 2014-04-14 17:18:13 INFO zookeeper.ClientCnxn - Opening socket
> > > connection to server tvip-m1-mw-zookeeper.dish.com/10.18.60.25:2181
> > > 2014-04-14 17:18:13 INFO zookeeper.ClientCnxn - Socket connection
> > > established to tvip-m1-mw-zookeeper.dish.com/10.18.60.25:2181,
> > initiating
> > > session
> > > 2014-04-14 17:18:13 DEBUG zookeeper.ClientCnxn - Session
> > > establishment request sent on
> > > tvip-m1-mw-zookeeper.dish.com/10.18.60.25:2181
> > > 2014-04-14 17:18:13 INFO zookeeper.ClientCnxn - Session
> > > establishment complete on server
> > > tvip-m1-mw-zookeeper.dish.com/10.18.60.25:2181,
> > > sessionid = 0x1453d08efa2c2b7, negotiated timeout = 30000
> > > 2014-04-14 17:18:13 DEBUG zkclient.ZkClient - Received event:
> > WatchedEvent
> > > state:SyncConnected type:None path:null
> > > 2014-04-14 17:18:13 INFO zkclient.ZkClient - zookeeper state changed
> > > (SyncConnected)
> > > 2014-04-14 17:18:13 DEBUG zkclient.ZkClient - Leaving process event
> > > 2014-04-14 17:18:13 DEBUG zkclient.ZkClient - State is SyncConnected
> > > 2014-04-14 17:18:13 DEBUG zookeeper.ClientCnxn - Reading reply
> > > sessionid:0x1453d08efa2c2b7, packet:: clientPath:null
> > > serverPath:null finished:false header:: 1,8  replyHeader:: 1,73102061236,0
> request::
> > > '/brokers/topics,F  response::
> > >
> > v{'bill-generated,'bob_test_topic,'rain-burn-in,'account-adjustment-is
> > sued,'account-access,'customer-cues,'submit-agreement,'logtest_v3,'acc
> > ount-created,'rain-load-test,'outbound-call-attempted,'account-info-up
> > dated,'threshold-exceeded,'rain-enhanced-consumer,'account-activated,'
> > outbound-communications,'rain-enhanced}
> > > 2014-04-14 17:18:13 DEBUG zookeeper.ClientCnxn - Reading reply
> > > sessionid:0x1453d08efa2c2b7, packet:: clientPath:null
> > > serverPath:null finished:false header:: 2,8  replyHeader:: 2,73102061261,0
> request::
> > > '/brokers/topics/bill-generated/partitions,F  response::
> > >
> > v{'35,'36,'33,'34,'39,'37,'38,'43,'42,'41,'40,'22,'23,'24,'25,'26,'27,
> > '28,'29,'3,'2,'1,'0,'7,'30,'6,'5,'32,'31,'4,'9,'8,'19,'17,'18,'15,'16,
> > '13,'14,'11,'12,'21,'20,'109,'108,'107,'106,'105,'104,'103,'99,'102,'1
> > 01,'100,'98,'97,'96,'95,'94,'93,'92,'91,'90,'10,'88,'89,'116,'79,'117,
> > '114,'78,'77,'115,'112,'113,'110,'111,'118,'119,'82,'83,'80,'81,'86,'8
> > 7,'84,'85,'67,'125,'126,'66,'69,'127,'68,'128,'121,'122,'123,'124,'129
> > ,'70,'120,'71,'72,'73,'74,'75,'76,'59,'58,'57,'56,'55,'64,'65,'62,'63,
> > '60,'130,'61,'49,'48,'45,'44,'47,'46,'51,'52,'53,'54,'50}
> > >
> > > *large list of logging statements*
> > >
> > >
> > > Failed to start preferred replica election
> > > kafka.common.AdminCommandFailedException: Admin command failed
> > >         at
> > >
> >
> kafka.admin.PreferredReplicaLeaderElectionCommand.moveLeaderToPrefer
> re
> > dReplica(PreferredReplicaLeaderElectionCommand.scala:115)
> > >         at
> > >
> >
> kafka.admin.PreferredReplicaLeaderElectionCommand$.main(PreferredRepl
> i
> > caLeaderElectionCommand.scala:60)
> > >         at
> > >
> >
> kafka.admin.PreferredReplicaLeaderElectionCommand.main(PreferredReplic
> > aLeaderElectionCommand.scala)
> > > Caused by: kafka.admin.AdminOperationException: Preferred replica
> > > leader election currently in progress for
> > > Set([account-adjustment-issued,77],
> > >
> > > *large list of topics and partitions*
> > >
> > >
> > > [submit-agreement,31]). Aborting operation
> > >         at
> > >
> >
> kafka.admin.PreferredReplicaLeaderElectionCommand$.writePreferredRepli
> > caElectionData(PreferredReplicaLeaderElectionCommand.scala:101)
> > >         at
> > >
> >
> kafka.admin.PreferredReplicaLeaderElectionCommand.moveLeaderToPrefer
> re
> > dReplica(PreferredReplicaLeaderElectionCommand.scala:113)
> > >         ... 2 more
> > >
> > > 2014-04-14 17:19:00 DEBUG zkclient.ZkClient - Closing ZkClient...
> > > 2014-04-14 17:19:00 INFO zkclient.ZkEventThread - Terminate ZkClient
> > event
> > > thread.
> > > 2014-04-14 17:19:00 DEBUG zkclient.ZkConnection - Closing ZooKeeper
> > > connected to tvip-m1-mw-zookeeper.dish.com:2181
> > > 2014-04-14 17:19:00 DEBUG zookeeper.ZooKeeper - Closing session:
> > > 0x1453d08efa2c2b7
> > > 2014-04-14 17:19:00 DEBUG zookeeper.ClientCnxn - Closing client for
> > > session: 0x1453d08efa2c2b7
> > > 2014-04-14 17:19:00 DEBUG zookeeper.ClientCnxn - Reading reply
> > > sessionid:0x1453d08efa2c2b7, packet:: clientPath:null
> > > serverPath:null finished:false header:: 2124,-11  replyHeader::
> > > 2124,73102067456,0
> > >  request:: null response:: null
> > > 2014-04-14 17:19:00 DEBUG zookeeper.ClientCnxn - Disconnecting
> > > client for
> > > session: 0x1453d08efa2c2b7
> > > 2014-04-14 17:19:00 INFO zookeeper.ZooKeeper - Session:
> > > 0x1453d08efa2c2b7 closed
> > > 2014-04-14 17:19:00 DEBUG zookeeper.ClientCnxn - An exception was
> > > thrown while closing send thread for session 0x1453d08efa2c2b7 :
> > > Unable to read additional data from server sessionid
> > > 0x1453d08efa2c2b7, likely server
> > has
> > > closed socket
> > > 2014-04-14 17:19:00 INFO zookeeper.ClientCnxn - EventThread shut
> > > down
> > > 2014-04-14 17:19:00 DEBUG zkclient.ZkClient - Closing
> > > ZkClient...done
> > >
> > >
> > > If I manually remove the /admin/preferred_replica_election node, the
> > > java exception goes away.
> > >
> > > A bit of other information:
> > >
> > > At worst ParNew GC time: 25-36 ms (total time per 15 second
> > > interval, 1-2  GCs per 15 second interval) per Broker during this
> > > operation. No CMS
> > GCing
> > > occurring.
> > >
> > >
> > > Any ideas?
> > >
> > > Thanks
> > >
> > > /bob
> > >
> >

Reply via email to