This is probably related to KAFKA-1154. Could you upgrade to the latest trunk?

Thanks,
Jun
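A minimal sketch of pinning a broker build to a specific trunk revision — trunk/b23cf1 is the commit Drew mentions below, and the build step itself depends on the checkout (0.8.0 used sbt; later trunk builds moved to gradle), so treat this as illustrative rather than exact instructions:

    # Check out the trunk revision you intend to roll out (b23cf1 is the one
    # referenced later in this thread; substitute whatever "latest trunk" is for you).
    git clone https://github.com/apache/kafka.git
    cd kafka
    git checkout b23cf1
    # Build per the checkout's README, then deploy the resulting jars to each
    # broker as part of the rolling upgrade discussed further down the thread.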
On Mon, Dec 23, 2013 at 3:21 PM, Drew Goya <d...@gradientx.com> wrote:

> Hey All, another thing to report for my 0.8.1 migration. I am seeing these
> errors occasionally right after I run a leader election. This looks to be
> related to KAFKA-860, as it is the same exception. I see this issue was
> closed a while ago though, and I should be running a commit with the fix in.
> I'm on trunk/87efda.
>
> I also see there is a more recent issue with replica threads dying out
> while becoming followers (KAFKA-1178), but I'm not seeing that exception.
> I'm going to roll updates through the cluster, bring my brokers up to
> trunk/b23cf1, and see how that goes.
>
> [2013-12-23 22:54:38,389] ERROR [ReplicaFetcherThread-0-11], Error due to (kafka.server.ReplicaFetcherThread)
> kafka.common.KafkaException: error processing data for partition [Events2,113] offset 1077499310
>         at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(AbstractFetcherThread.scala:139)
>         at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(AbstractFetcherThread.scala:111)
>         at scala.collection.immutable.Map$Map1.foreach(Map.scala:105)
>         at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply$mcV$sp(AbstractFetcherThread.scala:111)
>         at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(AbstractFetcherThread.scala:111)
>         at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(AbstractFetcherThread.scala:111)
>         at kafka.utils.Utils$.inLock(Utils.scala:538)
>         at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:110)
>         at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:88)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
> Caused by: java.lang.RuntimeException: Offset mismatch: fetched offset = 1077499310, log end offset = 1077499313.
>         at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:49)
>         at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(AbstractFetcherThread.scala:130)
>         ... 9 more
>
> On Mon, Dec 23, 2013 at 2:50 PM, Drew Goya <d...@gradientx.com> wrote:
>
>> We are running on an Amazon Linux AMI; this is our specific version:
>>
>> Linux version 2.6.32-220.23.1.el6.centos.plus.x86_64
>> (mockbu...@c6b5.bsys.dev.centos.org) (gcc version 4.4.6 20110731
>> (Red Hat 4.4.6-3) (GCC) ) #1 SMP Tue Jun 19 04:14:37 BST 2012
>>
>> On Mon, Dec 23, 2013 at 11:24 AM, Guozhang Wang <wangg...@gmail.com> wrote:
>>
>>> Hi Drew,
>>>
>>> I tried the kafka-server-stop script and it worked for me. Wondering
>>> which OS you are using?
>>>
>>> Guozhang
>>>
>>> On Mon, Dec 23, 2013 at 10:57 AM, Drew Goya <d...@gradientx.com> wrote:
>>>
>>>> Occasionally I do have to hard kill brokers; the kafka-server-stop.sh
>>>> script stopped working for me a few months ago. I saw another thread in
>>>> the mailing list mentioning the issue too. I'll change the signal back
>>>> to SIGTERM and run that way for a while; hopefully the problem goes away.
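Since that script just sends a signal to the broker JVM, the change Drew describes is a one-liner. A sketch of what reverting it to SIGTERM can look like — the exact ps/grep pipeline in the stock kafka-server-stop.sh may differ between versions, so this is illustrative rather than the literal script:

    # kafka-server-stop.sh (sketch): find the broker JVM and send SIGTERM so the
    # JVM runs its shutdown hooks and Kafka can shut down cleanly.
    ps ax | grep -i 'kafka\.Kafka' | grep java | grep -v grep \
      | awk '{print $1}' | xargs kill -SIGTERM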
>>>> This is the commit where it changed:
>>>>
>>>> https://github.com/apache/kafka/commit/51de7c55d2b3107b79953f401fc8c9530bd0eea0
>>>>
>>>> On Mon, Dec 23, 2013 at 10:09 AM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
>>>>
>>>>> Are you hard killing the brokers? And is this issue reproducible?
>>>>>
>>>>> On Sat, Dec 21, 2013 at 11:39 AM, Drew Goya <d...@gradientx.com> wrote:
>>>>>
>>>>>> Hey guys, another small issue to report for 0.8.1. After a couple of
>>>>>> days, 3 of my brokers had fallen off the ISR list for 2-3 of their
>>>>>> partitions.
>>>>>>
>>>>>> I didn't see anything unusual in the log and I just restarted one. It
>>>>>> came up fine, but as it loaded its logs these messages showed up:
>>>>>>
>>>>>> [2013-12-21 19:25:19,968] WARN [ReplicaFetcherThread-0-2], Replica 1 for partition [Events2,58] reset its fetch offset to current leader 2's start offset 1042738519 (kafka.server.ReplicaFetcherThread)
>>>>>> [2013-12-21 19:25:19,969] WARN [ReplicaFetcherThread-0-14], Replica 1 for partition [Events2,28] reset its fetch offset to current leader 14's start offset 1043415514 (kafka.server.ReplicaFetcherThread)
>>>>>> [2013-12-21 19:25:20,012] WARN [ReplicaFetcherThread-0-2], Current offset 1011209589 for partition [Events2,58] out of range; reset offset to 1042738519 (kafka.server.ReplicaFetcherThread)
>>>>>> [2013-12-21 19:25:20,013] WARN [ReplicaFetcherThread-0-14], Current offset 1010086751 for partition [Events2,28] out of range; reset offset to 1043415514 (kafka.server.ReplicaFetcherThread)
>>>>>> [2013-12-21 19:25:20,036] WARN [ReplicaFetcherThread-0-14], Replica 1 for partition [Events2,71] reset its fetch offset to current leader 14's start offset 1026871415 (kafka.server.ReplicaFetcherThread)
>>>>>> [2013-12-21 19:25:20,036] WARN [ReplicaFetcherThread-0-2], Replica 1 for partition [Events2,44] reset its fetch offset to current leader 2's start offset 1052372907 (kafka.server.ReplicaFetcherThread)
>>>>>> [2013-12-21 19:25:20,036] WARN [ReplicaFetcherThread-0-14], Current offset 993879706 for partition [Events2,71] out of range; reset offset to 1026871415 (kafka.server.ReplicaFetcherThread)
>>>>>> [2013-12-21 19:25:20,036] WARN [ReplicaFetcherThread-0-2], Current offset 1020715056 for partition [Events2,44] out of range; reset offset to 1052372907 (kafka.server.ReplicaFetcherThread)
>>>>>>
>>>>>> Judging by the network traffic and disk usage changes after the reboot
>>>>>> (both jumped up), a couple of the partition replicas had fallen behind
>>>>>> and are now catching up.
>>>>>>
>>>>>> On Thu, Dec 19, 2013 at 4:37 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Drew,
>>>>>>>
>>>>>>> That problem will be fixed by
>>>>>>> https://issues.apache.org/jira/browse/KAFKA-1074. I think we are close
>>>>>>> to checking that in to trunk.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Neha
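When replicas drop out of the ISR like this, a quick way to see which partitions are affected is the topic tool's under-replicated view. A sketch assuming the 0.8.1-era kafka-topics.sh flags and a placeholder ZooKeeper connect string:

    # List partitions whose ISR is currently smaller than the replica set.
    # zkhost:2181 is a placeholder for your ZooKeeper connect string.
    bin/kafka-topics.sh --zookeeper zkhost:2181 --describe --under-replicated-partitions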
>>>>>>> On Wed, Dec 18, 2013 at 9:02 AM, Drew Goya <d...@gradientx.com> wrote:
>>>>>>>
>>>>>>>> Thanks Neha, I rolled upgrades and completed a rebalance!
>>>>>>>>
>>>>>>>> I ran into a few small issues I figured I would share.
>>>>>>>>
>>>>>>>> On a few brokers, there were some log directories left over from some
>>>>>>>> failed rebalances, which prevented the 0.8.1 brokers from starting once
>>>>>>>> I completed the upgrade. These directories contained an index file and
>>>>>>>> a zero-size log file; once I cleaned those out, the brokers were able
>>>>>>>> to start up fine. If anyone else runs into the same problem, and is
>>>>>>>> running RHEL, this is the bash script I used to clean them out:
>>>>>>>>
>>>>>>>> du --max-depth=1 -h /data/kafka/logs | grep K | sed 's/.*K.//' | xargs sudo rm -r
>>>>>>>>
>>>>>>>> On Tue, Dec 17, 2013 at 10:42 AM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> There are no compatibility issues. You can roll upgrades through the
>>>>>>>>> cluster one node at a time.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Neha
>>>>>>>>>
>>>>>>>>> On Tue, Dec 17, 2013 at 9:15 AM, Drew Goya <d...@gradientx.com> wrote:
>>>>>>>>>
>>>>>>>>>> So I'm going to be going through the process of upgrading a cluster
>>>>>>>>>> from 0.8.0 to trunk (0.8.1).
>>>>>>>>>>
>>>>>>>>>> I'm going to be expanding this cluster several times, and the
>>>>>>>>>> problems with reassigning partitions in 0.8.0 mean I have to move to
>>>>>>>>>> trunk (0.8.1) ASAP.
>>>>>>>>>>
>>>>>>>>>> Will it be safe to roll upgrades through the cluster one by one?
>>>>>>>>>>
>>>>>>>>>> Also, are there any client compatibility issues I need to worry
>>>>>>>>>> about? Am I going to need to pause/upgrade all my consumers/producers
>>>>>>>>>> at once, or can I roll upgrades through the cluster and then upgrade
>>>>>>>>>> my clients one by one?
>>>>>>>>>>
>>>>>>>>>> Thanks in advance!
>>>
>>> --
>>> -- Guozhang
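Neha's "one node at a time" advice above maps to a fairly mechanical loop. A rough sketch of a rolling broker upgrade — the hostnames, the /opt/kafka install path, and the wait time are placeholders, not details from this thread:

    # Rolling upgrade sketch: stop, redeploy, and restart one broker at a time,
    # letting replicas catch back up before moving on to the next host.
    for host in broker1 broker2 broker3; do
      ssh "$host" '/opt/kafka/bin/kafka-server-stop.sh'   # see the SIGTERM discussion above
      # ...deploy the newly built trunk release onto $host here...
      ssh "$host" 'nohup /opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties > /opt/kafka/kafka.out 2>&1 < /dev/null &'
      sleep 300   # crude wait; better to poll until the broker has rejoined the ISR for its partitions
    done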