Are you hard killing the brokers? And is this issue reproducible?

On Sat, Dec 21, 2013 at 11:39 AM, Drew Goya <d...@gradientx.com> wrote:

> Hey guys, another small issue to report for 0.8.1.  After a couple of days,
> 3 of my brokers had fallen off the ISR list for 2-3 of their partitions.
>
> I didn't see anything unusual in the logs and I just restarted one.  It came
> up fine, but as it loaded its logs these messages showed up:
>
> [2013-12-21 19:25:19,968] WARN [ReplicaFetcherThread-0-2], Replica 1 for
> partition [Events2,58] reset its fetch offset to current leader 2's start
> offset 1042738519 (kafka.server.ReplicaFetcherThread)
> [2013-12-21 19:25:19,969] WARN [ReplicaFetcherThread-0-14], Replica 1 for
> partition [Events2,28] reset its fetch offset to current leader 14's start
> offset 1043415514 (kafka.server.ReplicaFetcherThread)
> [2013-12-21 19:25:20,012] WARN [ReplicaFetcherThread-0-2], Current offset
> 1011209589 for partition [Events2,58] out of range; reset offset to
> 1042738519 (kafka.server.ReplicaFetcherThread)
> [2013-12-21 19:25:20,013] WARN [ReplicaFetcherThread-0-14], Current offset
> 1010086751 for partition [Events2,28] out of range; reset offset to
> 1043415514 (kafka.server.ReplicaFetcherThread)
> [2013-12-21 19:25:20,036] WARN [ReplicaFetcherThread-0-14], Replica 1 for
> partition [Events2,71] reset its fetch offset to current leader 14's start
> offset 1026871415 (kafka.server.ReplicaFetcherThread)
> [2013-12-21 19:25:20,036] WARN [ReplicaFetcherThread-0-2], Replica 1 for
> partition [Events2,44] reset its fetch offset to current leader 2's start
> offset 1052372907 (kafka.server.ReplicaFetcherThread)
> [2013-12-21 19:25:20,036] WARN [ReplicaFetcherThread-0-14], Current offset
> 993879706 for partition [Events2,71] out of range; reset offset to
> 1026871415 (kafka.server.ReplicaFetcherThread)
> [2013-12-21 19:25:20,036] WARN [ReplicaFetcherThread-0-2], Current offset
> 1020715056 for partition [Events2,44] out of range; reset offset to
> 1052372907 (kafka.server.ReplicaFetcherThread)
>
> Judging by the network traffic and disk usage changes after the restart
> (both jumped up), a couple of the partition replicas had fallen behind and
> are now catching up.
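>
> For what it's worth, those warnings say the replica's last fetched offset
> had fallen below the leader's earliest retained offset, so the fetcher
> jumps forward to the leader's log start.  A quick way to compare a
> partition's earliest and latest offsets on the leader is GetOffsetShell;
> this is only a sketch (the broker host is hypothetical, and Events2
> partition 58 is taken from the log excerpt above):
>
> # Earliest (log start) offset on the leader for Events2, partition 58:
> bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
>   --broker-list broker2:9092 --topic Events2 --partitions 58 --time -2
> # Latest offset for the same partition:
> bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
>   --broker-list broker2:9092 --topic Events2 --partitions 58 --time -1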
>
>
> On Thu, Dec 19, 2013 at 4:37 PM, Neha Narkhede <neha.narkh...@gmail.com>
> wrote:
>
> > Hi Drew,
> >
> > That problem will be fixed by
> > https://issues.apache.org/jira/browse/KAFKA-1074. I think we are close
> > to checking that in to trunk.
> >
> > Thanks,
> > Neha
> >
> >
> > On Wed, Dec 18, 2013 at 9:02 AM, Drew Goya <d...@gradientx.com> wrote:
> >
> > > Thanks Neha, I rolled upgrades and completed a rebalance!
> > >
> > > I ran into a few small issues I figured I would share.
> > >
> > > On a few brokers, there were some log directories left over from some
> > > failed rebalances, which prevented the 0.8.1 brokers from starting
> > > once I completed the upgrade.  These directories contained an index
> > > file and a zero-size log file; once I cleaned those out, the brokers
> > > were able to start up fine.  If anyone else runs into the same problem
> > > and is running RHEL, this is the bash script I used to clean them out:
> > >
> > > du --max-depth=1 -h /data/kafka/logs | grep K | sed 's/.*K.//' |
> > >   xargs sudo rm -r
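> > >
> > > A more cautious variant (just a sketch, using the same /data/kafka/logs
> > > layout as above) is to key off the zero-size .log files and do a dry
> > > run before deleting anything:
> > >
> > > # Dry run: list directories that contain a zero-byte segment .log file.
> > > find /data/kafka/logs -mindepth 2 -maxdepth 2 -name '*.log' -size 0 \
> > >   -printf '%h\n' | sort -u
> > > # Once the list looks right, remove those directories.
> > > find /data/kafka/logs -mindepth 2 -maxdepth 2 -name '*.log' -size 0 \
> > >   -printf '%h\n' | sort -u | xargs -r sudo rm -r
> > >
> > > Note that a partition that simply hasn't received data yet also has a
> > > zero-size .log file, so it's worth eyeballing the list first.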
> > >
> > >
> > > On Tue, Dec 17, 2013 at 10:42 AM, Neha Narkhede <neha.narkh...@gmail.com>
> > > wrote:
> > >
> > > > There are no compatibility issues. You can roll upgrades through the
> > > > cluster one node at a time.
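> > > >
> > > > If it helps, the rolling upgrade is essentially the loop below.  This
> > > > is only a sketch: the host names and install paths are hypothetical,
> > > > and the fixed sleep is a placeholder for waiting until the restarted
> > > > broker has rejoined the ISR before moving on:
> > > >
> > > > for b in broker1 broker2 broker3; do
> > > >   # Stop the running broker, point the install at the 0.8.1 build,
> > > >   # then start it back up.
> > > >   ssh "$b" '/opt/kafka/bin/kafka-server-stop.sh'
> > > >   ssh "$b" 'rm /opt/kafka && ln -s /opt/kafka-0.8.1 /opt/kafka'
> > > >   ssh "$b" 'nohup /opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties >/dev/null 2>&1 </dev/null &'
> > > >   sleep 300   # placeholder: give the broker time to catch back up
> > > > done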
> > > >
> > > > Thanks
> > > > Neha
> > > >
> > > >
> > > > On Tue, Dec 17, 2013 at 9:15 AM, Drew Goya <d...@gradientx.com> wrote:
> > > >
> > > > > So I'm going to be going through the process of upgrading a cluster
> > > > > from 0.8.0 to the trunk (0.8.1).
> > > > >
> > > > > I'm going to be expanding this cluster several times, and the
> > > > > problems with reassigning partitions in 0.8.0 mean I have to move
> > > > > to trunk (0.8.1) asap.
> > > > >
> > > > > Will it be safe to roll upgrades through the cluster one by one?
> > > > >
> > > > > Also, are there any client compatibility issues I need to worry
> > > > > about?  Am I going to need to pause/upgrade all my
> > > > > consumers/producers at once, or can I roll upgrades through the
> > > > > cluster and then upgrade my clients one by one?
> > > > >
> > > > > Thanks in advance!
> > > > >
> > > >
> > >
> >
>
