Great to know that. Thanks Gwen!

On Wed, Nov 25, 2015 at 12:03 PM, Gwen Shapira <g...@confluent.io> wrote:
> 1. Yes, you can do a rolling upgrade of brokers from 0.8.2 to 0.9.0. The
> important thing is to upgrade the brokers before you upgrade any of the
> clients.
>
> 2. I'm not aware of issues with 0.9.0 and Spark Streaming. However,
> definitely do your own testing to make sure.
>
> On Wed, Nov 25, 2015 at 11:25 AM, Qi Xu <shkir...@gmail.com> wrote:
>
> > Hi Gwen,
> > Yes, we're going to upgrade to the 0.9.0 version. Regarding the upgrade,
> > we definitely don't want any downtime on our cluster, so the upgrade
> > will be machine by machine. Will release 0.9.0 work together with the
> > August trunk version in the same Kafka cluster?
> > Also, we currently run a Spark Streaming job (with Scala 2.10) against
> > the cluster. Are there any known issues with 0.9.0 that you're aware of
> > under this scenario?
> >
> > Thanks,
> > Tony
> >
> > On Mon, Nov 23, 2015 at 5:41 PM, Gwen Shapira <g...@confluent.io> wrote:
> >
> > > We fixed many, many bugs since August. Since we are about to release
> > > 0.9.0 (with SSL!), maybe wait a day and go with a released and tested
> > > version.
> > >
> > > On Mon, Nov 23, 2015 at 3:01 PM, Qi Xu <shkir...@gmail.com> wrote:
> > >
> > > > Forgot to mention that the Kafka version we're using is from August's
> > > > trunk branch---which has the SSL support.
> > > >
> > > > Thanks again,
> > > > Qi
> > > >
> > > > On Mon, Nov 23, 2015 at 2:29 PM, Qi Xu <shkir...@gmail.com> wrote:
> > > >
> > > >> Looping in another guy from our team.
> > > >>
> > > >> On Mon, Nov 23, 2015 at 2:26 PM, Qi Xu <shkir...@gmail.com> wrote:
> > > >>
> > > >>> Hi folks,
> > > >>> We have a 10-node cluster with several topics. Each topic has
> > > >>> about 256 partitions with a replication factor of 3. Now we've run
> > > >>> into an issue where, in some topics, a few partitions (< 10) have
> > > >>> a leader of -1, and each of them has only one in-sync replica.
> > > >>>
> > > >>> From the Kafka manager, here's the snapshot:
> > > >>> [image: Inline image 2]
> > > >>>
> > > >>> [image: Inline image 1]
> > > >>>
> > > >>> Here's the state log:
> > > >>> [2015-11-23 21:57:58,598] ERROR Controller 1 epoch 435499 initiated
> > > >>> state change for partition [userlogs,84] from OnlinePartition to
> > > >>> OnlinePartition failed (state.change.logger)
> > > >>> kafka.common.StateChangeFailedException: encountered error while
> > > >>> electing leader for partition [userlogs,84] due to: Preferred
> > > >>> replica 0 for partition [userlogs,84] is either not alive or not
> > > >>> in the isr. Current leader and ISR:
> > > >>> [{"leader":-1,"leader_epoch":203,"isr":[1]}].
> > > >>> Caused by: kafka.common.StateChangeFailedException: Preferred
> > > >>> replica 0 for partition [userlogs,84] is either not alive or not
> > > >>> in the isr. Current leader and ISR:
> > > >>> [{"leader":-1,"leader_epoch":203,"isr":[1]}]
> > > >>>
> > > >>> My questions are:
> > > >>> 1) How could this happen, and how can I fix it or work around it?
> > > >>> 2) Are 256 partitions too many? We have about 200+ cores for the
> > > >>> Spark Streaming job.
> > > >>>
> > > >>> Thanks,
> > > >>> Qi
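For anyone following the rolling-upgrade discussion above: the Kafka 0.9.0 upgrade notes describe a two-phase protocol bump driven from `server.properties`. A minimal sketch of that config (phase comments are mine; check the upgrade notes for your exact starting version):

```properties
# Phase 1: before bouncing each broker onto the 0.9.0 binaries, pin the
# inter-broker protocol to the old version so mixed-version brokers can talk.
inter.broker.protocol.version=0.8.2.X

# Phase 2: once every broker in the cluster is running 0.9.0, switch this to
# the new version and do a second rolling restart.
# inter.broker.protocol.version=0.9.0.0
```

Clients are upgraded only after both phases complete, matching Gwen's "brokers before clients" advice.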
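On question 1: the error above is the controller refusing a preferred-replica election because replica 0 (the first replica in the assignment, i.e. the preferred leader) is not in the ISR. A tiny sketch of that check, using the state JSON from the log (the replica assignment `[0, 1, 2]` for [userlogs,84] is a hypothetical example; it only simplifies the controller's real logic, which also checks broker liveness):

```python
import json

def can_elect_preferred(assigned_replicas, state_json):
    """Simplified version of the controller's check: the preferred replica
    (first entry in the assignment list) must be in the ISR to become leader.
    (The real controller also requires the broker to be alive.)"""
    state = json.loads(state_json)
    preferred = assigned_replicas[0]
    return preferred in state["isr"]

# State copied from the error log: leader is -1 and only broker 1 is in the ISR.
state = '{"leader":-1,"leader_epoch":203,"isr":[1]}'

# Hypothetical assignment [0, 1, 2]: preferred replica 0 is not in the ISR,
# so the election fails, which is exactly the exception in the state log.
print(can_elect_preferred([0, 1, 2], state))  # prints: False
```

In practice this usually means broker 0 is down or has fallen out of sync; once it rejoins the ISR, a preferred-replica election (e.g. via `bin/kafka-preferred-replica-election.sh`) should restore the leader.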