Great to know that. Thanks Gwen!

On Wed, Nov 25, 2015 at 12:03 PM, Gwen Shapira <g...@confluent.io> wrote:
> 1. Yes, you can do a rolling upgrade of brokers from 0.8.2 to 0.9.0. The
> important thing is to upgrade the brokers before you upgrade any of the
> clients.
>
> 2. I'm not aware of issues with 0.9.0 and Spark Streaming. However,
> definitely do your own testing to make sure.
>
> On Wed, Nov 25, 2015 at 11:25 AM, Qi Xu <shkir...@gmail.com> wrote:
>
> > Hi Gwen,
> > Yes, we're going to upgrade to the 0.9.0 version. Regarding the upgrade,
> > we definitely don't want any downtime on our cluster, so the upgrade
> > will be machine by machine. Will release 0.9.0 work together with the
> > August trunk version in the same Kafka cluster?
> > Also, we currently run a Spark Streaming job (with Scala 2.10) against
> > the cluster. Are there any known issues with 0.9.0 that you're aware of
> > under this scenario?
> >
> > Thanks,
> > Tony
> >
> > On Mon, Nov 23, 2015 at 5:41 PM, Gwen Shapira <g...@confluent.io> wrote:
> >
> > > We fixed many, many bugs since August. Since we are about to release
> > > 0.9.0 (with SSL!), maybe wait a day and go with a released and tested
> > > version.
> > >
> > > On Mon, Nov 23, 2015 at 3:01 PM, Qi Xu <shkir...@gmail.com> wrote:
> > >
> > > > Forgot to mention that the Kafka version we're using is from August's
> > > > trunk branch---which has the SSL support.
> > > >
> > > > Thanks again,
> > > > Qi
> > > >
> > > > On Mon, Nov 23, 2015 at 2:29 PM, Qi Xu <shkir...@gmail.com> wrote:
> > > >
> > > >> Looping in another guy from our team.
> > > >>
> > > >> On Mon, Nov 23, 2015 at 2:26 PM, Qi Xu <shkir...@gmail.com> wrote:
> > > >>
> > > >>> Hi folks,
> > > >>> We have a 10-node cluster with several topics. Each topic has
> > > >>> about 256 partitions with a replication factor of 3. Now we've run
> > > >>> into an issue where, in some topics, a few partitions (< 10) have
> > > >>> a leader of -1, and each of them has only one in-sync replica.
> > > >>>
> > > >>> From the Kafka manager, here's the snapshot:
> > > >>> [image: Inline image 2]
> > > >>>
> > > >>> [image: Inline image 1]
> > > >>>
> > > >>> Here's the state log:
> > > >>> [2015-11-23 21:57:58,598] ERROR Controller 1 epoch 435499 initiated
> > > >>> state change for partition [userlogs,84] from OnlinePartition to
> > > >>> OnlinePartition failed (state.change.logger)
> > > >>> kafka.common.StateChangeFailedException: encountered error while
> > > >>> electing leader for partition [userlogs,84] due to: Preferred
> > > >>> replica 0 for partition [userlogs,84] is either not alive or not
> > > >>> in the isr. Current leader and ISR:
> > > >>> [{"leader":-1,"leader_epoch":203,"isr":[1]}].
> > > >>> Caused by: kafka.common.StateChangeFailedException: Preferred
> > > >>> replica 0 for partition [userlogs,84] is either not alive or not
> > > >>> in the isr. Current leader and ISR:
> > > >>> [{"leader":-1,"leader_epoch":203,"isr":[1]}]
> > > >>>
> > > >>> My questions are:
> > > >>> 1) How could this happen, and how can I fix it or work around it?
> > > >>> 2) Are 256 partitions too many? We have about 200+ cores for the
> > > >>> Spark Streaming job.
> > > >>>
> > > >>> Thanks,
> > > >>> Qi
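For anyone following the rolling-upgrade discussion above: the Kafka 0.9.0 upgrade notes describe a two-phase protocol bump driven from `server.properties`. A minimal sketch of that config (phase comments are mine; check the upgrade notes for your exact starting version):

```properties
# Phase 1: before bouncing each broker onto the 0.9.0 binaries, pin the
# inter-broker protocol to the old version so mixed-version brokers can talk.
inter.broker.protocol.version=0.8.2.X

# Phase 2: once every broker in the cluster is running 0.9.0, switch this to
# the new version and do a second rolling restart.
# inter.broker.protocol.version=0.9.0.0
```

Clients are upgraded only after both phases complete, matching Gwen's "brokers before clients" advice.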
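On question 1: the error above is the controller refusing a preferred-replica election because replica 0 (the first replica in the assignment, i.e. the preferred leader) is not in the ISR. A tiny sketch of that check, using the state JSON from the log (the replica assignment `[0, 1, 2]` for [userlogs,84] is a hypothetical example; it only simplifies the controller's real logic, which also checks broker liveness):

```python
import json

def can_elect_preferred(assigned_replicas, state_json):
    """Simplified version of the controller's check: the preferred replica
    (first entry in the assignment list) must be in the ISR to become leader.
    (The real controller also requires the broker to be alive.)"""
    state = json.loads(state_json)
    preferred = assigned_replicas[0]
    return preferred in state["isr"]

# State copied from the error log: leader is -1 and only broker 1 is in the ISR.
state = '{"leader":-1,"leader_epoch":203,"isr":[1]}'

# Hypothetical assignment [0, 1, 2]: preferred replica 0 is not in the ISR,
# so the election fails, which is exactly the exception in the state log.
print(can_elect_preferred([0, 1, 2], state))  # prints: False
```

In practice this usually means broker 0 is down or has fallen out of sync; once it rejoins the ISR, a preferred-replica election (e.g. via `bin/kafka-preferred-replica-election.sh`) should restore the leader.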