Neil, what you are seeing is probably KAFKA-1407 <https://issues.apache.org/jira/browse/KAFKA-1407>.

On Tue, Oct 21, 2014 at 12:03 PM, Gwen Shapira <gshap...@cloudera.com> wrote:

> Consumers always read from the leader replica, which is always in sync
> by definition. So you are good there.
> The concern would be if the leader crashes during this period.
>
> On Tue, Oct 21, 2014 at 2:56 PM, Neil Harkins <nhark...@gmail.com> wrote:
> > Hi. I've got a 5 node cluster running Kafka 0.8.1,
> > with 4697 partitions (2 replicas each) across 564 topics.
> > I'm sending it about 1% of our total messaging load now,
> > and several times a day there is a period where 1~1500
> > partitions have one replica not in sync. Is this normal?
> > If a consumer is reading from a replica that gets deemed
> > "not in sync", does it get redirected to the good replica?
> > Is there a #partitions over which maintenance tasks
> > become infeasible?
> >
> > Relevant config bits:
> > auto.leader.rebalance.enable=true
> > leader.imbalance.per.broker.percentage=20
> > leader.imbalance.check.interval.seconds=30
> > replica.lag.time.max.ms=10000
> > replica.lag.max.messages=4000
> > num.replica.fetchers=4
> > replica.fetch.max.bytes=10485760
> >
> > Not necessarily correlated to those periods,
> > I see a lot of these errors in the logs:
> >
> > [2014-10-20 21:23:26,999] 21963614 [ReplicaFetcherThread-3-1] ERROR
> > kafka.server.ReplicaFetcherThread - [ReplicaFetcherThread-3-1], Error
> > in fetch Name: FetchRequest; Version: 0; CorrelationId: 77423;
> > ClientId: ReplicaFetcherThread-3-1; ReplicaId: 2; MaxWait: 500 ms;
> > MinBytes: 1 bytes; RequestInfo: ...
> >
> > And a few of these:
> >
> > [2014-10-20 21:23:39,555] 3467527 [kafka-scheduler-2] ERROR
> > kafka.utils.ZkUtils$ - Conditional update of path
> > /brokers/topics/foo.bar/partitions/3/state with data
> > {"controller_epoch":11,"leader":3,"version":1,"leader_epoch":109,"isr":[3]}
> > and expected version 197 failed due to
> > org.apache.zookeeper.KeeperException$BadVersionException:
> > KeeperErrorCode = BadVersion for
> > /brokers/topics/foo.bar/partitions/3/state
> >
> > And this one I assume is a client closing the connection non-gracefully,
> > thus should probably be a warning, not an error?:
> >
> > [2014-10-20 21:54:15,599] 23812214 [kafka-processor-9092-3] ERROR
> > kafka.network.Processor - Closing socket for /10.31.0.224 because of
> > error
> >
> > -neil
>

--
-- Guozhang
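
For reference, here is a rough sketch of how the two replica lag settings quoted above are understood to decide in-sync status on a 0.8.x leader: a follower is dropped from the ISR if it has not caught up within replica.lag.time.max.ms, or if it trails the leader's log end offset by more than replica.lag.max.messages. This is an illustration only, not the broker's code; the Follower class, the out_of_sync function, and the sample numbers are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Follower:
    """Hypothetical stand-in for the leader's view of one follower replica."""
    broker_id: int
    log_end_offset: int   # last offset the follower has fetched up to
    last_update_ms: int   # when that offset was last advanced


def out_of_sync(leader_log_end_offset, followers, now_ms,
                lag_time_max_ms=10000, lag_max_messages=4000):
    """Return the broker ids that would fall out of the ISR under the two checks.

    Defaults mirror replica.lag.time.max.ms and replica.lag.max.messages
    from the config in this thread.
    """
    lagging = []
    for f in followers:
        # "Stuck": no progress within the time limit.
        stuck = now_ms - f.last_update_ms > lag_time_max_ms
        # "Slow": too many messages behind the leader.
        slow = leader_log_end_offset - f.log_end_offset > lag_max_messages
        if stuck or slow:
            lagging.append(f.broker_id)
    return lagging


if __name__ == "__main__":
    followers = [
        Follower(broker_id=1, log_end_offset=100000, last_update_ms=9000),
        Follower(broker_id=2, log_end_offset=95000, last_update_ms=9900),
        Follower(broker_id=3, log_end_offset=99990, last_update_ms=0),
    ]
    print(out_of_sync(100000, followers, now_ms=10500))
```

Under this reading, a producer burst larger than replica.lag.max.messages could briefly push an otherwise healthy follower past the message threshold, which may be one reason an ISR shrinks and then recovers shortly afterwards.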