Re: Leadership rebalance causing drop of incoming messages

2015-01-21 Thread Allen Wang
> It is unclear to me why restarting clients would fix the compatibility
> issue. Or do you mean you bounced in the right version of the snappy
> jar? Also, what version of the broker are you on?

We restarted the client with compression turned off, and that fixed the problem. We are using ...

Re: Leadership rebalance causing drop of incoming messages

2015-01-21 Thread Joel Koshy
> instability of the broker cluster might have been caused by a snappy
> un-compression error. In our case, the consumer and producer happens to be
> the same application so restarting the client made the recovery of the
> ...
> The un-compression error is likely to be caused by incompatible snappy ...

Re: Leadership rebalance causing drop of incoming messages

2015-01-21 Thread Allen Wang
After a closer look at other metrics and broker logs, we found that the instability of the broker cluster might have been caused by a Snappy decompression error. In our case, the consumer and producer happen to be the same application, so restarting the client made the recovery of the broker cluster ...

Re: Leadership rebalance causing drop of incoming messages

2015-01-15 Thread Joel Koshy
Not sure what could be going on. What versions of the client and the broker are you on? Can you verify from the state change logs how long it took for leadership to move to the preferred leader? Were there long GC pauses on your brokers? Can you also look for ZooKeeper session expirations in your bro...
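One way to answer the question about how long leadership took to move is to compare timestamps in the state change log. The following is a minimal Python sketch, assuming every log line starts with a timestamp in the format quoted elsewhere in this thread (for example `2015-01-15 00:01:39,895`) and that the partition of interest appears in the line as `[topic,partition]`. The helper names `parse_ts` and `span_seconds` are hypothetical, and deciding which lines really mark the start and end of a leader move is left to the reader.

```python
import re
from datetime import datetime

# Assumption: log lines begin with a timestamp in the format seen in this
# thread, e.g. "2015-01-15 00:01:39,895" (comma before the milliseconds).
_TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def parse_ts(line):
    """Return the leading timestamp of a log line as a datetime, or None."""
    m = _TS.match(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")


def span_seconds(lines, needle):
    """Seconds between the earliest and latest timestamped lines containing
    `needle` (e.g. a partition written as "[topic,partition]").

    Returns None if fewer than two matching timestamped lines are found.
    """
    stamps = [parse_ts(line) for line in lines if needle in line]
    stamps = [t for t in stamps if t is not None]
    if len(stamps) < 2:
        return None
    return (max(stamps) - min(stamps)).total_seconds()
```

The result is the time window covered by the matching entries, so it measures the leader move only if the input is first narrowed to the entries that bracket that move.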

Re: Leadership rebalance causing drop of incoming messages

2015-01-15 Thread Allen Wang
Another kind of error message appears in the Kafka state change log after the leadership rebalance:
2015-01-15 00:01:39,895 WARN kafka.utils.Logging$class:83 [kafka-request-handler-0] [warn] Broker 8 received invalid LeaderAndIsr request with correlation id 221 from controller 0 epoch 19 with an ol...

Re: Leadership rebalance causing drop of incoming messages

2015-01-15 Thread Allen Wang
We are using the Scala producer. On the producer side, we saw many error messages while incoming messages were dropping:
Produce request with correlation id 31616255 failed due to [trace_annotation,10]: kafka.common.NotLeaderForPartitionException
And a few (far less than th...
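To check whether the NotLeaderForPartitionException errors concentrate on the partitions whose leaders moved, the producer log can be tallied per partition. This is a minimal Python sketch, assuming failure lines shaped like the one quoted above; the possibility of several comma-separated `[topic,partition]: <exception>` entries on one line is an assumption about the log format, not something confirmed in this thread. `count_failures` is a hypothetical helper.

```python
import re
from collections import Counter

# Assumption: failure lines follow the shape quoted in this thread,
# "... failed due to [topic,partition]: <exception class>", possibly with
# several comma-separated "[topic,partition]: <exception>" entries.
_ENTRY = re.compile(r"\[([^,\]]+),(\d+)\]: ([\w.$]+)")

NOT_LEADER = "kafka.common.NotLeaderForPartitionException"


def count_failures(lines, exception=NOT_LEADER):
    """Count failed produce entries per (topic, partition) for one exception class."""
    counts = Counter()
    for line in lines:
        marker = line.find("failed due to ")
        if marker < 0:
            continue  # not a produce-failure line
        for topic, partition, exc in _ENTRY.findall(line[marker:]):
            if exc == exception:
                counts[(topic, int(partition))] += 1
    return counts
```

Comparing these counts with the list of partitions passed to the preferred replica election would show whether the rejections were limited to the partitions that changed leaders.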

Re: Leadership rebalance causing drop of incoming messages

2015-01-15 Thread Joel Koshy
> Is leadership rebalance a safe operation?

Yes, we use it routinely. For any partition, there should be only a brief period (on the order of seconds) of rejected messages while leaders move. When that happens, the client should refresh its metadata and discover the new leader. Are you using the Java producer?
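The client behavior described here (a leader moves, requests sent to the old leader are rejected, the client refreshes its metadata and retries against the new leader) can be illustrated with a toy model. This is a minimal Python sketch, not the Scala or Java producer's actual code; `ToyCluster`, `NotLeaderError`, and `send_with_refresh` are hypothetical names, and the model assumes a single metadata refresh is enough to learn the new leader.

```python
import time


class NotLeaderError(Exception):
    """Hypothetical stand-in for kafka.common.NotLeaderForPartitionException."""


class ToyCluster:
    """Hypothetical model of the brokers: partition -> id of the current leader."""

    def __init__(self, leaders):
        self.leaders = dict(leaders)

    def produce(self, broker, partition, message):
        # A broker that is no longer the leader rejects the request.
        if self.leaders[partition] != broker:
            raise NotLeaderError(partition)
        return broker  # id of the broker that accepted the message

    def fetch_metadata(self, partition):
        # Stands in for a metadata request: returns the current leader.
        return self.leaders[partition]


def send_with_refresh(cluster, leader_cache, partition, message,
                      max_retries=3, backoff_s=0.0):
    """Send to the cached leader; on NotLeaderError refresh metadata and retry."""
    for attempt in range(max_retries + 1):
        try:
            return cluster.produce(leader_cache[partition], partition, message)
        except NotLeaderError:
            if attempt == max_retries:
                raise  # retries exhausted: the error propagates to the caller
            time.sleep(backoff_s)  # back off before refreshing and retrying
            leader_cache[partition] = cluster.fetch_metadata(partition)
```

In this model the rejection window stays short only if the retries outlast the leader move; if they run out first, the send fails. For the 0.8 Scala producer, the settings that, to my knowledge, play the roles of `max_retries` and `backoff_s` are `message.send.max.retries` and `retry.backoff.ms`; check the producer configuration documentation for your version.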

Leadership rebalance causing drop of incoming messages

2015-01-14 Thread Allen Wang
Hello, we did a manual leadership rebalance (using PreferredReplicaLeaderElectionCommand) under heavy load and found a significant drop in incoming messages to the broker cluster lasting more than an hour. Looking at the broker log, we found many errors like this:
2015-01-15 00:00:03,33...