On Mon, Apr 23, 2018 at 10:29 AM, Manikumar <manikumar.re...@gmail.com>
wrote:
>
> What is the replication factor? Was unclean election enabled (it is
> enabled by default in 0.10.0.1)?
>

RF is 2 for regular topics (global var).

Re: unclean elections, we ran with whatever the default was, so I think
unclean election was enabled on 0.10.0.1 and then became disabled after the
upgrade, when we rolled in the new config file. Setting it dynamically on
the affected topics fixed the issue of no leaders/writes for the affected
partitions. In hindsight we should've kept it enabled, but yeah :)

We haven't really tweaked the server config that much; it's mostly defaults
apart from retention hours and RF. We're trusting upstream to have sane
defaults and all.


> With sufficient replication factor and healthy ISR, we may not see
> this issue.


As mentioned, it affected 3 brokers out of 10, and only 9 partitions on
them. I was grepping the topics --describe output a bit, and we definitely
have more than 9 replica/ISR mapping combinations on these particular
brokers. So seemingly at random, 9 out of some tens of partitions with this
pairing broke. After the upgrade settled, all brokers were happily logging
about rolling new partitions and other normal output. During the upgrade I
saw those ERRORs related to no leader, and a few others.
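
For what it's worth, the grepping above can be roughly sketched in Python.
This is only an illustration: it assumes the tab-separated line format that
kafka-topics.sh --describe printed around these versions, with "Leader: -1"
marking a leaderless partition. The sample text, topic name, broker ids and
function names are all made up.

```python
import re
from collections import Counter

# Hypothetical sample in the style of `kafka-topics.sh --describe` output;
# the topic name and broker ids are invented for illustration.
SAMPLE = """\
Topic:events\tPartitionCount:3\tReplicationFactor:2\tConfigs:
\tTopic: events\tPartition: 0\tLeader: 1\tReplicas: 1,2\tIsr: 1,2
\tTopic: events\tPartition: 1\tLeader: -1\tReplicas: 2,3\tIsr: 3
\tTopic: events\tPartition: 2\tLeader: 3\tReplicas: 3,1\tIsr: 3,1
"""

# Matches per-partition lines only; the "PartitionCount" summary line is skipped.
LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>\S+)\s+Replicas:\s*(?P<replicas>[\d,]*)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def _ids(field):
    """Turn a comma-separated broker list such as '1,2' into [1, 2]."""
    return [int(x) for x in field.split(",") if x]


def parse_describe(text):
    """Parse describe-style output into one dict per partition."""
    partitions = []
    for line in text.splitlines():
        m = LINE.search(line)
        if not m:
            continue
        leader = m.group("leader")
        partitions.append({
            "topic": m.group("topic"),
            "partition": int(m.group("partition")),
            # Non-numeric leader values (assumed possible) are treated as no leader.
            "leader": int(leader) if leader.lstrip("-").isdigit() else None,
            "replicas": _ids(m.group("replicas")),
            "isr": _ids(m.group("isr")),
        })
    return partitions


def leaderless(partitions):
    """Return the partitions that currently have no leader."""
    return [p for p in partitions if p["leader"] is None or p["leader"] < 0]


def replica_pair_counts(partitions):
    """Count how many partitions share each (order-insensitive) replica set."""
    return Counter(tuple(sorted(p["replicas"])) for p in partitions)


if __name__ == "__main__":
    parts = parse_describe(SAMPLE)
    for p in leaderless(parts):
        print(p["topic"], p["partition"], "replicas", p["replicas"], "isr", p["isr"])
    print(replica_pair_counts(parts))
```

Comparing the replica sets of the leaderless partitions against the overall
counts shows how many partitions with the same broker pairing were not
affected.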

chrs,
Mika
