On Mon, Apr 23, 2018 at 10:29 AM, Manikumar <manikumar.re...@gmail.com> wrote:
>
> What is the replication factor? Was unclean election enabled (It enabled by
> default in 0.10.0.1)?
>
RF is 2 for regular topics (global var). Re: unclean elections, whatever the default is was on, so I think unclean election was enabled on 0.10.0.1 and then disabled after the upgrade as we rolled into the new config file. Setting it dynamically on the affected topics fixed the issue of no leaders/writes for the affected partitions. In hindsight we should've kept it enabled, but yeah :)

We haven't really tweaked the server config that much, mostly defaults outside of retention hours and RF. Trusting upstream to have sane defaults and all.

> With sufficient replication factor and healthy ISR, we may not see
> this issue.

As mentioned, it affected 3 brokers out of 10 and only 9 partitions on them. I was grepping the topics --describe output a bit, and we definitely have more than 9 replica/ISR mapping combinations on these particular brokers. So, seemingly at random, 9 out of some tens broke with this pairing.

After the upgrade settled, all brokers were happily logging about rolling new partitions etc., normal output. During the upgrade I saw those ERRORs related to no leader, and a few others.

chrs,
Mika