Nice! I will observe that in a previous life, even within a "controlled datacenter environment" (albeit a large one), we wound up resetting the ZK timeout to 30s due to spurious network issues. Bumping replica max lag is important as well, since we also frequently saw the ISR=1 problem as machines were crashing (dying leader throwing all followers out of the ISR as its request threads backlogged).
-Steve On Tue, Oct 15, 2019 at 4:41 PM Jason Gustafson <ja...@confluent.io> wrote: > Hi All, > > I have a short KIP to raise the default zk session timeout: > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-537%3A+Increase+default+zookeeper+session+timeout > . > Please take a look. > > Thanks, > Jason >