Hi Kevin,

Thanks for the great write-up and the examples in the KIP, which help with better understanding the motivation.
I also think that having such a category would help with Kafka operations by providing a more actionable indicator. One minor concern is that, even with this new category, some Kafka SREs may still need to define their own custom alerting, depending on the situation. For example, for some, atMinIsr may be too late, and they might want to be notified when a partition is at atMinIsr + 1. Still, this new category should benefit Kafka monitoring in most cases without the need to define customized alerts.

Thanks,
--Vahid

On Tue, Feb 12, 2019, 09:02 Kevin Lu <lu.ke...@berkeley.edu> wrote:

> Hi All,
>
> Getting the discussion thread started for KIP-427 in case anyone is free
> right now.
>
> I’d like to propose a new category of topic partitions *AtMinIsr* which are
> partitions that only have the minimum number of in sync replicas left in
> the ISR set (as configured by min.insync.replicas).
>
> This would add two new metrics *ReplicaManager.AtMinIsrPartitionCount* &
> *Partition.AtMinIsr*, and a new TopicCommand option
> *--at-min-isr-partitions* to help in monitoring and alerting.
>
> KIP link: KIP-427: Add AtMinIsr topic partition category (new metric &
> TopicCommand option)
> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103089398>
>
> Please take a look and let me know what you think.
>
> Regards,
> Kevin
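
To make the distinction in this thread concrete, here is a minimal sketch of how a partition could be classified relative to min.insync.replicas, including the "atMinIsr + 1" early-warning case mentioned in the reply. It is only an illustration: `PartitionState`, `classify`, the `margin` parameter, and the returned labels are hypothetical names, not part of Kafka or of KIP-427.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionState:
    """Hypothetical snapshot of one partition (not a Kafka type)."""
    topic: str
    partition: int
    isr_size: int             # current number of in-sync replicas
    min_insync_replicas: int  # value of min.insync.replicas for the topic


def classify(p: PartitionState, margin: int = 0) -> str:
    """Label a partition by how close its ISR is to min.insync.replicas.

    margin is an assumed knob for custom alerting: margin=1 also flags
    partitions that are one replica above the minimum.
    """
    if p.isr_size < p.min_insync_replicas:
        return "below-min-isr"
    if p.isr_size == p.min_insync_replicas:
        return "at-min-isr"
    if p.isr_size <= p.min_insync_replicas + margin:
        return "near-min-isr"
    return "ok"
```

With `margin=0` only the at-minimum case from the proposal is flagged; an operator who considers that too late could pass `margin=1` to be warned one replica earlier.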