Sure. There are two types of purgatory: consumer and producer.

The consumer purgatory (for a partition leader) is a queue of pending requests for data (i.e. clients polling that partition). It's basically a waiting area for poll requests. Generally speaking, the more consumers there are, the larger this queue grows, and the larger the queue, the longer consumers wait for data.
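To make the "waiting area" idea concrete, here is a toy Python model of a purgatory for poll requests. It is only a sketch, not how the broker is implemented: `PendingFetch` and `ToyFetchPurgatory` are hypothetical names, and the release rule (enough bytes are available, or the request's deadline has passed) is an assumption loosely modelled on the consumer settings `fetch.min.bytes` and `fetch.max.wait.ms`.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PendingFetch:
    """A parked poll request (hypothetical model, not a Kafka class)."""
    client_id: str
    min_bytes: int    # amount of data the request wants before it is answered
    deadline_ms: int  # time at which the request is answered regardless


class ToyFetchPurgatory:
    """Toy waiting area for poll requests on one partition leader."""

    def __init__(self):
        self._pending = deque()

    def park(self, request):
        """Hold a request that cannot be answered yet."""
        self._pending.append(request)

    def size(self):
        """Number of requests currently waiting (the 'purgatory size')."""
        return len(self._pending)

    def release(self, available_bytes, now_ms):
        """Return every request that is satisfied or expired; keep the rest."""
        done, still_waiting = [], deque()
        for req in self._pending:
            if available_bytes >= req.min_bytes or now_ms >= req.deadline_ms:
                done.append(req)
            else:
                still_waiting.append(req)
        self._pending = still_waiting
        return done


if __name__ == "__main__":
    purgatory = ToyFetchPurgatory()
    for i in range(3):
        purgatory.park(PendingFetch(f"consumer-{i}", min_bytes=1024, deadline_ms=500))
    print("waiting:", purgatory.size())
    # No data yet and no deadline reached: everyone keeps waiting.
    purgatory.release(available_bytes=0, now_ms=100)
    print("still waiting:", purgatory.size())
    # Data arrives: all parked requests can be answered.
    purgatory.release(available_bytes=2048, now_ms=100)
    print("after data arrives:", purgatory.size())
```

In this model the size only drops when data shows up or deadlines pass, so more polling clients against a quiet partition means a larger queue.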
The producer purgatory is the queue for write requests (again, for a partition leader). In my experience, the producer purgatory is more of a bottleneck, because a write has to go to disk, which is a slower operation. Although Kafka delegates disk I/O to the OS, I have found the size of this purgatory to be highly predictive of cluster stress. With regard to the ISR: if you relax the replica setting for your topics, that should mitigate the issue (i.e. fewer synchronizations are needed).

Network latency was also one of our biggest issues. Have you checked that?

-David

On 3/22/17, 8:28 PM, "Jun MA" <mj.saber1...@gmail.com> wrote:

Hi David,

I checked our cluster; the producer purgatory size is mostly under 3. But I don't quite understand this metric. Could you please explain it a little bit?

Thanks,
Jun

> On Mar 22, 2017, at 3:07 PM, David Garcia <dav...@spiceworks.com> wrote:
>
> producer purgatory size