Hi John,

We would need more details to be able to help. What versions are your producers and consumers? Is compression being used and, if so, which compression type? And what is the broker/topic message format version?

Ismael

On Sun, Jul 9, 2017 at 1:13 PM, John Yost <hokiege...@gmail.com> wrote:

> Hey Everyone,
>
> When we originally upgraded from 0.9.0.1 to 0.10.0 with the exact same
> settings we immediately observed OOM errors. I upped the heap size from 6
> GB to 10 GB and that solved the OOM issue. However, I am now seeing that
> the ISR count for all partitions goes from 3 to 1 after about an hour
> following broker start.
>
> Monitoring with jstat it appears that, after about an hour, the young
> generation partition stays at or near 100%, at which point the ISR count
> for each partition goes from 3 to 1 and remains there. There appears to be
> a correlation of high GC activity and replica fetch lag.
>
> I am thinking that GC pauses are the issue, which is a result of increasing
> the memory heap size. But, without increasing the memory heap size, we get
> OOM errors.
>
> Any ideas? There must be a setting somewhere that is causing the memory
> heap to fill up in 0.10.0 that did not affect 0.9.0.1.
>
> Thanks
>
> --John
>