Hey Ismael,

Thanks a bunch for responding so quickly--really appreciate the follow-up! I will have to get those details tomorrow when I return to the office.

Thanks again, will forward details ASAP tomorrow.

--John

On Sun, Jul 9, 2017 at 10:41 AM, Ismael Juma <ism...@juma.me.uk> wrote:

> Hi John,
>
> We would need more details to be able to help. What is the version of your
> producers and consumers, is compression being used (and the compression
> type if it is) and what is the broker/topic message format version?
>
> Ismael
>
> On Sun, Jul 9, 2017 at 1:13 PM, John Yost <hokiege...@gmail.com> wrote:
>
> > Hey Everyone,
> >
> > When we originally upgraded from 0.9.0.1 to 0.10.0 with the exact same
> > settings we immediately observed OOM errors. I upped the heap size from 6
> > GB to 10 GB and that solved the OOM issue. However, I am now seeing that
> > the ISR count for all partitions goes from 3 to 1 after about an hour
> > following broker start.
> >
> > Monitoring with jstat it appears that, after about an hour, the young
> > generation partition stays at or near 100%, at which point the ISR count
> > for each partition goes from 3 to 1 and remains there. There appears to be
> > a correlation of high GC activity and replica fetch lag.
> >
> > I am thinking that GC pauses are the issue, which is a result of increasing
> > the memory heap size. But, without increasing the memory heap size, we get
> > OOM errors.
> >
> > Any ideas? There must be a setting somewhere that is causing the memory
> > heap to fill up in 0.10.0 that did not affect 0.9.0.1.
> >
> > Thanks
> >
> > --John
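
For anyone wanting to line up the jstat observation above with the time the ISR count drops, here is a minimal sketch that parses `jstat -gcutil` output and flags a run of samples where Eden stays near full. It assumes the output was captured to text with something like `jstat -gcutil <pid> 5000`; the function names, the 95% threshold, and the sample numbers are all made up for illustration and are not taken from the brokers discussed in this thread.

```python
def _to_float(value):
    """Convert a jstat field to float; jstat prints '-' for unavailable columns."""
    try:
        return float(value)
    except ValueError:
        return None


def parse_gcutil(text):
    """Parse `jstat -gcutil` output into a list of dicts keyed by column name."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    samples = []
    for row in rows:
        if len(row) != len(header):
            continue  # skip truncated or malformed rows
        samples.append({name: _to_float(v) for name, v in zip(header, row)})
    return samples


def eden_saturated(samples, threshold=95.0, min_consecutive=3):
    """Return True if Eden ('E') is at or above threshold for
    min_consecutive samples in a row (threshold is an assumed cutoff)."""
    run = 0
    for sample in samples:
        eden = sample.get("E")
        run = run + 1 if eden is not None and eden >= threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical sample output, invented for this example.
SAMPLE = """
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00  45.12  62.30  40.11  95.20  90.10    120    3.210     2    0.410    3.620
  0.00  45.12  97.80  40.11  95.20  90.10    120    3.210     2    0.410    3.620
  0.00  45.12  99.10  40.11  95.20  90.10    120    3.210     2    0.410    3.620
  0.00  45.12 100.00  40.11  95.20  90.10    120    3.210     2    0.410    3.620
"""

if __name__ == "__main__":
    samples = parse_gcutil(SAMPLE)
    print("Eden saturated:", eden_saturated(samples))
```

Columns are looked up by header name rather than position, since the set of columns `jstat -gcutil` prints differs between JDK versions. A single sample near 100% is normal just before a young collection, which is why the check looks for several consecutive samples.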