Then, can you specify a size/percentage of cache per consumer group?

On Apr 1, 2016 9:09 AM, "Cees de Groot" <c...@pagerduty.com> wrote:
> One of Kafka's design ideas is to keep the data held in the JVM to a
> minimum, offloading caching to the OS. So at the Kafka level there is not
> much you can do - the old data is buffered by the system (it has to be, to
> be read into userspace), and this reduces the amount of cache available to
> the other job.
>
> Buy more memory ;-)
>
> (Also, I think it's smart to tune _down_ the amount of memory you give to
> the Kafka JVM, to maximize the OS's buffering. You don't want large
> amounts of JVM memory filled with garbage contending with an OS buffer
> cache filled with useful data.)
>
> On Fri, Apr 1, 2016 at 3:42 AM, Mayur Mohite <mayur.moh...@applift.com>
> wrote:
>
> > Hi,
> >
> > We have a Kafka cluster running in production, and there are two Spark
> > Streaming jobs (J1 and J2) that fetch data from the same topic.
> >
> > We noticed that if one of the two jobs (say J1) starts reading data from
> > an old offset (that job failed for 2 hours, and when we restarted it
> > after fixing the failure, its offset was old), that data is read from
> > disk instead of from the OS cache.
> >
> > When this happens, the other job's (J2) throughput is reduced, even
> > though that job's offset is recent.
> > We believe the recent data is most likely in memory, so we are not sure
> > why the other job's (J2) throughput is reduced.
> >
> > Did anyone come across such an issue in production? If yes, how did you
> > fix it?
> >
> > -Mayur
>
> --
> Cees de Groot
> Principal Software Engineer
> PagerDuty, Inc.
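Cees's advice to tune _down_ the broker heap can be applied through the standard `KAFKA_HEAP_OPTS` environment variable honored by the broker start script. A minimal sketch, assuming the stock `kafka-server-start.sh` launcher; the 4 GB figure is an illustrative assumption, not a sizing recommendation for any particular host:

```shell
# Sketch: pin the broker JVM to a modest, fixed-size heap (4g is an
# assumed example value) so the remaining RAM stays available to the
# OS page cache, which Kafka relies on for serving recent reads.
export KAFKA_HEAP_OPTS="-Xms4g -Xmx4g"
bin/kafka-server-start.sh config/server.properties
```

Setting `-Xms` equal to `-Xmx` keeps the heap size fixed, so the JVM never grows into memory the page cache is already using.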