Hi,

We have a Kafka cluster running in production, and there are two Spark Streaming jobs (J1 and J2) that fetch data from the same topic.
We noticed that if one of the two jobs (say J1) starts reading from an old offset (J1 had failed for two hours, and when we restarted it after fixing the failure, its offset was two hours behind), that old data is read from disk instead of from the OS page cache. When this happens, the other job's (J2) throughput drops even though J2's offset is recent. Since the recent data is most likely still in memory, we are not sure why J2's throughput is reduced.

Has anyone come across such an issue in production? If so, how did you fix it?

-Mayur