Hi,

After running into performance issues with some of our Spark Streaming jobs, we invested considerable effort in figuring out the factors that affect the performance characteristics of a Streaming job. We defined an empirical model that helps us reason about Streaming jobs and applied it to tune the jobs in order to maximize throughput.

We have summarized our findings in a blog post, with the intention of collecting feedback and in the hope that it is useful to other Spark Streaming users facing similar issues:

http://www.virdata.com/tuning-spark/

Your feedback is welcome.

With kind regards,

Gerard.
Data Processing Team Lead
Virdata.com
@maasg