I tried these two global settings (and restarted the app) after enabling cache for stream1:
conf.set("spark.streaming.unpersist", "true")
streamingContext.remember(Seconds(batchDuration * 4))

The batch duration is 4 sec, using spark-1.4.1. The application runs for about 4-5 hrs, then we see an out-of-memory error.

regards
Krishna

On Thu, Feb 18, 2016 at 4:54 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> bq. streamingContext.remember("duration") did not help
>
> Can you give a bit more detail on the above?
> Did you mean the job encountered OOME later on?
>
> Which Spark release are you using?
>
> Cheers
>
> On Wed, Feb 17, 2016 at 6:03 PM, ramach1776 <ram...@s1776.com> wrote:
>
>> We have a streaming application containing approximately 12 jobs every
>> batch, running in streaming mode (4 sec batches). Each job has several
>> transformations and 1 action (output to Cassandra), which triggers the
>> execution of the job (DAG).
>>
>> For example, the first job:
>>
>> /job 1
>> ---> receive Stream A --> map --> filter --> (union with another stream B)
>> --> map -->/ groupByKey --> transform --> reduceByKey --> map
>>
>> Likewise we go through a few more transforms and save to the database
>> (job 2, job 3, ...).
>>
>> Recently we added a new transformation further downstream, wherein we
>> union the output DStream of job 1 (in italics) with the output of a new
>> transformation (job 5). It appears the whole execution thus far is
>> repeated, which is redundant (I can see this in the execution graph and
>> also in performance -> processing time).
>>
>> That is, with this additional transformation (a union with a stream
>> processed upstream), each batch runs as much as 2.5 times slower compared
>> to runs without the union. If I cache the DStream from job 1 (italics),
>> performance improves substantially, but we hit out-of-memory errors within
>> a few hours.
>>
>> What is the recommended way to cache/unpersist in such a scenario? There
>> is no DStream-level "unpersist". Setting "spark.streaming.unpersist" to
>> true and streamingContext.remember("duration") did not help.
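[Editor's note] A pure-Scala sketch (no Spark required) of the recomputation described above: like an uncached RDD/DStream lineage, a function pipeline re-runs its upstream work once per downstream consumer, while materializing ("caching") the intermediate result runs it once. `job1` here is a hypothetical stand-in for the thread's "job 1" pipeline, not the actual application code:

```scala
// Counts how many times the expensive upstream pipeline executes.
var job1Runs = 0
def job1(): Vector[Int] = {          // stand-in for the "job 1" pipeline
  job1Runs += 1
  (1 to 5).toVector.map(_ * 2)
}

// Without caching: two downstream consumers (think job 2 and the new
// job-5 union) each pull from job 1, so the upstream work runs twice.
val job2Out = job1().sum
val job5Out = job1().filter(_ > 4)
assert(job1Runs == 2)

// With caching: materialize job 1 once and reuse it for both consumers.
job1Runs = 0
val cached = job1()
val job2Cached = cached.sum
val job5Cached = cached.filter(_ > 4)
assert(job1Runs == 1 && job2Cached == job2Out)
```

This is why adding the union roughly doubled the processing time: the full job-1 lineage was being executed once per consuming job.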
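[Editor's note] A wiring/config sketch of the pattern under discussion, assuming Spark 1.4.x's Scala streaming API as used in the thread. The stream names, the elided transformations, and the choice of `StorageLevel.MEMORY_AND_DISK_SER` are illustrative suggestions, not taken from the original posts:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

val batchDuration = 4  // seconds, per the thread
val conf = new SparkConf()
  .setAppName("union-cache-sketch")
  .set("spark.streaming.unpersist", "true")  // auto-clear old generated RDDs

val ssc = new StreamingContext(conf, Seconds(batchDuration))
// Bound how long Spark keeps old batch data around, instead of forever:
ssc.remember(Seconds(batchDuration * 4))

// job 1 pipeline (shape from the thread; transformations elided):
// val job1Out = streamA.map(...).filter(...).union(streamB)
//   .map(...).groupByKey().transform(...).reduceByKey(...).map(...)

// Persist with a serialized, disk-spillable level so executors can evict
// blocks to disk instead of accumulating deserialized objects on-heap
// (DStream.cache() defaults to memory-only serialized storage):
// job1Out.persist(StorageLevel.MEMORY_AND_DISK_SER)

// The downstream union (job 5) then reuses the persisted job1Out
// instead of re-running its whole lineage:
// val job5Out = newTransform.union(job1Out)
```

Note there is indeed no DStream-level `unpersist`; cleanup is driven by `spark.streaming.unpersist` together with the `remember` window, so a persisted DStream whose RDDs outlive that window is a candidate for the OOM behavior described.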
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/adding-a-split-and-union-to-a-streaming-application-cause-big-performance-hit-tp26259.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.