I tried these two global settings (and restarted the app) after enabling
caching for stream1:

conf.set("spark.streaming.unpersist", "true")

streamingContext.remember(Seconds(batchDuration * 4))

The batch duration is 4 seconds.
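For reference, a minimal sketch of how the settings above fit together (Spark 1.4.1-era APIs; the socket source, app name, and the serialized storage level are illustrative assumptions, not from this thread):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CacheSketch {
  def main(args: Array[String]): Unit = {
    val batchDuration = 4 // seconds, as in the job described above

    val conf = new SparkConf()
      .setAppName("streaming-app") // hypothetical app name
      // let Spark automatically unpersist generated RDDs once they
      // fall out of the remember window
      .set("spark.streaming.unpersist", "true")

    val ssc = new StreamingContext(conf, Seconds(batchDuration))
    // keep RDDs around only long enough for the downstream union to reuse them
    ssc.remember(Seconds(batchDuration * 4))

    // hypothetical source standing in for stream1; persisting with
    // MEMORY_AND_DISK_SER instead of the default trades CPU for a smaller
    // heap footprint, which may postpone the OOM seen after a few hours
    val stream1 = ssc.socketTextStream("localhost", 9999)
    stream1.persist(StorageLevel.MEMORY_AND_DISK_SER)

    stream1.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

This is a configuration sketch, not a verified fix; the remember window in particular needs to be at least as long as the gap between job 1 producing the DStream and the downstream union consuming it.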

We are using Spark 1.4.1. The application runs for about 4-5 hours and then
hits an out-of-memory error.

regards

Krishna

On Thu, Feb 18, 2016 at 4:54 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> bq. streamingContext.remember("duration") did not help
>
> Can you give a bit more detail on the above ?
> Did you mean the job encountered OOME later on ?
>
> Which Spark release are you using ?
>
> Cheers
>
> On Wed, Feb 17, 2016 at 6:03 PM, ramach1776 <ram...@s1776.com> wrote:
>
>> We have a streaming application running in streaming mode (4 sec batches),
>> with approximately 12 jobs per batch. Each job has several transformations
>> and 1 action (output to Cassandra), which triggers execution of the job
>> (DAG).
>>
>> For example the first job,
>>
>> /job 1
>> ---> receive Stream A --> map --> filter -> (union with another stream B)
>> --> map -->/ groupbykey --> transform --> reducebykey --> map
>>
>> Likewise we go through a few more transforms and save to the database
>> (job2, job3...).
>>
>> Recently we added a new transformation further downstream, wherein we union
>> the output DStream from job 1 (in italics) with the output of a new
>> transformation (job 5). It appears the whole execution up to that point is
>> repeated, which is redundant (I can see this in the execution graph and in
>> the per-batch processing time).
>>
>> That is, with this additional transformation (the union with a stream
>> processed upstream), each batch runs as much as 2.5 times slower than runs
>> without the union. If I cache the DStream from job 1 (italics), performance
>> improves substantially, but we hit out-of-memory errors within a few hours.
>>
>> What is the recommended way to cache/unpersist in such a scenario? There is
>> no DStream-level "unpersist". Setting "spark.streaming.unpersist" to true
>> and streamingContext.remember("duration") did not help.
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/adding-a-split-and-union-to-a-streaming-application-cause-big-performance-hit-tp26259.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>
