[Structured Streaming] Reuse computation result

Shu Li Zheng Tue, 26 Dec 2017 02:33:04 -0800

Hi all,

I have a scenario like this:


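val df = dataframe.map(...).filter(...)   // transformations omitted
// agg 1: streaming aggregations need "complete" or "update" output mode
val query1 = df.groupBy().sum().writeStream.outputMode("complete").format("console").start()
// agg 2
val query2 = df.groupBy().count().writeStream.outputMode("complete").format("console").start()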

With Spark Streaming (DStreams), we can call persist() on the RDD to reuse the 
computation result: when persist() is called after filter(), the map().filter() 
operators run only once.
With Structured Streaming (SS), we can’t apply persist() directly to the 
DataFrame, so query1 and query2 do not reuse the result after filter(), and 
map/filter run twice. Is there a way to solve this?
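For reference, the DStream reuse pattern mentioned above can be sketched as follows. This is a minimal sketch assuming spark-core and spark-streaming on the classpath and a local master; the object name `PersistReuse`, the `run()` helper, the `queueStream` input, and the accumulator used to count map invocations are all hypothetical, chosen only so the example runs without an external source. It calls `persist()` on the DStream, which persists every RDD the DStream generates, so both output operations in a batch read the cached filtered data instead of re-running map/filter.

```scala
import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object PersistReuse {
  // Hypothetical helper: pushes one batch through map -> filter -> persist,
  // feeds the persisted stream to two output operations, and returns how
  // many times the map function was invoked.
  def run(): Long = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("PersistReuse")
    val ssc = new StreamingContext(conf, Seconds(1))
    val sc = ssc.sparkContext
    val mapCalls = sc.longAccumulator("mapCalls") // counts map invocations

    // Hypothetical input: one RDD of 100 integers, delivered as a single batch.
    val queue = mutable.Queue[RDD[Int]](sc.parallelize(1 to 100))
    val input = ssc.queueStream(queue)

    val filtered = input
      .map { x => mapCalls.add(1); x * 2 }
      .filter(_ % 3 == 0)
      .persist() // DStream default storage level is MEMORY_ONLY_SER

    filtered.reduce(_ + _).print() // "agg 1": sum over the batch
    filtered.count().print()       // "agg 2": count over the batch

    ssc.start()
    ssc.awaitTerminationOrTimeout(10000) // give the first batch time to finish
    ssc.stop(stopSparkContext = true, stopGracefully = false)
    mapCalls.sum
  }

  def main(args: Array[String]): Unit =
    println(s"map ran ${run()} times")
}
```

With `persist()` in place, the reported count should equal the number of input elements, since each element is mapped once and both aggregations read the cache; removing `persist()` should roughly double it. Accumulator updates inside transformations can over-count if tasks are retried, so treat the number as a local illustration only.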

Regards,

Shu li Zheng
