My mistake. I didn't notice that "UNBOUNDED PRECEDING" is already supported.
So cumulative sum should work then.
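For anyone finding this thread later, here is a minimal sketch of what that looks like, using the table and column names from Stefan's example below (the plain-Python part just emulates what the window frame computes, it is not Spark itself):

```python
# Spark SQL form (untested sketch, sample table "tablea" from the thread):
#   SELECT col_1, col_2,
#          SUM(col_2) OVER (ORDER BY col_1
#                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS col_3
#   FROM tablea
#
# Plain-Python emulation of that frame over the sample data:
rows = [(1, 10), (2, 30), (3, 15), (4, 20), (5, 25)]  # (col_1, col_2)

running = 0
result = []
for col_1, col_2 in sorted(rows):  # ORDER BY col_1
    running += col_2               # UNBOUNDED PRECEDING .. CURRENT ROW
    result.append((col_1, col_2, running))

print(result)
# [(1, 10, 10), (2, 30, 40), (3, 15, 55), (4, 20, 75), (5, 25, 100)]
```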
Thanks
Yong

From: [email protected]
To: [email protected]; [email protected]
CC: [email protected]; [email protected]
Subject: RE: Spark SQL running totals
Date: Thu, 15 Oct 2015 16:24:39 -0400




Not sure the window functions can work for his case.
If you do a "sum() over (partition by)", that will return a total sum per
partition, instead of the cumulative sum wanted in this case.
I saw there is a "cume_dist", but no "cume_sum".
Do we really have a cumulative sum in Spark window functions, or am I totally
misunderstanding "sum() over (partition by)"?
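For what it's worth, the difference seems to come down to the frame: "sum() over (partition by ...)" with no "order by" uses the whole partition, while adding an "order by" defaults the frame to a running one (unbounded preceding through current row). A plain-Python sketch of the two behaviors on made-up data (this emulates the semantics, it is not Spark):

```python
rows = [(1, 10), (2, 30), (3, 15)]  # (col_1, col_2), one partition

# sum() over (partition by ...): every row gets the same partition total
total = sum(v for _, v in rows)
partition_sum = [(k, v, total) for k, v in rows]

# sum() over (partition by ... order by col_1): running total per row
cume = 0
cumulative_sum = []
for k, v in sorted(rows):
    cume += v
    cumulative_sum.append((k, v, cume))

print(partition_sum)   # [(1, 10, 55), (2, 30, 55), (3, 15, 55)]
print(cumulative_sum)  # [(1, 10, 10), (2, 30, 40), (3, 15, 55)]
```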
Yong

From: [email protected]
Date: Thu, 15 Oct 2015 11:51:59 -0700
Subject: Re: Spark SQL running totals
To: [email protected]
CC: [email protected]; [email protected]

Check out: 
https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
On Thu, Oct 15, 2015 at 11:35 AM, Deenar Toraskar <[email protected]> 
wrote:
you can do a self join of the table with itself, with the join clause being
a.col1 >= b.col1:

select a.col1, a.col2, sum(b.col2)
from tablea as a left outer join tablea as b
  on (a.col1 >= b.col1)
group by a.col1, a.col2
I haven't tried it, but I can't see why it wouldn't work. Doing it with RDDs
might be more efficient, see
https://bzhangusc.wordpress.com/2014/06/21/calculate-running-sums/
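The join logic itself can be sanity-checked in plain Python against the sample data from the question (for each row a, sum b.col2 over all rows b with b.col1 <= a.col1, which is what the join condition plus the group by produces):

```python
# Plain-Python emulation of the self-join running total
# (sample data from Stefan's question; not actual Spark code):
a = [(1, 10), (2, 30), (3, 15), (4, 20), (5, 25)]  # (col_1, col_2)

# Mirrors "on (a.col1 >= b.col1) ... group by a.col1, a.col2":
joined = [(k, v, sum(bv for bk, bv in a if bk <= k)) for k, v in a]

print(joined)
# [(1, 10, 10), (2, 30, 40), (3, 15, 55), (4, 20, 75), (5, 25, 100)]
```

Note the quadratic number of row comparisons here, which is also why the self-join can get expensive on large tables.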
On 15 October 2015 at 18:48, Stefan Panayotov <[email protected]> wrote:



Hi,
 
I need help with Spark SQL. I need to achieve something like the following.
If I have data like:
 
col_1  col_2
1         10
2         30
3         15
4         20
5         25
 
I need to get col_3 as the running total of col_2, i.e. the sum of col_2 over
the current and all previous rows, e.g.
 
col_1  col_2  col_3
1         10        10
2         30        40
3         15        55
4         20        75
5         25        100
 
Is there a way to achieve this in Spark SQL, or maybe with DataFrame
transformations?
 
Thanks in advance,


Stefan Panayotov, PhD 
Home: 610-355-0919 
Cell: 610-517-5586 
email: [email protected] 
[email protected] 
[email protected]
                                          



                                                                                
  
