You could check the following link: http://stackoverflow.com/questions/35154267/how-to-compute-cumulative-sum-using-spark
From: Jon Barksdale [mailto:jon.barksd...@gmail.com]
Sent: 09 August 2016 08:21
To: ayan guha
Cc: user
Subject: Re: Cumulative Sum function using Dataset API

I don't think that would work properly; it would probably just give me the sum for each partition. I'll give it a try when I get home just to be certain.

To maybe explain the intent better: if I have a (pre-sorted) column of (1, 2, 3, 4), then the cumulative sum would return (1, 3, 6, 10). Does that make sense? Naturally, if ordering a sum turns it into a cumulative sum, I'll gladly use that :)

Jon

On Mon, Aug 8, 2016 at 4:55 PM ayan guha <guha.a...@gmail.com> wrote:

You mean you are not able to use sum(col) over (partition by key order by some_col)?

On Tue, Aug 9, 2016 at 9:53 AM, jon <jon.barksd...@gmail.com> wrote:

Hi all,

I'm trying to write a function that calculates a cumulative sum as a column using the Dataset API, and I'm a little stuck on the implementation. From what I can tell, UserDefinedAggregateFunctions don't seem to support windowing clauses, which I think I need for this use case. If I write a function that extends AggregateWindowFunction, I end up needing classes that are package-private to the sql package, so I would have to put my function under the org.apache.spark.sql package, which just feels wrong. I've also considered writing a custom transformer, but I haven't spent as much time reading through that code, so I don't know how easy or hard it would be.

TL;DR: What's the best way to write a function that returns a value for every row, has mutable state, and gets rows in a specific order? Does anyone have any ideas, or examples?

Thanks,
Jon

--
Best Regards,
Ayan Guha
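
For reference, here is a minimal sketch of the windowed-sum approach ayan suggests, written against the DataFrame/Dataset API. The column names (key, ts, value), the sample data, and the object name are illustrative assumptions, not taken from the thread:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, sum}

object CumulativeSumSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cumulative-sum-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: one key, values 1..4 ordered by ts.
    val df = Seq(
      ("a", 1, 1), ("a", 2, 2), ("a", 3, 3), ("a", 4, 4)
    ).toDF("key", "ts", "value")

    // With an orderBy on the window, the default frame runs from unbounded
    // preceding to the current row, so sum() becomes a running (cumulative)
    // sum per partition rather than the partition total.
    val w = Window.partitionBy("key").orderBy("ts")

    df.withColumn("cum_sum", sum(col("value")).over(w)).show()
    // For values (1, 2, 3, 4) this should yield cum_sum (1, 3, 6, 10).

    spark.stop()
  }
}
```

This is the DataFrame equivalent of sum(value) OVER (PARTITION BY key ORDER BY ts), and it should address the concern in the thread: the ordered frame makes each row see only the rows up to itself, not the whole partition.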