You can tag the last entry for each key using the same window you're using
for your rolling sum. Something like this: "LEAD(1) OVER your_window IS
NULL AS last_record". Then you just UNION ALL the last entry of each
key (which you tagged) with the new data and run the same query over the
combined data.
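A minimal sketch of that tagging trick, shown with SQLite's window functions since the syntax is essentially the same in Spark SQL (table and column names are invented for illustration). LEAD(1) looks at the next row in the partition; on the last row there is no next row, so the result is NULL and "IS NULL" flags exactly the latest record per key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE obs (date TEXT, thing TEXT, foo INTEGER, foo_sum INTEGER);
INSERT INTO obs VALUES
  ('2023-01-01', 'a', 1, 1),
  ('2023-01-02', 'a', 2, 3),
  ('2023-01-01', 'b', 5, 5);
""")

# last_record is 1 only on the most recent row of each "thing"
rows = conn.execute("""
SELECT date, thing, foo, foo_sum,
       LEAD(1) OVER (PARTITION BY thing ORDER BY date) IS NULL AS last_record
FROM obs
ORDER BY thing, date
""").fetchall()
for r in rows:
    print(r)
```

The tagged rows can then be filtered out, UNION ALLed with the new batch, and fed through the original rolling-sum query so the sums continue from where they left off.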
Do you have to use SQL/window functions for this? If I understand this
correctly, you could just keep track of the last record of each "thing",
then calculate the new sum by adding the current value of "thing" to the
sum of the last record whenever a new record is generated. Looks like your
problem doesn't really need window functions at all.
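A quick sketch of that stateful alternative in plain Python (the function and variable names here are illustrative, not from the thread): keep the latest running sums per key in a dict and update them as each record arrives.

```python
# Keep the latest (foo_sum, bar_sum) per "thing" and extend each incoming
# record with updated running sums; no SQL or windowing involved.
def update_sums(state, record):
    thing = record["thing"]
    foo_sum, bar_sum = state.get(thing, (0, 0))
    foo_sum += record["foo"]
    bar_sum += record["bar"]
    state[thing] = (foo_sum, bar_sum)
    return {**record, "foo_sum": foo_sum, "bar_sum": bar_sum}

state = {}
records = [
    {"date": "2023-01-01", "thing": "a", "foo": 1, "bar": 10},
    {"date": "2023-01-02", "thing": "a", "foo": 2, "bar": 20},
    {"date": "2023-01-01", "thing": "b", "foo": 5, "bar": 50},
]
out = [update_sums(state, r) for r in records]
print(out[1])  # second 'a' record carries the accumulated sums
```

The trade-off is that this assumes records arrive in order and that the per-key state fits in memory (or a key-value store), which may or may not hold for a large Spark dataset.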
I'm new to Spark and would like to seek some advice on how to approach a
problem.
I have a large dataset of dated observations. There are also columns that
are running sums of some of the other columns.
date | thing | foo | bar | foo_sum | bar_sum
=====+=======+=====+=====+=========+========
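For concreteness, running sums of that shape are typically produced with a SUM ... OVER window partitioned by "thing" and ordered by date. A small sketch using SQLite (Spark SQL accepts the same syntax; the sample values are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE obs (date TEXT, thing TEXT, foo INTEGER, bar INTEGER);
INSERT INTO obs VALUES
  ('2023-01-01', 'a', 1, 10),
  ('2023-01-02', 'a', 2, 20),
  ('2023-01-01', 'b', 5, 50);
""")

# foo_sum/bar_sum are cumulative per "thing" in date order
rows = conn.execute("""
SELECT date, thing, foo, bar,
       SUM(foo) OVER w AS foo_sum,
       SUM(bar) OVER w AS bar_sum
FROM obs
WINDOW w AS (PARTITION BY thing ORDER BY date)
ORDER BY thing, date
""").fetchall()
for r in rows:
    print(r)
```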