Hi Sadha,
I have solved this problem. In my case it was caused by different ORC
compression codecs between Hive and Spark: Hive uses ZLIB as the default ORC
compression codec, while Spark uses SNAPPY. Once I configured the same codec
on both sides, the problem with the table file produced by Spark SQL went away.
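
For reference, a minimal sketch of aligning the codec on the Spark side
(assuming Spark 2.3+, where the spark.sql.orc.compression.codec option is
available; the table names are placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("orc-codec-example")
  .enableHiveSupport()
  .getOrCreate()

// Make Spark's ORC writer use ZLIB, matching Hive's default,
// instead of Spark's own SNAPPY default.
spark.conf.set("spark.sql.orc.compression.codec", "zlib")

spark.sql("SELECT * FROM source_table")
  .write
  .format("orc")
  .mode("overwrite")
  .saveAsTable("target_table")

Alternatively, the codec can be pinned per table on the Hive side with
TBLPROPERTIES ("orc.compress"="SNAPPY") so both engines agree.
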
Do you have to use a SQL/window function for this? If I understand this
correctly, you could just keep track of the last record for each "thing",
then calculate the new sum by adding the current value of "thing" to the
sum of the last record whenever a new record is generated. It looks like your
problem could be handled with that kind of simple stateful tracking rather
than a window function.
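
To illustrate the idea only, a minimal sketch in plain Scala (the Record type
and field names here are made up): keep the latest running sum per "thing" and
update it whenever a new record arrives, with no window function involved.

import scala.collection.mutable

case class Record(thing: String, value: Long)

// Latest running sum for each "thing"; unseen keys start at 0.
val runningSums = mutable.Map.empty[String, Long].withDefaultValue(0L)

// Called when a new record is generated; returns the updated sum for that thing.
def onNewRecord(r: Record): Long = {
  val updated = runningSums(r.thing) + r.value
  runningSums(r.thing) = updated
  updated
}

// Example:
Seq(Record("a", 3), Record("a", 5), Record("b", 2))
  .foreach(r => println(s"${r.thing} -> ${onNewRecord(r)}"))
// prints: a -> 3, a -> 8, b -> 2
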