Re: Why does the same INSERT OVERWRITE SQL produce a larger final table file in Spark SQL than in Hive SQL?

2022-10-12 Thread Chartist
Hi Sadha, I have solved this problem. In my case it was caused by different default compression codecs between Hive and Spark: Hive uses ZLIB as the default ORC compression codec, while Spark uses SNAPPY. Once I used the same compression codec in both, the final table file produced by Spark SQL…
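One way to align the codecs is to set Spark's ORC codec to ZLIB before writing. A minimal sketch, assuming Spark 2.3+ (where the `spark.sql.orc.compression.codec` config exists) and illustrative table names:

```sql
-- Make Spark write ORC with the same codec Hive defaults to (ZLIB),
-- so INSERT OVERWRITE output compresses comparably.
SET spark.sql.orc.compression.codec=zlib;

-- target_table and source_table are placeholders for illustration.
INSERT OVERWRITE TABLE target_table
SELECT * FROM source_table;
```

Alternatively, the codec can be fixed per table via the `orc.compress` table property so it does not depend on session settings.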

Re: Efficiently updating running sums only on new data

2022-10-12 Thread Artemis User
Do you have to use a SQL window function for this? If I understand correctly, you could just keep track of the last record for each "thing", then compute the new sum by adding the current value of "thing" to the sum stored in that last record whenever a new record arrives. Looks like your problem will…
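The incremental approach described above can be sketched in plain Python. This is a minimal illustration, not Spark code; the key/value names are hypothetical:

```python
# Keep only the last running sum per key ("thing") and add each new
# value to it, instead of recomputing a window over the full history.
from collections import defaultdict

running_sums = defaultdict(float)  # last known running sum per key

def update(key, value):
    """Fold a newly arrived value into the running sum for its key."""
    running_sums[key] += value
    return running_sums[key]

update("sensor_a", 10)  # first record for sensor_a -> sum is 10
update("sensor_a", 5)   # running sum for sensor_a is now 15
update("sensor_b", 2)   # independent sum per key
```

Each new record costs O(1) work and O(1) state per key, versus re-scanning all prior rows with a windowed aggregate.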