Hello,
We have been using Spark on an on-premises cluster for several years and are
looking at moving to a cloud deployment.
I was wondering what your current favorite cloud setup is. Just plain
AWS / Azure, or something on top, like Databricks?
This would support an on-demand reporting application.
Once you have a RelationalGroupedDataset, you can use agg() to perform
group-wide operations such as max, sum, etc., or even a custom aggregator.
df.groupBy().agg(sum(col(...)))
That will return a DF with your groupBy columns and the result of the
aggregation.
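For illustration, here is a minimal sketch of built-in and custom aggregations in the same agg() call. It assumes Spark 3.0+ (for functions.udaf) and a DataFrame with hypothetical columns "key" (String) and "value" (Long). RangeAgg and GroupByAggSketch are made-up names for this example, not part of Spark.

```scala
import org.apache.spark.sql.{DataFrame, Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.functions.{col, max, sum, udaf}

// Hypothetical custom aggregator: the spread (max - min) of a Long column.
// The buffer is (smallest value seen so far, largest value seen so far).
object RangeAgg extends Aggregator[Long, (Long, Long), Long] {
  def zero: (Long, Long) = (Long.MaxValue, Long.MinValue)
  def reduce(buf: (Long, Long), v: Long): (Long, Long) =
    (math.min(buf._1, v), math.max(buf._2, v))
  // Combines partial buffers computed on different partitions.
  def merge(b1: (Long, Long), b2: (Long, Long)): (Long, Long) =
    (math.min(b1._1, b2._1), math.max(b1._2, b2._2))
  def finish(buf: (Long, Long)): Long = buf._2 - buf._1
  def bufferEncoder: Encoder[(Long, Long)] =
    Encoders.tuple(Encoders.scalaLong, Encoders.scalaLong)
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}

object GroupByAggSketch {
  // functions.udaf (Spark 3.0+) wraps the Aggregator so it can be used in agg().
  private val spread = udaf(RangeAgg)

  // Built-in and custom aggregations side by side, one output row per key.
  def summarize(df: DataFrame): DataFrame =
    df.groupBy(col("key"))
      .agg(
        sum(col("value")).as("total"),
        max(col("value")).as("largest"),
        spread(col("value")).as("spread"))
      .orderBy(col("key"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("groupBy-agg-sketch").getOrCreate()
    import spark.implicits._
    val df = Seq(("a", 1L), ("a", 5L), ("b", 2L), ("b", 3L), ("b", 10L)).toDF("key", "value")
    summarize(df).show()
    spark.stop()
  }
}
```

With the sample rows above, key "a" gives total 6, largest 5, spread 4, and key "b" gives total 15, largest 10, spread 8. If you call groupBy() with no columns, as in the one-liner above, the whole DF is treated as a single group and you get one row containing only the aggregate columns.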
Stephane
Soheil Pourbafrani wrote:
Hi,