I have a use case similar to this:
http://stackoverflow.com/questions/33878370/spark-dataframe-select-the-first-row-of-each-group

I'm trying to understand the solution titled "ordering over structs" (code
copied below). My questions:

1) Is a struct in Spark like a struct in C++?
2) What is an alias in this context?
3) How does this code work? In particular, how does max() choose a row when
   it is applied to a struct column?
4) Is it faster to do it this way than to use a join or a window function in
   Spark SQL?

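// Needs: import org.apache.spark.sql.functions.{max, struct}
// and the session's implicits (for the $"..." column syntax) in scope.
val dfTop = df.select($"Hour", struct($"TotalValue", $"Category").alias("vs"))
  .groupBy($"Hour")
  .agg(max("vs").alias("vs"))
  .select($"Hour", $"vs.Category", $"vs.TotalValue")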
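
For question 3, here is how I would sketch what I think max() is doing, in
plain Scala without Spark. It assumes struct values are compared field by
field, left to right, the same way Scala tuples are, so TotalValue decides
and Category only breaks ties. The Row case class, the topPerHour helper and
the sample data are made up for illustration; they are not part of the
linked answer.

```scala
object StructOrderingSketch {
  // Hypothetical row shape mirroring the DataFrame columns.
  final case class Row(hour: Int, category: String, totalValue: Int)

  // Plain-Scala analogue of
  //   groupBy($"Hour").agg(max(struct($"TotalValue", $"Category")))
  def topPerHour(rows: Seq[Row]): Map[Int, Row] =
    rows.groupBy(_.hour).map { case (hour, group) =>
      // Tuples compare lexicographically: totalValue first, category on ties.
      val (value, category) = group.map(r => (r.totalValue, r.category)).max
      hour -> Row(hour, category, value)
    }

  def main(args: Array[String]): Unit = {
    // Made-up sample data.
    val rows = Seq(
      Row(0, "cat26", 30), Row(0, "cat13", 22),
      Row(1, "cat67", 28), Row(1, "cat4", 26)
    )
    topPerHour(rows).toSeq.sortBy(_._1).foreach(println)
  }
}
```

If that reading is right, putting TotalValue first inside struct(...) is what
makes max() return the row with the largest TotalValue for each Hour.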

Thank you,
imran
