1) Is a struct in Spark like a struct in C++?
Kinda. It's an ordered collection of data with known names/types.

2) What is an alias in this context?
It assigns a name to the column, similar to using AS in SQL.

3) How does this code even work?
Structs are compared field by field, in the order the fields are declared. So the max struct is the one with the highest TotalValue (and then the highest category if there are multiple entries with the same hour and total value).

Is this due to "InterpretedOrdering" in StructType?

4) Is it faster doing it this way than doing a join or window function in Spark SQL?
Way faster. This is a very efficient way to calculate argmax.

Can you explain how this is way faster than a window function? I can understand that a join doesn't make sense in this case. But to calculate the grouped max, you just have to shuffle the data by the grouping keys. You can maybe do a combiner on the mapper side before shuffling, but that is it. Do you mean that a window function in Spark SQL won't do any map-side combining, even for max?

Yong
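
For anyone following along, here is a rough sketch of the idea behind answer 3, and of the combiner point raised in the follow-up. It is plain Python, not Spark code: tuples stand in for structs because they are also compared field by field. The rows, the field names, and the helpers argmax_per_hour and merge_partials are all made up for illustration. It only shows why a grouped max can be computed per partition and then merged; it says nothing about what Spark's window functions actually do internally.

```python
# Hypothetical rows: (hour, category, total_value). Illustrative data only.
rows = [
    (0, "cat26", 30.9),
    (0, "cat13", 22.1),
    (0, "cat95", 19.6),
    (1, "cat67", 28.5),
    (1, "cat4", 26.8),
    (1, "cat13", 28.5),
]


def argmax_per_hour(rows):
    """Keep, per hour, the largest (total_value, category) tuple.

    Tuples compare field by field, so putting total_value first makes it
    the primary sort key and category the tie-breaker.
    """
    best = {}
    for hour, category, total_value in rows:
        candidate = (total_value, category)
        if hour not in best or candidate > best[hour]:
            best[hour] = candidate
    return best


def merge_partials(left, right):
    """Combine two partial results; valid because max is associative."""
    merged = dict(left)
    for hour, candidate in right.items():
        if hour not in merged or candidate > merged[hour]:
            merged[hour] = candidate
    return merged


# One pass over everything.
result = argmax_per_hour(rows)

# Same answer from two "partitions" reduced separately, then merged.
partials = merge_partials(argmax_per_hour(rows[::2]), argmax_per_hour(rows[1::2]))

print(result)
print(partials == result)
```

Note that hour 1 has two rows tied on the value 28.5; the tie is broken by the second field, so the larger category string wins. Each partition only has to hand over one tuple per hour, which is the combiner-style saving the follow-up question is asking about.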