>
> Ordering for a struct goes in order of the fields.  So the max struct is
> the one with the highest TotalValue (and then the highest category if
> there are multiple entries with the same hour and total value).
>
> Is this due to "InterpretedOrdering" in StructType?
>

That is one implementation, but the code-generated ordering follows the
same contract.
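The contract is plain lexicographic comparison: compare the first field, and break ties with the next one. Python tuples compare the same way, so the struct-max trick can be sketched with them (a sketch only; the column names here are hypothetical, not from the original thread):

```python
# Rows of (hour, TotalValue, category). Spark's struct ordering compares
# struct fields left to right, just as Python compares tuple elements.
rows = [
    ("h1", 10, "books"),
    ("h1", 10, "toys"),
    ("h1", 7, "games"),
]

# argmax via max over a (TotalValue, category) tuple: pick the row with the
# highest TotalValue, breaking ties by the highest category.
best = max(rows, key=lambda r: (r[1], r[2]))
print(best)  # ('h1', 10, 'toys') -- 'toys' > 'books' breaks the tie
```

In Spark this is the same idea as `max(struct(col("TotalValue"), col("category")))` inside a `groupBy`: the struct's first field drives the comparison and later fields break ties.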



>  4)  Is it faster doing it this way than doing a join or window function
> in Spark SQL?
>
> Way faster.  This is a very efficient way to calculate argmax.
>
> Can you explain how this is way faster than window function? I can
> understand join doesn't make sense in this case. But to calculate the
> grouping max, you just have to shuffle the data by grouping keys. You maybe
> can do a combiner on the mapper side before shuffling, but that is it. Do
> you mean windowing function in Spark SQL won't do any map side combiner,
> even it is for max?
>

Windowing can't do partial aggregation: it has to collect all the data for
a group so the group can be sorted before the function is applied.  In
contrast, a max aggregation does partial aggregation (map-side combining)
and can be computed in a streaming fashion.
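The difference can be sketched in plain Python (hypothetical data, not Spark's actual APIs): a max aggregation folds each record into a single running value per key, and partial maxima from separate map tasks can be merged after the shuffle, whereas a window function would need the whole group materialized first.

```python
def partial_max(records):
    """Map-side combine: fold (key, value) records into one running max
    per key, in a single streaming pass with O(keys) state."""
    acc = {}
    for key, value in records:
        if key not in acc or value > acc[key]:
            acc[key] = value
    return acc

def merge(a, b):
    """Reduce side: merging two partial results is just another max,
    so each mapper only ships one value per key across the shuffle."""
    out = dict(a)
    for key, value in b.items():
        if key not in out or value > out[key]:
            out[key] = value
    return out

# Two "map tasks", each combining locally before any data moves.
part1 = partial_max([("g1", 3), ("g1", 9), ("g2", 4)])
part2 = partial_max([("g1", 5), ("g2", 8)])
print(merge(part1, part2))  # {'g1': 9, 'g2': 8}
```

A window function has no equivalent of `merge` here: it must gather and sort every row of a group before emitting anything, which is why the aggregation path shuffles far less data.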

Also, aggregation is more common and so has seen more optimization work,
beyond the inherent differences described above.
