1)  Is a struct in Spark like a struct in C++?

         Kinda. It's an ordered collection of data with known names and types.

2)  What is an alias in this context?

         It assigns a name to the column, similar to using AS in SQL.

3)  How does this code even work?
         Ordering for a struct goes in order of the fields. So the max
         struct is the one with the highest TotalValue (and then the highest
         category if there are multiple entries with the same hour and total
         value).
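To make the ordering rule concrete, here is a minimal model in plain Python rather than Spark. Python namedtuples, like Spark structs, compare field by field in declaration order, so taking the max of a (total_value, category) value per hour picks the row with the highest total value and breaks ties on category. The Row and Struct types, the argmax_per_hour helper, and the sample data are hypothetical stand-ins for the DataFrame in the original question, and the model ignores nulls.

```python
# Plain-Python model of the max(struct) argmax trick; NOT Spark code.
# Row, Struct, argmax_per_hour and the data are hypothetical stand-ins.
from collections import namedtuple

Row = namedtuple("Row", ["hour", "category", "total_value"])

# total_value comes first so it dominates the comparison;
# category only matters when two total values are equal.
Struct = namedtuple("Struct", ["total_value", "category"])


def argmax_per_hour(rows):
    """Return {hour: Struct} holding the largest struct seen for each hour."""
    best = {}
    for r in rows:
        s = Struct(r.total_value, r.category)
        # namedtuples compare field by field in declaration order,
        # which models how Spark orders struct values (nulls ignored here).
        if r.hour not in best or s > best[r.hour]:
            best[r.hour] = s
    return best


rows = [
    Row(0, "cat", 10),
    Row(0, "dog", 30),
    Row(0, "bird", 30),
    Row(1, "fish", 5),
]

result = argmax_per_hour(rows)
for hour, s in sorted(result.items()):
    print(hour, s.category, s.total_value)
```

Hour 0 has two rows tied at 30, so the second field decides: "dog" sorts after "bird" and wins.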
Is this due to "InterpretedOrdering" in StructType?

4)  Is it faster doing it this way than doing a join or window function in
Spark SQL?

         Way faster. This is a very efficient way to calculate argmax.
Can you explain how this is way faster than a window function? I understand
that a join doesn't make sense in this case. But to calculate the grouping
max, you just have to shuffle the data by the grouping keys. Maybe you can
do a combiner on the mapper side before shuffling, but that is it. Do you
mean that a window function in Spark SQL won't do any map-side combining,
even if it is for max?
Yong
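A toy model may help frame Yong's question. It is plain Python, not Spark internals: the "partitions" are just lists, and the two counters estimate how many records would cross a shuffle. Because max over structs (modeled here as tuples) is associative and commutative, each partition can first shrink its rows to one partial max per key, and merging those partials gives the same result as a single pass over all rows. The window-style count assumes every row is shuffled, which is what happens when nothing is combined before the exchange; whether a particular Spark version's window plan can pre-filter rows is outside what this sketch shows. The helper names and data are hypothetical.

```python
# Toy model of map-side (partial) aggregation for max; NOT Spark internals.
# partial_max, merge and the partition lists are hypothetical illustrations.


def partial_max(partition):
    """Map side: reduce one partition to a single max value per key."""
    out = {}
    for key, value in partition:
        if key not in out or value > out[key]:
            out[key] = value
    return out


def merge(partials):
    """Reduce side: combine per-partition maxima into the global max.
    This is valid because max is associative and commutative."""
    final = {}
    for p in partials:
        for key, value in p.items():
            if key not in final or value > final[key]:
                final[key] = value
    return final


# Keys are hours; values are (total_value, category) tuples standing in
# for the struct, so tuple comparison models struct ordering.
partitions = [
    [(0, (10, "cat")), (0, (30, "dog")), (1, (5, "fish"))],
    [(0, (30, "bird")), (1, (7, "ant")), (1, (2, "bee"))],
]

partials = [partial_max(p) for p in partitions]

# Records crossing the simulated shuffle under each strategy.
combined_shuffled = sum(len(p) for p in partials)  # at most one per key per partition
window_shuffled = sum(len(p) for p in partitions)  # every row, nothing pre-combined

answer = merge(partials)
print(answer)
print(combined_shuffled, window_shuffled)
```

With this data, 4 records cross the simulated shuffle instead of 6. The gap grows with the number of rows per key in each partition, since the partial side never emits more than one record per key per partition.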