>               ,row_number() over ( PARTITION BY A.dt,A.year, A.month, 
>A.bouncer,A.visitor_type,A.device_type order by A.total_page_view_time desc ) 
>as rank 
>from content_pages_agg_by_month A

The row_number() window function is a streaming function, so it should not 
consume a significant amount of memory in this operation.

I suspect an issue in the build of Hive you are using is making it use more 
memory than it should, but I can't be sure.

While the query is running, take a jstack of one of the TezChild instances. 
With that, you can file a bug with your vendor and possibly get a patch for 
the problem.
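
One way to grab that jstack, as a rough sketch rather than a recipe: it 
assumes the JDK tools jps and jstack are on the PATH of a worker node that is 
running a task for the query, and that you run it as the OS user that owns the 
YARN containers. The function names and the output file name are made up for 
illustration.

```shell
# Sketch only: run on a worker node while the query is executing, as the
# OS user that owns the YARN containers (jstack has to attach as that user).

# Read `jps -l` style lines ("<pid> <main class>") on stdin and print the
# PIDs whose main class name ends in TezChild.
tezchild_pids() {
  awk '$2 ~ /TezChild$/ { print $1 }'
}

# Dump the thread stacks of the first TezChild JVM found on this host.
# The output file name is a hypothetical choice.
dump_first_tezchild() {
  pid=$(jps -l | tezchild_pids | head -n 1)
  if [ -z "$pid" ]; then
    echo "no TezChild JVM found on this host" >&2
    return 1
  fi
  jstack "$pid" > "tezchild-${pid}.jstack.txt"
}

# Usage: dump_first_tezchild
```

Taking two or three dumps a few seconds apart usually makes it easier to see 
where the task is actually spending its time.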

This particular function was improved significantly in Hive 3.0, where the 
implementation was vectorized natively:

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ptf/VectorPTFEvaluatorRowNumber.java#L29
 
Cheers,
Gopal 
