> ,row_number() over ( PARTITION BY A.dt,A.year, A.month,
> A.bouncer,A.visitor_type,A.device_type order by A.total_page_view_time desc )
> as rank from content_pages_agg_by_month A
The row_number() window function is a streaming function, so it should not consume a significant amount of memory in this operation.

I suspect an issue in the build of Hive you are using is preventing it from using less memory, but I can't be sure. While the query is running, take a jstack of one of the TezChild instances; with that you can file a bug with your vendor and get a patch for the problem.

This particular function was improved significantly in Hive 3.0 by vectorizing the implementation natively:

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ptf/VectorPTFEvaluatorRowNumber.java#L29

Cheers,
Gopal
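
P.S. A rough sketch of what "streaming" means for row_number(), written in Python for illustration only. This is not Hive's implementation (that is the Java class linked above); the function name streaming_row_number and the sample rows are made up. It assumes the rows arrive already sorted by the partition columns and then by the ordering column, which is what the shuffle and sort ahead of the window operator would provide. Under that assumption the only state kept is the current partition key and a counter, so memory use does not grow with partition size.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple


def streaming_row_number(
    rows: Iterable[Any],
    partition_key: Callable[[Any], Any],
) -> Iterator[Tuple[Any, int]]:
    """Yield (row, row_number) pairs for input that is already sorted.

    Assumes rows are grouped by partition key and ordered within each
    partition. Only the current key and a counter are held in memory.
    """
    have_key = False
    current_key = None
    counter = 0
    for row in rows:
        key = partition_key(row)
        if not have_key or key != current_key:
            # A new partition starts: reset the counter.
            have_key = True
            current_key = key
            counter = 0
        counter += 1
        yield row, counter


if __name__ == "__main__":
    # Hypothetical rows: (month, device_type, total_page_view_time),
    # pre-sorted by (month, device_type) and then by time descending.
    sample = [
        ("2019-01", "desktop", 90),
        ("2019-01", "desktop", 40),
        ("2019-01", "mobile", 70),
        ("2019-02", "desktop", 10),
    ]
    for row, rank in streaming_row_number(sample, lambda r: (r[0], r[1])):
        print(row, rank)
```

If a jstack shows the operator buffering whole partitions instead of behaving like this, that would be worth including in the bug report.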