Hi Pals,
I have the below Hive SQL which is hitting the following error "at 
java.lang.Thread.run(Thread.java:745) Caused by: java.lang.OutOfMemoryError: 
Java heap space at". It's basically going out of memory. The table on which the 
query is being hit has 246608473 (246 million) records, its size is around 43 
GB's. I am running this sql on a Hadoop cluster which has 4 nodes, every node 
has 16GB memory and 128 GB disk space. I can definitely increase the memory, 
can scale up more clusters and try but is there something that I can do to make 
this query work without having to touch the clusters or the memory?

create table t1_content_pages_agg_by_month stored as orc
as
select * from (
select A.dt
               ,A.year
               ,A.month
               ,A.bouncer
               ,A.visitor_type
               ,A.device_type
               ,A.pg_domain_name
               ,A.pg_page_url
               ,A.class1_id
               ,A.class2_id
               ,A.total_page_view_time
               ,row_number() over ( PARTITION BY A.dt,A.year, A.month, 
A.bouncer,A.visitor_type,A.device_type) as rank
from content_pages_agg_by_month A
)AA
;


Regards,

Sujeet Singh Pardeshi

Software Specialist

SAS Research and Development (India) Pvt. Ltd.
Level 2A and Level 3, Cybercity, Magarpatta, Hadapsar  Pune, Maharashtra, 411 
013
off: +91-20-30418810
[Description: untitled]
 "When the solution is simple, God is answering..."

Reply via email to