Hi Pals,
I have the Hive SQL below, which fails with the following error:

    at java.lang.Thread.run(Thread.java:745)
    Caused by: java.lang.OutOfMemoryError: Java heap space
    at

In other words, it is running out of memory. The table the query reads from has
246608473 (about 246 million) records and is around 43 GB in size. I am running
this SQL on a Hadoop cluster with 4 nodes, each with 16 GB of memory and 128 GB
of disk space. I could increase the memory or scale up the cluster and try
again, but is there anything I can do to make this query work without changing
the cluster or the memory?
create table t1_content_pages_agg_by_month stored as orc
as
select * from (
    select A.dt
          ,A.year
          ,A.month
          ,A.bouncer
          ,A.visitor_type
          ,A.device_type
          ,A.pg_domain_name
          ,A.pg_page_url
          ,A.class1_id
          ,A.class2_id
          ,A.total_page_view_time
          -- number the rows within each (dt, year, month, bouncer, visitor_type, device_type) group;
          -- there is no ORDER BY, so the numbering inside a group is arbitrary
          ,row_number() over (PARTITION BY A.dt, A.year, A.month,
                                           A.bouncer, A.visitor_type, A.device_type) as rank
    from content_pages_agg_by_month A
) AA
;
Regards,
Sujeet Singh Pardeshi
Software Specialist
SAS Research and Development (India) Pvt. Ltd.
Level 2A and Level 3, Cybercity, Magarpatta, Hadapsar, Pune, Maharashtra 411013
off: +91-20-30418810
"When the solution is simple, God is answering..."