Which hive version and engine? If it is tez then you can also try mr as an engine set hive.execution.engine=mr that will use less memory. Check also the max heap space configuration on the nodes . Maybe you have physically 16 gb memory but the Java process takes only 4 or so memory.
Maybe your query could be also expressed differently, but I miss background information on the use case. Last but not least : size on disk != size in memory especially if you go beyond simple queries. Try also with a smaller subset of the data when you reach the memory limit > Am 10.01.2019 um 07:57 schrieb Sujeet Pardeshi <sujeet.parde...@sas.com>: > > Hi Pals, > I have the below Hive SQL which is hitting the following error “at > java.lang.Thread.run(Thread.java:745) Caused by: java.lang.OutOfMemoryError: > Java heap space at”. It’s basically going out of memory. The table on which > the query is being hit has 246608473 (246 million) records, its size is > around 43 GB’s. I am running this sql on a Hadoop cluster which has 4 nodes, > every node has 16GB memory and 128 GB disk space. I can definitely increase > the memory, can scale up more clusters and try but is there something that I > can do to make this query work without having to touch the clusters or the > memory? > > create table t1_content_pages_agg_by_month stored as orc > as > select * from ( > select A.dt > ,A.year > ,A.month > ,A.bouncer > ,A.visitor_type > ,A.device_type > ,A.pg_domain_name > ,A.pg_page_url > ,A.class1_id > ,A.class2_id > ,A.total_page_view_time > ,row_number() over ( PARTITION BY A.dt,A.year, A.month, > A.bouncer,A.visitor_type,A.device_type) as rank > from content_pages_agg_by_month A > )AA > ; > > Regards, > > Sujeet Singh Pardeshi > > Software Specialist > > SAS Research and Development (India) Pvt. Ltd. > > Level 2A and Level 3, Cybercity, Magarpatta, Hadapsar Pune, Maharashtra, 411 > 013 > off: +91-20-30418810 > <image001.png> > "When the solution is simple, God is answering…" >