Not sure if there's anything special about external tables.
Maybe you can try the Cost-Based Optimizer (CBO), available in Hive 0.14 and above. First gather statistics on the tables:

analyze table your_table compute statistics;

-- then either for specific columns:
analyze table your_table compute statistics for columns col_1, col_2,...;
-- or for all columns:
analyze table your_table compute statistics for columns;
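Before relying on the CBO, it may be worth confirming that the statistics were actually stored. One option, sketched here with the same placeholder names (your_table, col_1) used above, is to inspect the table and column metadata:

```sql
-- Table-level stats: look for numRows / rawDataSize under "Table Parameters"
describe formatted your_table;

-- Column-level stats for one column: min, max, num_nulls, distinct_count, ...
describe formatted your_table col_1;
```

If numRows is missing or the column statistics are empty, the optimizer has little to work with when ordering the joins.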

set hive.cbo.enable=true;
Then try the 21-table join again.
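The Hive documentation on cost-based optimization also lists a few companion settings so that the optimizer actually reads the gathered statistics. A minimal session setup might look like the following; defaults differ between Hive versions, so treat this as a sketch to check against your release rather than a definitive configuration:

```sql
-- Enable the Calcite-based cost optimizer
set hive.cbo.enable=true;
-- Let Hive answer from stored statistics where it can
set hive.compute.query.using.stats=true;
-- Make column- and partition-level statistics available to the planner
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
```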

Regards,
Nemon

From: Sanka, Himabindu [mailto:himabindu_sa...@optum.com]
Sent: Thursday, March 24, 2016 9:50 AM
To: user@hive.apache.org
Subject: Issue joining 21 HUGE Hive tables

Hi Team,

I need some input from you. My project requires joining 21 Hive external
tables.

Of these, 6 tables are HUGE, with 500 million records of data. The other 15
tables are small, around 100 to 1,000 records each.

When I run inner joins / left outer joins, the query takes hours to complete.

Please suggest some optimization techniques, or any other ecosystem
components that perform better than Hive.


Regards,
Hima
