How to run big queries in an optimized way?

Mapred Learn Thu, 20 Sep 2012 19:30:53 -0700

Hi,
We have datasets which are about 10-15 TB in size.

We want to run Hive queries on top of this input data.


What are some ways to reduce stress on our cluster when running many such
big queries (including joins) in parallel?
How do we enable compression etc. for intermediate Hive output?
How do we make sure the job cache does not grow too large?
In short, what are the best practices for huge queries on Hive?

Any input is really appreciated!

Thanks,
JJ

