Hi,
We have datasets which are about 10-15 TB in size.

We want to run Hive queries on top of this input data.

What are some ways to reduce the load on our cluster when running many such
large queries (including joins) in parallel?
How can we enable compression (and similar optimizations) for intermediate Hive output?
How can we keep the job cache from growing too large?
In short, what are the best practices for running very large queries on Hive?

Any input is much appreciated!

Thanks,
JJ
