Re: what is the difference between "hive.compute.splits.in.am=true" and "hive.compute.splits.in.am=false"

Gopal Vijayaraghavan Mon, 18 Jan 2016 19:44:43 -0800

>what is the difference between "hive.compute.splits.in.am=true" and
>"hive.compute.splits.in.am=false"?
>which value is better?


First up, those options are specific to Tez.

The old MapReduce model always computes splits before asking for
resources to run, and it uses the gateway host (where the CLI runs) to
do that.

That model runs sequentially and overloads single gateway machines during
heavy concurrency, particularly when used via ODBC (HiveServer2 mode).

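With hive.compute.splits.in.am=true, the splits are generated inside the
Tez ApplicationMaster on the cluster instead of locally on the gateway.
Here's an old slide explaining how that speeds up queries.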

http://www.slideshare.net/Hadoop_Summit/w-235phall1pandey/29


This dynamic & pipelined model lays down the foundation for optimizations
like Tez's dynamic partition pruning.
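
For illustration, a minimal sketch of toggling the option per session,
assuming the query runs on Tez (the option has no effect otherwise).
Treat it as a sketch for comparing the two modes on your own workload,
not as a tuning recommendation.

```
-- Only meaningful when the execution engine is Tez
set hive.execution.engine=tez;

-- Generate splits inside the Tez ApplicationMaster (on the cluster)
set hive.compute.splits.in.am=true;

-- Alternative: generate splits locally (CLI / HiveServer2 host)
-- before the job is submitted
-- set hive.compute.splits.in.am=false;
```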

Cheers,
Gopal
