I'm no expert in Hive, but here are my 2 cents.

By default Hive schedules one reducer for every 1 GB of input data (you can
change that value by modifying hive.exec.reducers.bytes.per.reducer). If your
input data is huge, there will be a large number of reducers, which might be
unnecessary. (Sometimes a large number of reducers slows down the job because
their number exceeds the total task slots and they keep waiting for their turn.
Not to forget the initialization overheads for each task, JVM startup etc.)
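To make that concrete (the 10 GB input size below is just an illustrative
assumption), Hive roughly plans one reducer per bytes.per.reducer of input,
capped by hive.exec.reducers.max, so lowering the threshold inflates the
reducer count:

  -- assuming roughly 10 GB of input data (hypothetical figure)
  set hive.exec.reducers.bytes.per.reducer=1000000000;  -- 1 GB per reducer -> ~10 reducers
  set hive.exec.reducers.bytes.per.reducer=100000000;   -- 100 MB per reducer -> ~100 reducers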

Overall, I think there cannot be any single optimum value for a cluster. It
depends on the type of queries, the size of your inputs, and the size of the
map outputs in the jobs (intermediate outputs). So you can check various values
and see which one works best. From my experience, setting
"hive.exec.reducers.max" to the total number of reduce slots in your cluster
gives decent performance, since all the reducers complete in a single wave.
(This may or may not work for you, but it's worth a try; see the sketch below.)
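For instance (the 40-slot figure is purely an assumption for illustration), on
a cluster with 40 reduce slots that would be:

  -- cap reducers at the cluster's total reduce slots (40 is a made-up example)
  set hive.exec.reducers.max=40;
  -- reducers can then run in a single wave instead of queuing for slots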


On Wed, Sep 26, 2012 at 5:58 PM, Abhishek <abhishek.dod...@gmail.com> wrote:

> Hi all,
>
> I have a doubt regarding the below properties. Is it a good practice to
> override them in Hive?
>
> If yes, what are the optimal values for the following properties?
>
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
>
> Regards
> Abhi
>
> Sent from my iPhone
>



-- 
Regards,
Bharath .V
w:http://researchweb.iiit.ac.in/~bharath.v
