Thanks, Philip.
Thanks,
Ranjith
On May 23, 2012, at 4:15 AM, Philip Tromans wrote:
Hi Ranjith,
I haven't checked the code (so this might not be true), but I think that
the map-side aggregation uses its own hash map within the map phase
to do the aggregation, instead of using a combiner, so you wouldn't expect
to see any combine input records. Have a look for parameters
li
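To make the idea above concrete, here is a minimal Python sketch of aggregating inside the map phase with a hash map rather than through a combiner. It is not Hive's actual code: the function names and the max_entries flush threshold are hypothetical and only illustrate the technique. Because the partial counts are built inside the mapper itself, no combiner runs, which would be consistent with a "Combine input records 0" counter.

```python
from collections import defaultdict


def map_with_hash_aggregation(rows, max_entries=100_000):
    """Yield (key, partial_count) pairs, aggregating inside the mapper.

    `max_entries` is a hypothetical memory guard: once the hash map
    holds that many keys, the partial counts are emitted and the map
    is cleared.
    """
    partial = defaultdict(int)
    for key in rows:
        partial[key] += 1
        if len(partial) >= max_entries:
            # Flush partial aggregates downstream and start over.
            yield from partial.items()
            partial.clear()
    # Emit whatever is left once the input is exhausted.
    yield from partial.items()


def reduce_counts(pairs):
    """Sum the partial counts per key, as the reduce phase would."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


if __name__ == "__main__":
    rows = ["a", "b", "a", "a", "c", "b"]
    emitted = list(map_with_hash_aggregation(rows))
    print(len(rows), "input rows,", len(emitted), "records emitted by the mapper")
    print(reduce_counts(emitted))
```

The mapper emits one record per distinct key (per flush) instead of one per input row, so the saving shows up as fewer map output records, not as combiner activity.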
Thanks, Matt. I am not performing a join, so does that matter? What does this
local task do?
Thanks,
Ranjith
On May 22, 2012, at 8:17 PM, "Tucker, Matt" wrote:
Try setting hive.auto.convert.join to true. The CLI will have a local task
before it starts a map-reduce job on the cluster.
Matt
On May 22, 2012, at 8:43 PM, "Raghunath, Ranjith"
<ranjith.raghuna...@usaa.com> wrote:
I have the parameter hive.map.aggr set to true. However, when I look at the
counters associated with the map tasks, I notice the following: "Combine input
records 0". I am interpreting this as a failure to perform the map side
aggregation. Is that accurate? Is this option not working in Hive 0.7.1?
Hive Queries Performance Tuning - Map side joins, Map side
aggregations, Partitioning/Clustering
Anand
You can optimize pretty much all Hive queries; which optimizations to apply
depends on the query. For example, Group By has its own specific ways to be
optimized. Sometimes Distribute By come
"Ladda, Anand"
To: "user@hive.apache.org"
Sent: Sunday, April 1, 2012 11:59 PM
Subject: Hive Queries Performance Tuning - Map side joins, Map side
aggregations, Partitioning/Clustering
Anand,
The best place to understand join queries in Hive is the presentation
by Namit Jain from Facebook.
Here is the PDF:
https://cwiki.apache.org/Hive/presentations.data/Hive%20Summit%202011-join.pdf
You can also search for the video on YouTube. It's very well described.
On Sun, Apr 1, 2012 at 11:59
I am trying to understand what options/settings are available to
tune the performance of Hive queries. I have seen the benefits of map side
joins and Partitioning/Clustering. However, I have yet to see the impact that
map side aggregation has on query performance. I tried running this