Re: Map side aggregations

2012-05-23 Thread Ranjith
Thanks Philip. Thanks, Ranjith On May 23, 2012, at 4:15 AM, Philip Tromans wrote: > Hi Ranjith, > > I haven't checked the code (so this might not be true), but I think that the > map side aggregation stuff uses its own hash map within the map phase to do > the aggregation, instead of using

Re: Map side aggregations

2012-05-23 Thread Philip Tromans
Hi Ranjith, I haven't checked the code (so this might not be true), but I think that the map side aggregation stuff uses its own hash map within the map phase to do the aggregation, instead of using a combiner, so you wouldn't expect to see any combine input records. Have a look for parameters li
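To illustrate the idea in the reply above, here is a minimal Python sketch of hash-based aggregation done inside the map phase. It is not Hive's actual code: the function names, the `max_entries` flush threshold, and the sample rows are all made up for illustration. The point is only that partial results are built in an in-memory hash map and emitted directly, so no combiner ever runs and a counter like "Combine input records" would stay at 0.

```python
from collections import defaultdict


def map_with_hash_aggregation(records, max_entries=100_000):
    """Yield (key, partial_count) pairs, aggregating inside the mapper.

    max_entries is a hypothetical flush threshold, not a Hive setting.
    """
    partial = defaultdict(int)
    for key in records:
        partial[key] += 1
        if len(partial) > max_entries:
            # Hash map grew too large: emit the partial counts and start over.
            yield from partial.items()
            partial.clear()
    # Emit whatever is left once the input is exhausted.
    yield from partial.items()


def reduce_counts(pairs):
    """Merge partial counts from all mappers into final totals."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


rows = ["a", "b", "a", "c", "a", "b"]  # illustrative GROUP BY keys
map_output = list(map_with_hash_aggregation(rows))
totals = reduce_counts(map_output)
```

With the default threshold the mapper emits one pair per distinct key instead of one per input row; with a very small threshold it flushes more often and emits more pairs, but the reducer still arrives at the same totals.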

Re: Map side aggregations

2012-05-22 Thread Ranjith
Thanks Matt. I am not performing a join so does that matter? What does this local task do? Thanks, Ranjith On May 22, 2012, at 8:17 PM, "Tucker, Matt" wrote: > Try setting hive.auto.convert.join to true. The CLI will have a local task > before it starts a map-reduce job on the cluster. > >

Re: Map side aggregations

2012-05-22 Thread Tucker, Matt
Try setting hive.auto.convert.join to true. The CLI will have a local task before it starts a map-reduce job on the cluster. Matt On May 22, 2012, at 8:43 PM, "Raghunath, Ranjith" <ranjith.raghuna...@usaa.com> wrote: I have the parameter hive.map.aggr set to true. However, when I loo

Map side aggregations

2012-05-22 Thread Raghunath, Ranjith
I have the parameter hive.map.aggr set to true. However, when I look at the counters associated with the map tasks I notice the following "Combine input records 0". I am interpreting this as a failure to perform the map side aggregation. Is that accurate? Is this option not working in hive 0.7.1

RE: Hive Queries Performance Tuning - Map side joins, Map side aggregations, Partitioning/Clustering

2012-04-03 Thread Ladda, Anand
Anand, you can optimize pretty much all Hive queries. The optimizations you need depend on your queries. For example, Group By has some specific way to be optimized. Sometimes Distribute By come

Re: Hive Queries Performance Tuning - Map side joins, Map side aggregations, Partitioning/Clustering

2012-04-01 Thread Bejoy Ks
"Ladda, Anand" To: "user@hive.apache.org" Sent: Sunday, April 1, 2012 11:59 PM Subject: Hive Queries Performance Tuning - Map side joins, Map side aggregations, Partitioning/Clustering I am trying to understand what are some of the options/settings available to tune the pe

Re: Hive Queries Performance Tuning - Map side joins, Map side aggregations, Partitioning/Clustering

2012-04-01 Thread Nitin Pawar
Anand, the best place to understand join queries in Hive is the presentation by Namit Jain from Facebook. Here is the PDF: https://cwiki.apache.org/Hive/presentations.data/Hive%20Summit%202011-join.pdf You can search for the video on YouTube. It's very well described. On Sun, Apr 1, 2012 at 11:59

Hive Queries Performance Tuning - Map side joins, Map side aggregations, Partitioning/Clustering

2012-04-01 Thread Ladda, Anand
I am trying to understand what are some of the options/settings available to tune the performance of Hive Queries. I have seen the benefits of Map side joins and Partitioning/Clustering. However I have yet to realize the impact map side aggregation has on query performance. I tried running this