Re: Question - Nested JSON using Hive

2012-04-01 Thread Ashwanth Kumar
You get that error because "location" is a keyword in Hive. Enclose it in backticks (`) and try again. On Mon, Apr 2, 2012 at 7:07 AM, Anurag Gulati wrote: > I’ve been trying to figure this out for a couple days now and I haven’t > gotten very far. > > Looking for your guidance on the matter.
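
To illustrate the backtick advice above, here is a minimal sketch that builds a query string with every identifier quoted. The helper names and the table name `fb_posts` are hypothetical and do not come from the thread; only the `location` column is mentioned in the reply.

```python
def quote_identifier(name: str) -> str:
    """Wrap a table or column name in backticks so a keyword such as
    "location" is read as an identifier rather than as syntax."""
    if "`" in name:
        # Kept simple on purpose: this sketch does not try to escape backticks.
        raise ValueError(f"identifier must not contain a backtick: {name!r}")
    return f"`{name}`"


def build_select(table: str, columns: list[str]) -> str:
    """Build a SELECT statement with all identifiers backtick-quoted."""
    quoted_columns = ", ".join(quote_identifier(c) for c in columns)
    return f"SELECT {quoted_columns} FROM {quote_identifier(table)}"


# Hypothetical table and columns, for illustration only.
print(build_select("fb_posts", ["id", "location"]))
```

Quoting every identifier, not just the known keywords, avoids having to keep a keyword list in sync with the Hive version in use.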

Question - Nested JSON using Hive

2012-04-01 Thread Anurag Gulati
I've been trying to figure this out for a couple days now and I haven't gotten very far. Looking for your guidance on the matter. As a test, I'm trying to import Facebook Open Graph API data into Hive but am having a problem with the syntax. Here is a line of sample data I'm trying to import (m

Re: Hive Queries Performance Tuning - Map side joins, Map side aggregations, Partitioning/Clustering

2012-04-01 Thread Bejoy Ks
Anand, you can optimize pretty much all Hive queries, but which optimizations to apply depends on the queries themselves. For example, Group By has its own specific optimizations. Sometimes Distribute By comes in handy for optimizing some queries. Skew joins are good for balancing the reducer load, etc.

Re: Why BucketJoinMap consume too much memory

2012-04-01 Thread Bejoy Ks
Hi, at first look it seems that a plain map join is happening in your case rather than a bucketed map join. The following conditions need to hold for a bucketed map join to work: 1) Both tables are bucketed on the join columns 2) The number of buckets in one table should be a multiple of the number of buckets in the other 3
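
As a rough illustration of condition 2 only, the sketch below checks whether two bucket counts are multiples of each other. The function name is hypothetical, and the check says nothing about condition 1 or about the third condition, which is cut off in this listing.

```python
def bucket_counts_compatible(buckets_a: int, buckets_b: int) -> bool:
    """Return True when one bucket count is a whole multiple of the other.

    This covers only the bucket-count condition quoted above; it does not
    check that both tables are bucketed on the join columns.
    """
    if buckets_a <= 0 or buckets_b <= 0:
        raise ValueError("bucket counts must be positive integers")
    larger, smaller = max(buckets_a, buckets_b), min(buckets_a, buckets_b)
    return larger % smaller == 0


# Example bucket counts chosen for illustration.
print(bucket_counts_compatible(32, 8))   # one count divides the other
print(bucket_counts_compatible(32, 12))  # neither count divides the other
```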

Re: Hive Queries Performance Tuning - Map side joins, Map side aggregations, Partitioning/Clustering

2012-04-01 Thread Nitin Pawar
Anand, the best place to understand join queries in Hive is the presentation by Namit Jain from Facebook. Here is the PDF: https://cwiki.apache.org/Hive/presentations.data/Hive%20Summit%202011-join.pdf You can also search for the video on YouTube. It's very well described. On Sun, Apr 1, 2012 at 11:59

Hive Queries Performance Tuning - Map side joins, Map side aggregations, Partitioning/Clustering

2012-04-01 Thread Ladda, Anand
I am trying to understand which options/settings are available to tune the performance of Hive queries. I have seen the benefits of map-side joins and partitioning/clustering. However, I have yet to see the impact that map-side aggregation has on query performance. I tried running this
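
For readers unfamiliar with the idea behind map-side aggregation, here is a conceptual sketch in plain Python, not Hive code. It assumes a simple count-per-key query over two made-up input splits: each mapper either emits one record per row or pre-aggregates its split in a hash table before emitting. In Hive this behaviour is governed by the `hive.map.aggr` setting; the function names and data below are hypothetical.

```python
from collections import Counter
from typing import Iterable


def map_without_aggregation(rows: Iterable[str]) -> list[tuple[str, int]]:
    """Emit one (key, 1) record per input row, leaving all counting to the reducer."""
    return [(key, 1) for key in rows]


def map_with_aggregation(rows: Iterable[str]) -> list[tuple[str, int]]:
    """Pre-aggregate inside the mapper: emit one (key, partial_count) per distinct key."""
    return list(Counter(rows).items())


def reduce_counts(map_outputs: Iterable[list[tuple[str, int]]]) -> dict[str, int]:
    """Merge the records from all mappers into final per-key counts."""
    totals: Counter = Counter()
    for output in map_outputs:
        for key, count in output:
            totals[key] += count
    return dict(totals)


# Two made-up input splits, one per mapper.
splits = [["a", "b", "a"], ["b", "b", "c"]]
plain = [map_without_aggregation(s) for s in splits]
aggregated = [map_with_aggregation(s) for s in splits]

# Both paths give the same answer; the aggregated path shuffles fewer records.
print(reduce_counts(plain) == reduce_counts(aggregated))
print(sum(len(o) for o in plain), sum(len(o) for o in aggregated))
```

The saving comes from shuffling fewer records, so it tends to matter most when keys repeat often within a split; with mostly distinct keys the mapper-side hash table adds work without shrinking the output much.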