Re: Skew Join Optimization in hive

2011-06-07 Thread Shantian Purkad
logic and dependencies in the joins) From: Igor Tatarinov To: user@hive.apache.org; Shantian Purkad Sent: Tuesday, June 7, 2011 12:58 PM Subject: Re: Skew Join Optimization in hive Have you tried splitting the query into 2 or 3 steps and/or enabling map joins

Re: Skew Join Optimization in hive

2011-06-07 Thread Igor Tatarinov
Have you tried splitting the query into 2 or 3 steps and/or enabling map joins (SET hive.auto.convert.join = true;) if some of the tables are smallish? On Tue, Jun 7, 2011 at 12:31 PM, Shantian Purkad wrote: > Hi, > > I have a query which joins 12 different tables (most of them left outer > joins
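A rough sketch of what "splitting the query into 2 or 3 steps" could look like in HiveQL. The original 12-table query is not shown in the thread, so every table and column name here (fact, dim_a, dim_b, stage1, result, a_id, b_id, attr_a, attr_b) is hypothetical; only the SET line comes from the message above.

```sql
-- Assumption: the Hive build in use supports automatic map join conversion.
SET hive.auto.convert.join = true;

-- Step 1 (hypothetical tables): join the large table to one small table
-- and materialize the intermediate result, keeping the keys step 2 needs.
CREATE TABLE stage1 AS
SELECT f.id, f.b_id, a.attr_a
FROM fact f
LEFT OUTER JOIN dim_a a ON (f.a_id = a.id);

-- Step 2: join the intermediate result to the next small table.
CREATE TABLE result AS
SELECT s.id, s.attr_a, b.attr_b
FROM stage1 s
LEFT OUTER JOIN dim_b b ON (s.b_id = b.id);
```

Each step then runs as its own, smaller job, which makes it easier to see which join is the slow one.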

Re: skew join optimization

2011-03-20 Thread Igor Tatarinov
Thanks everyone! I had a typo when setting auto convert to true. You can actually see it in my first email ('set' was repeated twice but there was no syntax error). With map joins enabled, my join finished in 30 minutes. Sweet! Looks like 'true' should be the default option for auto.convert. Anot

Re: skew join optimization

2011-03-20 Thread yongqiang he
Skew join does not work together with map join. Map join does not require any reducer. Please double-check that the Hive you use has the auto map join feature. If auto convert join is in your Hive, only SET hive.auto.convert.join = true; should do the work. thanks yongqiang On Sun, Mar 2
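For reference, the two approaches contrasted above are switched on through separate settings. This is a sketch only: the property names are standard Hive configuration keys, but the values shown are illustrative and whether each property is available depends on the Hive version in use, which the thread does not state.

```sql
-- Option A: automatic map join. The small table is loaded into memory
-- on each mapper, so the join needs no reduce phase.
SET hive.auto.convert.join = true;
-- Size threshold in bytes below which a table counts as "small" (illustrative value).
SET hive.mapjoin.smalltable.filesize = 25000000;

-- Option B: runtime skew join handling for a regular reduce-side join.
-- Per the message above, this is an alternative to map join, not a companion to it.
SET hive.optimize.skewjoin = true;
-- Number of rows per join key above which the key is treated as skewed (illustrative value).
SET hive.skewjoin.key = 100000;
```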

Re: skew join optimization

2011-03-20 Thread Edward Capriolo
On Sun, Mar 20, 2011 at 11:20 AM, Ted Yu wrote: > How about link to http://imageshack.us/ or TinyPic ? > > Thanks > > On Sun, Mar 20, 2011 at 7:56 AM, Edward Capriolo > wrote: >> >> On Sun, Mar 20, 2011 at 10:30 AM, Ted Yu wrote: >> > Can someone re-attach the missing figures for that wiki ? >>

Re: skew join optimization

2011-03-20 Thread Ted Yu
How about a link to http://imageshack.us/ or TinyPic? Thanks On Sun, Mar 20, 2011 at 7:56 AM, Edward Capriolo wrote: > On Sun, Mar 20, 2011 at 10:30 AM, Ted Yu wrote: > > Can someone re-attach the missing figures for that wiki ? > > > > Thanks > > > > On Sun, Mar 20, 2011 at 7:15 AM, bharath vis

Re: skew join optimization

2011-03-20 Thread Edward Capriolo
On Sun, Mar 20, 2011 at 10:30 AM, Ted Yu wrote: > Can someone re-attach the missing figures for that wiki ? > > Thanks > > On Sun, Mar 20, 2011 at 7:15 AM, bharath vissapragada > wrote: >> >> Hi Igor, >> >> See http://wiki.apache.org/hadoop/Hive/JoinOptimization and see the >> jira 1642 which aut

Re: skew join optimization

2011-03-20 Thread Ted Yu
Can someone re-attach the missing figures for that wiki? Thanks On Sun, Mar 20, 2011 at 7:15 AM, bharath vissapragada < bharathvissapragada1...@gmail.com> wrote: > Hi Igor, > > See http://wiki.apache.org/hadoop/Hive/JoinOptimization and see the > jira 1642 which automatically converts a normal

Re: skew join optimization

2011-03-20 Thread bharath vissapragada
Hi Igor, See http://wiki.apache.org/hadoop/Hive/JoinOptimization and JIRA 1642, which automatically converts a normal join into a map join (otherwise you can specify the mapjoin hints in the query itself). Because your 'S' table is very small, it can be replicated across all the mappers and
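A minimal sketch of the hint form mentioned above, applied to the T/S join from the original question. The select list is elided in the thread, so the column S.val is hypothetical; the backticks around timestamp are a precaution for Hive versions that treat it as a reserved word.

```sql
-- MAPJOIN(S) asks Hive to load the small table S into memory on every mapper.
-- With a LEFT OUTER JOIN, the hinted table must be the right-hand (non-preserved) side.
SELECT /*+ MAPJOIN(S) */
       T.id,
       T.`timestamp`,
       S.val            -- hypothetical column; the thread shows only "SELECT ..."
FROM T
LEFT OUTER JOIN S
  ON (T.`timestamp` = S.`timestamp` AND T.id = S.id);
```

Because every mapper holds its own copy of S, no single reduce task ends up with the bulk of the rows for a hot key.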

Re: skew join optimization

2011-03-20 Thread Jov
2011/3/20 Igor Tatarinov : > I have the following join that takes 4.5 hours (with 12 nodes) mostly > because of a single reduce task that gets the bulk of the work: > SELECT ... > FROM T > LEFT OUTER JOIN S > ON T.timestamp = S.timestamp and T.id = S.id > This is a 1:0/1 join so the size of the out