Can someone re-attach the missing figures for that wiki? Thanks.

On Sun, Mar 20, 2011 at 7:15 AM, bharath vissapragada <bharathvissapragada1...@gmail.com> wrote:

> Hi Igor,
>
> See http://wiki.apache.org/hadoop/Hive/JoinOptimization and JIRA 1642,
> which automatically converts a normal join into a map-join
> (otherwise you can specify the mapjoin hint in the query itself).
> Because your 'S' table is very small, it can be replicated across all
> the mappers and the reduce phase can be avoided. This can greatly
> reduce the runtime (see the results section of the page for
> details).
>
> Hope this helps.
>
> Thanks
>
>
> On Sun, Mar 20, 2011 at 6:37 PM, Jov <zhao6...@gmail.com> wrote:
> > 2011/3/20 Igor Tatarinov <i...@decide.com>:
> >> I have the following join that takes 4.5 hours (with 12 nodes), mostly
> >> because of a single reduce task that gets the bulk of the work:
> >> SELECT ...
> >> FROM T
> >> LEFT OUTER JOIN S
> >> ON T.timestamp = S.timestamp AND T.id = S.id
> >> This is a 1:0/1 join, so the size of the output is exactly the same as
> >> the size of T (500M records). S is actually very small (5K).
> >> I've tried:
> >> - switching the order of the join conditions
> >> - using a different hash function setting (jenkins instead of murmur)
> >> - using SET hive.auto.convert.join = true;
> >
> > Are you sure your query is converted to a mapjoin? If not, try using an
> > explicit mapjoin hint.
> >
> >
> >> - using SET hive.optimize.skewjoin = true;
> >> but nothing helped :(
> >> Anything else I can try?
> >> Thanks!
> >
> >
>
> --
> Regards,
> Bharath .V
> w:http://research.iiit.ac.in/~bharath.v
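
For readers following the thread, here is a rough sketch of the map-join idea described in the quoted reply: the small table S is held in memory by every mapper, and each row of the large table T is joined by a lookup, with no reduce phase. This is a toy illustration in plain Python, not Hive's implementation; the function names, the `value` and `x` columns, and the sample rows are all made up for the example.

```python
# Toy illustration of a map-side (broadcast) LEFT OUTER JOIN.
# All names and sample rows here are hypothetical, not taken from Hive.

def build_lookup(small_rows):
    """Hash the small table S on the join key (timestamp, id)."""
    return {(row["timestamp"], row["id"]): row for row in small_rows}


def map_side_left_outer_join(t_rows, lookup):
    """Stream rows of the large table T and probe the in-memory copy of S.

    Every T row yields exactly one output row (a 1:0/1 join); unmatched
    rows carry None for the S side. No shuffle or reduce step is needed.
    """
    for t in t_rows:
        s = lookup.get((t["timestamp"], t["id"]))
        yield {**t, "s_value": s["value"] if s is not None else None}


S = [{"timestamp": 100, "id": 1, "value": "a"}]
T = [
    {"timestamp": 100, "id": 1, "x": 10},
    {"timestamp": 100, "id": 2, "x": 20},
]

lookup = build_lookup(S)  # in a real job, each mapper would load its own copy
joined = list(map_side_left_outer_join(T, lookup))
```

Because each mapper only sees its own split of T and never repartitions rows by join key, a heavily skewed key cannot pile up on a single reducer, which is the bottleneck described in the original question.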