Hi Igor,

See http://wiki.apache.org/hadoop/Hive/JoinOptimization and JIRA 1642, which automatically converts a normal join into a map join. (Otherwise you can specify the mapjoin hint in the query itself.) Because your S table is very small, it can be replicated to all the mappers, so the reduce phase can be avoided entirely. This can greatly reduce the runtime. (See the results section of that page for details.)
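
In case it helps, an explicit hint on your query might look roughly like the following. This is only a sketch: T, S and the join keys are taken from your mail, while S.value is a made-up column standing in for the SELECT list you elided, and I have not run it against your data.

```sql
-- Sketch only. T, S and the join condition come from the original query;
-- S.value is a hypothetical column, replace the SELECT list with your own.
-- The MAPJOIN(S) hint asks Hive to load the small table S into memory on
-- every mapper, so the join is done map-side with no reduce phase.
SELECT /*+ MAPJOIN(S) */
       T.id, T.timestamp, S.value
FROM T
LEFT OUTER JOIN S
  ON T.timestamp = S.timestamp AND T.id = S.id;
```

To check whether the join is really converted, you can prefix the query with EXPLAIN and look for a Map Join Operator in the plan instead of a reduce-side join.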
Hope this helps.

Thanks

On Sun, Mar 20, 2011 at 6:37 PM, Jov <zhao6...@gmail.com> wrote:
> 2011/3/20 Igor Tatarinov <i...@decide.com>:
>> I have the following join that takes 4.5 hours (with 12 nodes) mostly
>> because of a single reduce task that gets the bulk of the work:
>> SELECT ...
>> FROM T
>> LEFT OUTER JOIN S
>> ON T.timestamp = S.timestamp and T.id = S.id
>> This is a 1:0/1 join so the size of the output is exactly the same as the
>> size of T (500M records). S is actually very small (5K).
>> I've tried:
>> - switching the order of the join conditions
>> - using a different hash function setting (jenkins instead of murmur)
>> - using SET hive.auto.convert.join = true;
>
> Are you sure your query is converted to a map join? If not, try using an
> explicit mapjoin hint.
>
>> - using SET hive.optimize.skewjoin = true;
>> but nothing helped :(
>> Anything else I can try?
>> Thanks!
>

--
Regards,
Bharath .V
w:http://research.iiit.ac.in/~bharath.v