On Sun, Mar 20, 2011 at 10:30 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> Can someone re-attach the missing figures for that wiki ?
>
> Thanks
>
> On Sun, Mar 20, 2011 at 7:15 AM, bharath vissapragada
> <bharathvissapragada1...@gmail.com> wrote:
>>
>> Hi Igor,
>>
>> See http://wiki.apache.org/hadoop/Hive/JoinOptimization and see the
>> jira 1642 which automatically converts a normal join into map-join
>> (Otherwise you can specify the mapjoin hints in the query itself.).
>> Because your 'S' table is very small , it can be replicated across all
>> the mappers and the reduce phase can be avoided. This can greatly
>> reduce the runtime .. (See the results section in the page for
>> details.).
>>
>> Hope this helps.
>>
>> Thanks
>>
>>
>> On Sun, Mar 20, 2011 at 6:37 PM, Jov <zhao6...@gmail.com> wrote:
>> > 2011/3/20 Igor Tatarinov <i...@decide.com>:
>> >> I have the following join that takes 4.5 hours (with 12 nodes) mostly
>> >> because of a single reduce task that gets the bulk of the work:
>> >> SELECT ...
>> >> FROM T
>> >> LEFT OUTER JOIN S
>> >> ON T.timestamp = S.timestamp and T.id = S.id
>> >> This is a 1:0/1 join so the size of the output is exactly the same as
>> >> the size of T (500M records). S is actually very small (5K).
>> >> I've tried:
>> >> - switching the order of the join conditions
>> >> - using a different hash function setting (jenkins instead of murmur)
>> >> - using SET set hive.auto.convert.join = true;
>> >
>> > are you sure your query convert to mapjoin? if not,try use explicit
>> > mapjoin hint.
>> >
>> >
>> >> - using SET hive.optimize.skewjoin = true;
>> >> but nothing helped :(
>> >> Anything else I can try?
>> >> Thanks!
>> >
>>
>>
>>
>> --
>> Regards,
>> Bharath .V
>> w:http://research.iiit.ac.in/~bharath.v
>
The wiki does not allow images. Confluence does, but we have not moved there yet.