I see. Thanks a lot that's very helpful! Daniel
> On 7 בדצמ׳ 2014, at 09:10, Gopal V <[email protected]> wrote: > >> On 12/6/14, 10:11 PM, Daniel Haviv wrote: >> >> Isn't there a way to make hive allocate more than one reducer for the whole >> job? Maybe one >> per partition. > > Yes. > > hive.optimize.sort.dynamic.partition=true; does nearly that. > > It raises the net number of useful reducers to total-num-of-partitions x > total-num-buckets. > > If you have say, data being written into six hundred partitions with 1 bucket > each, it can use anywhere between 1 and 600 reducers (hashcode collisions > causing skews, of course). > > It's turned off by default, because it really slows down the 1 partition > without buckets insert speed. > > Cheers, > Gopal > >>>> On 7 בדצמ׳ 2014, at 06:06, Gopal V <[email protected]> wrote: >>>> >>>> On 12/6/14, 6:27 AM, Daniel Haviv wrote: >>>> Hi, >>>> I'm executing an insert statement that goes over 1TB of data. >>>> The map phase goes well but the reduce stage only used one reducer which >>>> becomes >> a great bottleneck. >>> >>> Are you inserting into a bucketed or sorted table? >>> >>> If the destination table is bucketed + partitioned, you can use the dynamic >>> partition >> sort optimization to get beyond the single reducer. >>> >>> Cheers, >>> Gopal >
