YES! That did it, I will be adding that one to our global config. Good to see they defaulted it to false in 0.14. Thanks Prasanth
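For anyone else hitting this on 0.13, the session-level override is just:

set hive.optimize.sort.dynamic.partition=false;

and the same property name can be set globally in hive-site.xml.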
On Thu, Mar 12, 2015 at 9:29 PM, Prasanth Jayachandran <pjayachand...@hortonworks.com> wrote:

> Hi
>
> Can you try with hive.optimize.sort.dynamic.partition set to false?
>
> Thanks
> Prasanth
>
> On Thu, Mar 12, 2015 at 9:02 PM -0700, "Alex Bohr" <a...@gradientx.com> wrote:
>
> I'm inserting from an unpartitioned table with 6 hours of data into a
> table partitioned by hour.
>
> The source table is 400M rows and 500GB, so it needs a lot of reducers
> working on the data - Hive chose 544, which sounds good.
>
> But 538 reducers did nothing, and the other 6 have been working for over
> an hour with all the data.
>
> I see from running explain on the query:
> Map-reduce partition columns: _col54 (type: int), _col55 (type: int),
> _col56 (type: int), _col57 (type: int)
>
> which are the partition columns of the destination table (year, month,
> day, hour).
> That's an unnecessary centralization of work; I don't need each partition
> to be written by only one reducer. Each destination partition should
> instead include a bunch of output files from various reducers. If I wrote
> my own M/R job I would use MultipleOutputs and partition on epoch or
> something.
>
> So I hacked it and added another column to the destination partition
> after the hour column: a random number up to 200. Now all the reducers
> are sharing the work.
>
> *Is there any other way I can get Hive to distribute the work to all
> reducers without hacking the table DDL with random columns?*
>
> I'm on Hive 0.13 with Beeline and HiveServer2, and I start the query off
> with the settings:
> set hive.exec.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
>
> Thanks
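[For archive readers, a minimal sketch of the whole fixed sequence; the table and column names below are invented for illustration, and only the three settings come from this thread:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
-- the fix: stop keying reduce output on the destination partition
-- columns, so every reducer can write files into every partition
set hive.optimize.sort.dynamic.partition=false;

-- hypothetical tables: events_raw (unpartitioned source) and
-- events_by_hour (partitioned by year/month/day/hour)
INSERT OVERWRITE TABLE events_by_hour PARTITION (year, month, day, hour)
SELECT event_id, payload, year, month, day, hour
FROM events_raw;

If flipping the flag isn't an option, adding a DISTRIBUTE BY over a spreading expression, e.g. DISTRIBUTE BY pmod(hash(event_id), 200), is another way to fan the write out across reducers without touching the table DDL.]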