YES! That did it, I will be adding that one to our global config. Good to see they defaulted it to false in 0.14. Thanks Prasanth
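For anyone else hitting this on 0.13, the session-level override is just:

set hive.optimize.sort.dynamic.partition=false;

and the same property name can be set globally in hive-site.xml.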
On Thu, Mar 12, 2015 at 9:29 PM, Prasanth Jayachandran <pjayachand...@hortonworks.com> wrote:

> Hi
>
> Can you try with hive.optimize.sort.dynamic.partition set to false?
>
> Thanks
> Prasanth
>
> On Thu, Mar 12, 2015 at 9:02 PM -0700, "Alex Bohr" <a...@gradientx.com> wrote:
>
> I'm inserting from an unpartitioned table with 6 hours of data into a
> table partitioned by hour.
>
> The source table is 400M rows and 500GB, so it needs a lot of reducers
> working on the data - Hive chose 544, which sounds good.
>
> But 538 reducers did nothing, and the other 6 have been working for over
> an hour with all the data.
>
> I see from running explain on the query:
> Map-reduce partition columns: _col54 (type: int), _col55 (type: int),
> _col56 (type: int), _col57 (type: int)
>
> which are the partition columns of the destination table (year, month,
> day, hour).
> That's an unnecessary centralization of work; I don't need each partition
> to be written by only one reducer. Each destination partition should
> instead include a bunch of output files from various reducers. If I wrote
> my own M/R job I would use MultipleOutputs and partition on epoch or
> something.
>
> So I hacked it and added another column to the destination partition
> after the hour column: a random number up to 200. Now all the reducers
> are sharing the work.
>
> *Is there any other way I can get Hive to distribute the work to all
> reducers without hacking the table DDL with random columns?*
>
> I'm on Hive 0.13 with Beeline and HiveServer2, and I start the query off
> with the settings:
> set hive.exec.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
>
> Thanks
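[For archive readers, a minimal sketch of the whole fixed sequence; the table and column names below are invented for illustration, and only the three settings come from this thread:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
-- the fix: stop keying reduce output on the destination partition
-- columns, so every reducer can write files into every partition
set hive.optimize.sort.dynamic.partition=false;

-- hypothetical tables: events_raw (unpartitioned source) and
-- events_by_hour (partitioned by year/month/day/hour)
INSERT OVERWRITE TABLE events_by_hour PARTITION (year, month, day, hour)
SELECT event_id, payload, year, month, day, hour
FROM events_raw;

If flipping the flag isn't an option, adding a DISTRIBUTE BY over a spreading expression, e.g. DISTRIBUTE BY pmod(hash(event_id), 200), is another way to fan the write out across reducers without touching the table DDL.]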