Hi Shaun,

Dynamic partitioning can slow down the MapReduce job when it generates too
many partitions. Can you estimate how many partitions the insert will create?
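
If it helps, one rough way to get that estimate is to count the distinct
'YYYY-MM' values in a sample of the log timestamps. The sketch below is only
an illustration: the helper name and the sample timestamps are made up, and
it assumes the partition value is derived directly from each record's
timestamp.

```python
from datetime import datetime


def count_month_partitions(timestamps):
    """Return the number of distinct 'YYYY-MM' values in the given timestamps."""
    # Each distinct 'YYYY-MM' string corresponds to one dynamic partition.
    return len({ts.strftime("%Y-%m") for ts in timestamps})


# Hypothetical sample of log timestamps (not taken from the real data).
sample = [
    datetime(2013, 4, 30, 23, 59),
    datetime(2013, 5, 1, 0, 0),
    datetime(2013, 5, 17, 12, 30),
    datetime(2013, 6, 6, 16, 24),
]

print(count_month_partitions(sample))  # 3
```

A small count (a few dozen months, say) would suggest that the number of
partitions alone is unlikely to explain the slowdown.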


On Thu, Jun 6, 2013 at 4:24 PM, Shaun Clowes <sclo...@atlassian.com> wrote:

> Hi All,
>
> Does anyone know what performance impact dynamic partitioning should be
> expected to have?
>
> I have a table that is partitioned by a string in the form 'YYYY-MM'. When
> I insert into this table (from an external table that is just an S3 bucket
> containing gzipped logs) using dynamic partitioning, I get very slow
> performance, with each node in the cluster unable to process more than 2MB
> per second. When I run the exact same query with static partition values, I
> get about 30-40MB/s on each node.
>
> I've never seen this type of problem with our internal cluster running
> Hive 0.7.1 (CDH3u4), but it happens every time in EMR.
>
> Thanks,
> Shaun
>



-- 
Regards,
Ted Xu
