Hi Shaun,

Too many partitions in dynamic partitioning may slow down the MapReduce job. Can you estimate how many partitions the insert will generate?
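
One rough way to get that estimate is to take a sample of the log timestamps and count the distinct 'YYYY-MM' values they map to. The sketch below is only an illustration: the function name `partition_keys` and the sample timestamps are hypothetical, and it assumes the partition value is derived directly from each record's timestamp.

```python
from datetime import datetime


def partition_keys(timestamps):
    """Return the distinct 'YYYY-MM' partition values for the given datetimes."""
    return {ts.strftime("%Y-%m") for ts in timestamps}


# Hypothetical sample of log timestamps (not taken from the real data).
sample = [
    datetime(2013, 4, 30, 23, 59),
    datetime(2013, 5, 1, 0, 0),
    datetime(2013, 5, 17, 12, 30),
    datetime(2013, 6, 6, 16, 24),
]

keys = partition_keys(sample)
print(len(keys), sorted(keys))  # 3 ['2013-04', '2013-05', '2013-06']
```

The number of distinct keys is the number of partitions the insert would create for that sample, so a representative sample gives a lower bound for the full data set.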

On Thu, Jun 6, 2013 at 4:24 PM, Shaun Clowes <sclo...@atlassian.com> wrote:

> Hi All,
>
> Does anyone know the performance impact dynamic partitions should be
> expected to have?
>
> I have a table that is partitioned by a string in the form 'YYYY-MM'. When
> I insert into this table (from an external table that is just an S3 bucket
> containing gzipped logs) using dynamic partitioning I get very slow
> performance, with each node in the cluster unable to process more than 2MB
> per second. When I run the exact same query with static partition values I
> get about 30-40MB/s on each node.
>
> I've never seen this type of problem with our internal cluster running
> Hive 0.7.1 (CDH3u4), but it happens every time in EMR.
>
> Thanks,
> Shaun
>

--
Regards,
Ted Xu