Re: Extremely Slow Data Loading with 40k+ Partitions

2015-04-16 Thread Daniel Haviv
> > Thanks > Tianqi > > From: Daniel Haviv [mailto:daniel.ha...@veracity-group.com] > Sent: Wednesday, April 15, 2015 9:23 PM > To: user@hive.apache.org > Subject: Re: Extremely Slow Data Loading with 40k+ Partitions > > How many reducers are you using? >

RE: Extremely Slow Data Loading with 40k+ Partitions

2015-04-16 Thread Tianqi Tong
size is growing, but it's kind of slow. For the MapReduce job, I had 400+ mappers and 100+ reducers. Thanks Tianqi From: Daniel Haviv [mailto:daniel.ha...@veracity-group.com] Sent: Wednesday, April 15, 2015 9:23 PM To: user@hive.apache.org Subject: Re: Extremely Slow Data Loading wit

Re: Extremely Slow Data Loading with 40k+ Partitions

2015-04-15 Thread Daniel Haviv
How many reducers are you using? Daniel > On 16 Apr 2015, at 00:55, Tianqi Tong wrote: > > Hi, > I'm loading data into a Parquet table with dynamic partitions. I have 40k+ > partitions, and I have skipped the partition stats computation step. > Somehow it's still extremely slow loading data in

Extremely Slow Data Loading with 40k+ Partitions

2015-04-15 Thread Tianqi Tong
Hi, I'm loading data into a Parquet table with dynamic partitions. I have 40k+ partitions, and I have skipped the partition stats computation step. Somehow it's still extremely slow loading data into partitions (800MB/h). Do you have any hints on the possible reason and solution? Thank you Tianqi T