Re: [Hive] Slow Loading Data Process with Parquet over 30k Partitions

2015-04-17 Thread Slava Markeyev
I've created HIVE-10385 and attached a patch. Unit tests to come.
-Slava
On Fri, Apr 17, 2015 at 1:34 PM, Chris Roblee wrote:
> Hi Slava,
>
> We would be interested in reviewing your patch. Can you please provide
> more details?
>
> Is there any other way to disable the partition creation step

Re: [Hive] Slow Loading Data Process with Parquet over 30k Partitions

2015-04-17 Thread Chris Roblee
Hi Slava,
We would be interested in reviewing your patch. Can you please provide more details?
Is there any other way to disable the partition creation step?
Thanks,
Chris
On 4/13/15 10:59 PM, Slava Markeyev wrote:
This is something I've encountered when doing ETL with hive and having it c

Re: [Hive] Slow Loading Data Process with Parquet over 30k Partitions

2015-04-14 Thread Edward Capriolo
w it's been 4 days and the first job I launched
> is still not done yet, with partition stats.
>
> Thanks
> Tianqi Tong
>
> *From:* Slava Markeyev [mailto:slava.marke...@upsight.com]
> *Sent:* Monday, April 13, 2015 11:00 PM
> *To:* user@hive.apa

RE: [Hive] Slow Loading Data Process with Parquet over 30k Partitions

2015-04-14 Thread Tianqi Tong
d is still not done yet, with partition stats.
Thanks
Tianqi Tong
From: Slava Markeyev [mailto:slava.marke...@upsight.com]
Sent: Monday, April 13, 2015 11:00 PM
To: user@hive.apache.org
Cc: Sergio Pena
Subject: Re: [Hive] Slow Loading Data Process with Parquet over 30k Partitions
This is someth

Re: [Hive] Slow Loading Data Process with Parquet over 30k Partitions

2015-04-13 Thread Slava Markeyev
This is something I've encountered when doing ETL with Hive and having it create tens of thousands of partitions. The issue is that each partition needs to be added to the metastore, and this is an expensive operation to perform. My workaround was adding a flag to Hive that optionally disables the metastor

RE: [Hive] Slow Loading Data Process with Parquet over 30k Partitions

2015-04-13 Thread Xu, Cheng A
Hi Tianqi,
Can you attach hive.log as more detailed information?
+Sergio
Yours,
Ferdinand Xu
From: Tianqi Tong [mailto:tt...@brightedge.com]
Sent: Friday, April 10, 2015 1:34 AM
To: user@hive.apache.org
Subject: [Hive] Slow Loading Data Process with Parquet over 30k Partitions
Hello Hive,
I'm a