RE: Partition performance

2016-01-26 Thread Mich Talebzadeh
dhan Manjayya [mailto:shub...@gmail.com] Sent: 27 January 2016 04:14 To: user@hive.apache.org Subject: Partition performance Hi, see this Cloudera blog post at: http://blog.cloudera.com/blog/2014/08/improving-query-performance-using-partitioning-in-apache-hive/ It mentions: "Do not over-partitio

Partition performance

2016-01-26 Thread Shubhvardhan Manjayya
Hi, see this Cloudera blog post at: http://blog.cloudera.com/blog/2014/08/improving-query-performance-using-partitioning-in-apache-hive/ It mentions: "Do not over-partition the data. With too many small partitions, the task of recursively scanning the directories becomes more expensive than a full tabl

RE: Partition performance

2013-07-04 Thread Peter Marron
“same” data) even if the query explicitly specifies a single partition. (I mean I _could_ actually do the experiments myself…) Regards, Z From: Owen O'Malley [mailto:omal...@apache.org] Sent: 02 July 2013 15:52 To: user@hive.apache.org Subject: Re: Partition

RE: Partition performance

2013-07-04 Thread Peter Marron
” data) even if the query explicitly specifies a single partition. (I mean I _could_ actually do the experiments myself…) Regards, Z From: Owen O'Malley [mailto:omal...@apache.org] Sent: 02 July 2013 15:52 To: user@hive.apache.org Subject: Re: Partition performance On Tue, Jul 2, 2013 at

Re: Partition performance

2013-07-03 Thread Owen O'Malley
On Wed, Jul 3, 2013 at 5:19 AM, David Morel wrote: > > That is still not really answering the question, which is: why is it slower > to run a query on a heavily partitioned table than it is on the same number > of files in a less heavily partitioned table. > According to Gopal's investigations i

Re: Partition performance

2013-07-03 Thread Edward Capriolo
1) Each partition object is a row in the metastore (usually MySQL). Querying large tables with many partitions has a longer startup time, because the Hive query planner has to fetch and process all of this meta-information. This is not a distributed process. It is usually fast, within a few seconds, but for ve

Re: Partition performance

2013-07-03 Thread Dean Wampler
Peter Marron <peter.mar...@trilliumsoftware.com> wrote: > ... > > From: Ian > Reply-To: "user@hive.apache.org", Ian <liu...@yahoo.com> > Date: Thursday, April 4, 2013 4:01 PM > To: "user@hive.apache.or

Re: Partition performance

2013-07-03 Thread David Morel
On 2 Jul 2013, at 16:51, Owen O'Malley wrote: > On Tue, Jul 2, 2013 at 2:34 AM, Peter Marron <peter.mar...@trilliumsoftware.com> wrote: > >> Hi Owen, >> >> I’m curious about this advice about partitioning. Is there some fundamental reason why Hive is slow when the n

Re: Partition performance

2013-07-02 Thread Owen O'Malley
On Tue, Jul 2, 2013 at 2:34 AM, Peter Marron <peter.mar...@trilliumsoftware.com> wrote: > Hi Owen, > > I’m curious about this advice about partitioning. Is there some fundamental reason why Hive is slow when the number of partitions is 10,000 rather than 1,000? > The pre

RE: Partition performance

2013-07-02 Thread Peter Marron
? (It’s not currently a problem for me but I can see that I am going to need to be able to explain the situation.) Warm regards, Z From: Owen O'Malley [mailto:omal...@apache.org] Sent: 05 April 2013 00:26 To: user@hive.apache.org Subject: Re: Partition performance See slide #9 from my Optim

Re: Partition performance

2013-04-11 Thread Ian
t: Re: Partition performance Can you tell how many map tasks there are in each scenario? If my assumption is correct, you should have 336 in the first case and 14 in the second case. It looks like it is combining all small files in a folder and running one map task for all 24 files in a folder, where
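The 336 versus 14 estimate in the message above can be reproduced with a little arithmetic. The sketch below is only illustrative: it assumes one log file per hour (as in the original post) and a query spanning 14 days, which is a hypothetical range chosen because it matches the quoted numbers; the excerpt itself does not state the range. The function name is also made up for this example.

```python
HOURS_PER_DAY = 24  # one log file per hour, per the original post


def estimated_map_tasks(days: int, one_task_per_file: bool) -> int:
    """Rough task count: one task per hourly file, or one task per
    daily folder when the small files in a folder are combined."""
    return days * HOURS_PER_DAY if one_task_per_file else days


ASSUMED_DAYS = 14  # assumption: query range is not given in the excerpt

print(estimated_map_tasks(ASSUMED_DAYS, one_task_per_file=True))
print(estimated_map_tasks(ASSUMED_DAYS, one_task_per_file=False))
```

Under these assumptions the two calls give 14 × 24 and 14 respectively, which lines up with the figures quoted in the message.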

Re: Partition performance

2013-04-05 Thread Ramki Palle
m so differently? > > Thanks. > > *From:* Dean Wampler > *To:* user@hive.apache.org > *Sent:* Thursday, April 4, 2013 4:28 PM > *Subject:* Re: Partition performance > > Also, how big are the files in each directory? Are they roughly the size > of one HDFS block or a multip

Re: Partition performance

2013-04-05 Thread Ian
s but I'm wondering what's the reason behind it? If I run this on a real cluster, maybe it won't perform so differently?   Thanks.  From: Dean Wampler To: user@hive.apache.org Sent: Thursday, April 4, 2013 4:28 PM Subject: Re: Partition performa

Re: Partition performance

2013-04-04 Thread Dean Wampler
ell us to be cautious with large number >>> of partitions :-) and I abide by that. >>> >>> Users >>> Please add your points of view and experiences >>> >>> Thanks >>> sanjay >>> >>> From: Ian >>> Reply-

Re: Partition performance

2013-04-04 Thread Owen O'Malley
"user@hive.apache.org" , Ian < >> liu...@yahoo.com> >> Date: Thursday, April 4, 2013 4:01 PM >> To: "user@hive.apache.org" >> Subject: Partition performance >> >> Hi, >> >> I created 3 years of hourly log files (totally 26280 files

Re: Partition performance

2013-04-04 Thread Ramki Palle
o.com> > Date: Thursday, April 4, 2013 4:01 PM > To: "user@hive.apache.org" > Subject: Partition performance > > Hi, > > I created 3 years of hourly log files (totally 26280 files), and use > External Table with partition to query. I tried two partition methods. > >

Re: Partition performance

2013-04-04 Thread Sanjay Subramanian
ply-To: "user@hive.apache.org" <user@hive.apache.org>, Ian <liu...@yahoo.com> Date: Thursday, April 4, 2013 4:01 PM To: "user@hive.apache.org" <user@hive.apache.org> S

Partition performance

2013-04-04 Thread Ian
Hi, I created 3 years of hourly log files (26,280 files in total) and use a partitioned external table to query them. I tried two partition methods. 1) Log files are stored as /test1/2013/04/02/16/00_0 (a directory per hour), with date and hour as partition keys. Add 3 years of directories to
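As a quick sanity check on the numbers in this post, the sketch below recomputes the file count and rebuilds the example path for the per-hour layout. It assumes 365-day years (which is what the quoted total implies); the helper name `hourly_path` is hypothetical and is not part of Hive. The second partition method is cut off in this excerpt, so it is not modelled here.

```python
YEARS = 3
DAYS_PER_YEAR = 365  # assumption: leap days ignored, consistent with the quoted total
HOURS_PER_DAY = 24

# One file (and, in method 1, one directory and one partition) per hour.
hourly_files = YEARS * DAYS_PER_YEAR * HOURS_PER_DAY


def hourly_path(root: str, year: int, month: int, day: int, hour: int) -> str:
    """Build a path in the /root/YYYY/MM/DD/HH/00_0 layout from the post."""
    return f"{root}/{year:04d}/{month:02d}/{day:02d}/{hour:02d}/00_0"


print(hourly_files)
print(hourly_path("/test1", 2013, 4, 2, 16))
```

With a directory per hour, every one of those files sits in its own partition, so the table ends up with as many partitions as files.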