RE: Partition performance

2016-01-26 Thread Mich Talebzadeh
Check the threads in the Hive user group under “Impact of partitioning on certain queries”. HTH, Dr Mich Talebzadeh

RE: Partition performance

2013-07-04 Thread Peter Marron
“same” data) even if the query explicitly specifies a single partition. (I mean I _could_ actually do the experiments myself…) Regards, Z
From: Owen O'Malley <omal...@apache.org> Sent: 02 July 2013 15:52 To: user@hive.apache.org Subject: Re: Partition

Re: Partition performance

2013-07-03 Thread Owen O'Malley
On Wed, Jul 3, 2013 at 5:19 AM, David Morel wrote:
> That is still not really answering the question, which is: why is it slower to run a query on a heavily partitioned table than it is on the same number of files in a less heavily partitioned table.
According to Gopal's investigations i

Re: Partition performance

2013-07-03 Thread Edward Capriolo
1) Each partition object is a row in the metastore (usually MySQL). Querying large tables with many partitions has a longer startup time, because the Hive query planner has to fetch and process all of this metadata. This is not a distributed process. It is usually fast, within a few seconds, but for ve
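To make the metastore point concrete, here is a toy Python model, assuming a naive planner that reads every partition row serially and only then prunes to the partitions the query asks for. `PartitionRecord`, `build_metastore` and `plan_query` are hypothetical names, not Hive APIs, and real Hive versions may push partition filters down to the metastore rather than scanning every row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionRecord:
    """Hypothetical stand-in for one metastore row describing a partition."""
    spec: tuple      # e.g. (("dt", "day0005"), ("hour", "03"))
    location: str    # directory holding the partition's files


def build_metastore(days: int, hours_per_day: int) -> list:
    """One record per (dt, hour) pair, like a table partitioned by day and hour."""
    records = []
    for d in range(days):
        for h in range(hours_per_day):
            dt, hour = f"day{d:04d}", f"{h:02d}"
            records.append(
                PartitionRecord((("dt", dt), ("hour", hour)),
                                f"/warehouse/t/dt={dt}/hour={hour}")
            )
    return records


def plan_query(metastore: list, wanted: dict) -> tuple:
    """Naive planner: read every partition row serially, then prune in memory.

    Returns (selected partitions, number of metastore rows processed).
    """
    processed = 0
    selected = []
    for rec in metastore:  # single-threaded scan; nothing is distributed here
        processed += 1
        spec = dict(rec.spec)
        if all(spec.get(key) == value for key, value in wanted.items()):
            selected.append(rec)
    return selected, processed


if __name__ == "__main__":
    coarse = build_metastore(days=1000, hours_per_day=1)   # 1,000 partitions
    fine = build_metastore(days=1000, hours_per_day=10)    # 10,000 partitions
    for name, table in (("coarse", coarse), ("fine", fine)):
        hits, rows = plan_query(table, {"dt": "day0005", "hour": "00"})
        print(name, len(hits), rows)
```

Both queries select exactly one partition, yet in this model the planner touches 1,000 rows for the coarse table and 10,000 for the fine one, which is the kind of planning overhead that grows with partition count even when the query names a single partition.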

Re: Partition performance

2013-07-03 Thread Dean Wampler
How big were the files in each case in your experiment? Having lots of small files will add Hadoop overhead. Also, it would be useful to know the execution times of the map and reduce tasks. The rule of thumb is that if tasks take under 20 seconds each, or so, you're paying a significant portion of the execution time i
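One rough way to see why short tasks hurt is to model each task as a fixed startup cost plus useful work. The sketch below assumes a constant per-task overhead (a hypothetical 3 seconds, not a measured figure) and reuses the 20-second rule of thumb from the message; `overhead_share` and `short_tasks` are illustrative helpers, not Hadoop APIs.

```python
RULE_OF_THUMB_SECONDS = 20.0  # "under 20 seconds each, or so", per the message above


def overhead_share(task_seconds: list, startup_seconds: float) -> float:
    """Fraction of total task time spent on a fixed per-task startup cost.

    startup_seconds is an assumed constant (JVM launch, scheduling, and so on);
    it is not a Hadoop setting, so measure it on your own cluster.
    Each entry in task_seconds is assumed to already include that startup cost.
    """
    total = sum(task_seconds)
    return len(task_seconds) * startup_seconds / total


def short_tasks(task_seconds: list, threshold: float = RULE_OF_THUMB_SECONDS) -> list:
    """Tasks that finish under the rule-of-thumb threshold."""
    return [t for t in task_seconds if t < threshold]


if __name__ == "__main__":
    assumed_startup = 3.0        # hypothetical per-task overhead in seconds
    many_short = [8.0] * 100     # 100 tasks of 8 s each
    few_long = [80.0] * 10       # the same 800 s of task time in 10 tasks
    print(overhead_share(many_short, assumed_startup), len(short_tasks(many_short)))
    print(overhead_share(few_long, assumed_startup), len(short_tasks(few_long)))
```

With the same 800 seconds of total task time, the 100 short tasks spend 37.5% of it on startup under this assumption, against under 4% for the 10 longer tasks.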

Re: Partition performance

2013-07-03 Thread David Morel
On 2 Jul 2013, at 16:51, Owen O'Malley wrote:
> On Tue, Jul 2, 2013 at 2:34 AM, Peter Marron <peter.mar...@trilliumsoftware.com> wrote:
>> Hi Owen,
>> I’m curious about this advice about partitioning. Is there some fundamental reason why Hive is slow when the n

Re: Partition performance

2013-07-02 Thread Owen O'Malley
On Tue, Jul 2, 2013 at 2:34 AM, Peter Marron <peter.mar...@trilliumsoftware.com> wrote:
> Hi Owen,
> I’m curious about this advice about partitioning. Is there some fundamental reason why Hive is slow when the number of partitions is 10,000 rather than 1,000?
The pre

RE: Partition performance

2013-07-02 Thread Peter Marron
? (It’s not currently a problem for me, but I can see that I am going to need to be able to explain the situation.) Warm regards, Z
From: Owen O'Malley <omal...@apache.org> Sent: 05 April 2013 00:26 To: user@hive.apache.org Subject: Re: Partition performance
See slide #9 from my Optim

Re: Partition performance

2013-04-11 Thread Ian
won't perform so differently?
> Thanks.
> From: Dean Wampler To: user@hive.apache.org Sent: Thursday, April 4, 2013 4:28 PM Subject: Re: Partition performance
> Also, how big are the files in each directory? Are they roughly the size of one HDFS

Re: Partition performance

2013-04-05 Thread Ramki Palle
m so differently?
> Thanks.
> From: Dean Wampler To: user@hive.apache.org Sent: Thursday, April 4, 2013 4:28 PM Subject: Re: Partition performance
> Also, how big are the files in each directory? Are they roughly the size of one HDFS block or a multip

Re: Partition performance

2013-04-05 Thread Ian
s but I'm wondering what's the reason behind it? If I run this on a real cluster, maybe it won't perform so differently? Thanks.
From: Dean Wampler To: user@hive.apache.org Sent: Thursday, April 4, 2013 4:28 PM Subject: Re: Partition performa

Re: Partition performance

2013-04-04 Thread Dean Wampler
Also, how big are the files in each directory? Are they roughly the size of one HDFS block, or a multiple? Lots of small files will mean lots of mapper tasks with little to do. You can also compare the JobTracker console output for each job. I bet the slow one has a lot of very short map and reduc
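The file-size question can be turned into a back-of-the-envelope count of map tasks. A minimal sketch, assuming a 64 MB block size and an input format that gives every file at least one split without combining small files (Hive's CombineHiveInputFormat can merge small files into fewer splits, so treat this as a worst case); `estimate_map_tasks` is a hypothetical helper, and Hadoop's split slop and min/max split settings are ignored.

```python
HDFS_BLOCK = 64 * 1024 * 1024  # assumed block size: 64 MB


def estimate_map_tasks(file_sizes: list, block_size: int = HDFS_BLOCK) -> int:
    """Estimate map tasks when every file gets at least one split.

    Simplified model: each file is cut at block boundaries and small files are
    never combined; split slop and min/max split settings are ignored.
    """
    return sum(max(1, (size + block_size - 1) // block_size) for size in file_sizes)


if __name__ == "__main__":
    total = 10 * 1024 ** 3                  # the same 10 GB of data either way
    block_sized = [4 * HDFS_BLOCK] * 40     # 40 files of 256 MB
    tiny = [total // 10_000] * 10_000       # 10,000 files of about 1 MB
    for name, files in (("block-sized", block_sized), ("tiny", tiny)):
        tasks = estimate_map_tasks(files)
        print(name, tasks, sum(files) // tasks // 1024 ** 2, "MB per task")
```

Both layouts hold roughly the same 10 GB, but in this model the tiny-file layout needs 10,000 map tasks of about 1 MB each instead of 160 tasks of 64 MB, i.e. lots of mappers with little to do.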

Re: Partition performance

2013-04-04 Thread Owen O'Malley
See slide #9 from my Optimizing Hive Queries talk: http://www.slideshare.net/oom65/optimize-hivequeriespptx . Certainly, we will improve it, but for now you are much better off with 1,000 partitions than 10,000. -- Owen
On Thu, Apr 4, 2013 at 4:21 PM, Ramki Palle wrote:
> Is it possible for you

Re: Partition performance

2013-04-04 Thread Ramki Palle
Is it possible for you to send the explain plans of these two queries? Regards, Ramki.
On Thu, Apr 4, 2013 at 4:06 PM, Sanjay Subramanian <sanjay.subraman...@wizecommerce.com> wrote:
> The slowdown is most likely due to the large number of partitions. I believe the Hive book authors tell us t

Re: Partition performance

2013-04-04 Thread Sanjay Subramanian
The slowdown is most likely due to the large number of partitions. I believe the Hive book authors tell us to be cautious with a large number of partitions :-) and I abide by that. Users, please add your points of view and experiences. Thanks, Sanjay
From: Ian <liu...@yahoo.com> Reply-To: "us