Re: Percentile/udaf functions

2011-10-21 Thread Mayuresh
Looks exactly like what I was looking for! Thanks John! On Oct 22, 2011 12:49 AM, "John Sichi" wrote: > On Oct 21, 2011, at 10:07 AM, Mayuresh wrote: > > > Hi, > > > > I am trying to understand the exact code flow of how the > percentile_approx function works and what happens step by step. Is th

Re: hive runs slowly

2011-10-21 Thread john smith
You mean select a,b from a inner join b on (a.id=b.id)? Or do those brackets make some difference? Because the inner keyword is nowhere mentioned in the language manual https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins Any hints? On Fri, Oct 21, 2011 at 8:47 PM, Edward

Re: Percentile/udaf functions

2011-10-21 Thread John Sichi
On Oct 21, 2011, at 10:07 AM, Mayuresh wrote: > Hi, > > I am trying to understand the exact code flow of how the percentile_approx > function works and what happens step by step. Is there some write up which > would help in understanding the architecture? I am looking to understand how to > add n

Percentile/udaf functions

2011-10-21 Thread Mayuresh
Hi, I am trying to understand the exact code flow of how the percentile_approx function works and what happens step by step. Is there some write up which would help in understanding the architecture? I am looking to understand how to add new functions to Hive. Thanks, Mayuresh

Re: hive runs slowly

2011-10-21 Thread Edward Capriolo
On Fri, Oct 21, 2011 at 10:21 AM, john smith wrote: > Hi Edward, > > Thanks for replying. I have been using the query > > "select a,b from a,b where a.id=b.id ". According to my knowledge of > Hive, it reads data of both A and B and emits data> pairs as map outputs and then performs cartesian j

Re: hive runs slowly

2011-10-21 Thread john smith
Hi Edward, Thanks for replying. I have been using the query "select a,b from a,b where a.id=b.id". According to my knowledge of Hive, it reads the data of both A and B, emits pairs as map outputs, and then performs cartesian joins on the reduce side for the same join_keys. Is this the cartesian jo
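
For readers following the thread, the reduce-side join described in the message above can be pictured with a toy sketch in plain Python. This is only an illustration of the idea under stated assumptions, not Hive's actual code: the function names (map_phase, reduce_side_join), the "val" column, and the in-memory grouping that stands in for the shuffle are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def map_phase(rows, tag):
    """Emit (join_key, (tag, row)) pairs, one per input row.

    The tag records which table the row came from so the reducer
    can tell the two sides apart. Hypothetical helper, not a Hive API.
    """
    for row in rows:
        yield row["id"], (tag, row)


def reduce_side_join(table_a, table_b):
    """Toy equi-join on 'id', mimicking a map/shuffle/reduce flow."""
    # "Shuffle": group every tagged row by its join key.
    groups = defaultdict(lambda: {"a": [], "b": []})
    for source, tag in ((table_a, "a"), (table_b, "b")):
        for key, (row_tag, row) in map_phase(source, tag):
            groups[key][row_tag].append(row)

    # "Reduce": for each key, pair every A row with every B row.
    # This is a cartesian product only within one key, not across
    # the whole tables.
    joined = []
    for key in sorted(groups):
        for row_a, row_b in product(groups[key]["a"], groups[key]["b"]):
            joined.append((key, row_a["val"], row_b["val"]))
    return joined
```

In this sketch a key that appears many times on both sides produces a large per-key product, which is one way a single reducer can end up with far more work than the others.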

Re: hive runs slowly

2011-10-21 Thread Edward Capriolo
On Fri, Oct 21, 2011 at 9:22 AM, john smith wrote: > Hi list, > > I am also facing the same problem. My reducers hang at this position and it > takes hours to complete a single reduce task. Can any hive guru help us out > with this issue. > > Thanks, > jS > > 2011/10/21 bangbig > >> HI all, >> >

Re: hive runs slowly

2011-10-21 Thread john smith
Hi list, I am also facing the same problem. My reducers hang at this position and it takes hours to complete a single reduce task. Can any hive guru help us out with this issue? Thanks, jS 2011/10/21 bangbig > HI all, > > HIVE runs too slowly when it is doing such things(see the log below), wh

hive runs slowly

2011-10-21 Thread bangbig
Hi all, Hive runs too slowly when it is doing such things (see the log below). What's the problem? Is it because I'm joining two large tables? It runs pretty fast at first. When the job reaches 95%, it begins to slow down. -- INFO org.apache.hadoop.hive.ql.e

Re: Running hive on large number of files in S3

2011-10-21 Thread Thulasi Ram Naidu Peddineni
Thanks all for the inputs. I will try using 0.7.1. I will also revisit my partitioning logic and will try to reduce the number of partitions. - Thanks again, Thulasi Ram P On Fri, Oct 21, 2011 at 2:49 AM, Steven Wong wrote: > If you are using Amazon EMR, you can set hive.optimize.s3.query=

split into less files

2011-10-21 Thread Vikas Srivastava
Hey All, I have an issue: I have a table with a single partition, and that partition holds around 100 files of 200 MB each. When I overwrite this into another table, it makes 100 files of 20 MB (compressed). What I want is for it to make only 1, 2, or 10 files of 200 MB or 100 MB, meaning after overwrite