Re: Is my Use Case possible with Hive?

2012-05-14 Thread Justin Coffey
You can also have a reduce-side bottleneck if, for example, you are doing distinct counts or have skewed group sizes (i.e., one aggregation group is much larger than the others). But to know this you really need to look at the stats of your jobs via the JobTracker and even the progress counter output of
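To make the skewed-group case concrete, here is a minimal sketch in Python (not Hive code) of how hash-partitioning rows by group key can pile most of the work onto one reducer. The function name `reducer_loads`, the sample keys, and the use of `zlib.crc32` as a stand-in for the real partitioner hash are all assumptions made for illustration.

```python
from collections import Counter
import zlib


def reducer_loads(keys, num_reducers):
    """Count rows per reducer when rows are hash-partitioned by group key.

    This is a simplified model of a shuffle, not Hive's actual partitioner.
    """
    loads = Counter()
    for key in keys:
        # crc32 is a deterministic stand-in for the partitioner's hash function.
        reducer = zlib.crc32(key.encode("utf-8")) % num_reducers
        loads[reducer] += 1
    return [loads.get(i, 0) for i in range(num_reducers)]


# Hypothetical data: one hot group with 9,000 rows plus 1,000 single-row groups.
rows = ["hot_key"] * 9000 + [f"key_{i}" for i in range(1000)]
loads = reducer_loads(rows, num_reducers=4)

print("rows per reducer:", loads)
print("max/mean load ratio:", max(loads) / (sum(loads) / len(loads)))
```

Every row for a given key goes to the same reducer, so the reducer that receives the hot key handles at least 9,000 of the 10,000 rows no matter how many reducers are added. In a real job, the per-task counters are where this kind of imbalance shows up.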

Re: Dimensional Data Model on Hive

2012-05-10 Thread Justin Coffey
Hello, My thoughts are rather straightforward: it is best not to think of Hive as a data warehouse at all. Period. It is better to think of it as a SQL-to-MapReduce translation layer with some metadata to help guide the process. With this in mind, and if you really have lots of data, what you

Re: Lifecycle and Configuration of a hive UDF

2012-04-24 Thread Justin Coffey
"The One to Watch" - Treasury Today's Adam Smith Awards 2009. > > > - Original Message - > From: "Justin Coffey" > To: user@hive.apache.org > Sent: Monday, April 23, 2012 5:19:15 AM > Subject: Re: Lifecycle and Configuration of a hive UDF > > H

Re: Lifecycle and Configuration of a hive UDF

2012-04-23 Thread Justin Coffey
t at >> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform >> ). >> > >> > I don't think rank can be done using a UDF. >> > >> > Good luck! >> > >> > Mark >> > >> > Mark Grover, Business Intel

Re: Lifecycle and Configuration of a hive UDF

2012-04-19 Thread Justin Coffey
Hello All, I second this question. I have an MS SQL "rank" function which I would like to run; the results it gives appear to suggest it is executed mapper side as opposed to reducer side, even when run with "cluster by" constraints. -Justin On Thu, Apr 19, 2012 at 1:21 AM, Ranjan Bagchi wrot