You can also have a reduce-side bottleneck if, for example, you are doing
distinct counts or have skewed group sizes (i.e. one aggregation group is
much larger than the others).
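
To make the skew case concrete, here is a minimal sketch in Python of how hash partitioning sends every row of one large group to a single reducer. The data, the `reducer_loads` helper and the use of `zlib.crc32` are illustrative assumptions only; Hadoop's own partitioner hashes keys differently, but the effect on a skewed key is the same.

```python
import zlib
from collections import Counter


def reducer_loads(keys, num_reducers):
    """Return the number of rows each reducer receives under hash partitioning."""
    loads = Counter()
    for key in keys:
        # crc32 is a stand-in for the real partitioner hash (an assumption);
        # it is used here only because it is stable across runs.
        reducer = zlib.crc32(key.encode("utf-8")) % num_reducers
        loads[reducer] += 1
    return [loads.get(r, 0) for r in range(num_reducers)]


# Hypothetical data: one group ("big") holds 900 of the 1000 rows.
keys = ["big"] * 900 + ["k%d" % i for i in range(100)]
loads = reducer_loads(keys, num_reducers=4)

# All rows sharing a key land on the same reducer, so one reducer
# carries at least 900 rows no matter how many reducers are added.
print("rows per reducer:", loads)
print("max / mean load:", max(loads) / (sum(loads) / len(loads)))
```

Adding reducers does not help in this situation, because a single key can never be split across them.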
But to know this you really need to look at the stats of your jobs via the
jobtracker and even the progress counter output of
Hello,
My thoughts are rather straightforward: it is best not to think of Hive
as a data warehouse at all. Period.
It is better to think of it as a SQL-to-MapReduce translation layer with
some metadata to help guide the process.
With this in mind, and if you really have lots of data, what you
>
>
> - Original Message -
> From: "Justin Coffey"
> To: user@hive.apache.org
> Sent: Monday, April 23, 2012 5:19:15 AM
> Subject: Re: Lifecycle and Configuration of a hive UDF
>
> H
t at
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform
>> ).
>> >
>> > I don't think rank can be done using a UDF.
>> >
>> > Good luck!
>> >
>> > Mark
>> >
>> > Mark Grover, Business Intel
Hello All,
I second this question. I have an MS SQL "rank" function which I would
like to run. The results it gives appear to suggest it is executed
mapper-side as opposed to reducer-side, even when run with "cluster by"
constraints.
-Justin
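
For reference, a per-group rank of the kind discussed in this thread can be sketched as a streaming script in Python. This is only an illustration under stated assumptions: the name `rank_lines` and the two-column, tab-separated input are hypothetical, and the numbering is only correct if all rows for a key reach the same process, grouped by key and already in the desired order (which in Hive is usually arranged with DISTRIBUTE BY and SORT BY). It assigns consecutive positions, so ties are not given equal rank.

```python
def rank_lines(lines):
    """Append a per-key position to tab-separated "key<TAB>value" lines.

    Assumes the input is already grouped by key and sorted within each key.
    """
    prev_key = None
    rank = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, value = line.split("\t", 1)
        if key != prev_key:
            # A new key starts a new group, so the counter restarts.
            prev_key = key
            rank = 0
        rank += 1
        yield "%s\t%s\t%d" % (key, value, rank)


# Hypothetical in-memory input; a streaming script would instead iterate
# over sys.stdin and write each result line to sys.stdout.
sample = ["a\t9\n", "a\t7\n", "b\t5\n"]
for out in rank_lines(sample):
    print(out)
```

If the same key shows up in more than one process, or rows for a key arrive interleaved with other keys, the counter restarts in the wrong places, which would look like the rank being computed before the rows were grouped.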
On Thu, Apr 19, 2012 at 1:21 AM, Ranjan Bagchi wrot