Re: HiveQL for 'rank() over (partition by ... order by ...)'?

Igor Tatarinov Wed, 25 May 2011 07:43:33 -0700

Yes, we have UDF functions that compute cumulative (such a rank) and moving
aggregates.
In each case, the first parameter is the partitioning key so that the
function knows when to 'reset' at the start of a new partition.
To enforce partitioning and ordering, use DISTRIBUTE BY key  / SORT BY key,
sort_attr

One caveat: you have to do DISTRIBUTE/SORT BY in a subquery. The following
doesn't work correctly:

SELECT product_id, date, mavg(product_id, price, 10)
FROM Prices
DISTRIBUTE BY product_id
SORT BY product_id, date

You have to use a subquery:
SELECT ...
FROM (SELECT *
           FROM Prices
           DISTRIBUTE BY ..
           SORT BY ...)

If you have multiple key attributes you have to combine them into a single
key using concat_ws() for example.

As far as the UDF implementation goes, you have to define your function for
every combination of key and base attribute types that you are going to use.
 I don't know any way around that.

Hope this helps

On Wed, May 25, 2011 at 4:45 AM, 김영우 <warwit...@gmail.com> wrote:

> Hi All,
>
> I'm trying to convert Oracle SQL to HiveQL, esp Oracle Analytic functions.
> Is it possible to implement using Hive UDFs or workarounds?
> I would like to hear your experiences and advice.
>
> Thanks in advance.
>
> - Youngwoo
>

Re: HiveQL for 'rank() over (partition by ... order by ...)'?

Reply via email to