collect_set doesn't output in any particular order, so it can't be used, right?

-----Original Message-----
From: yongqiang he [mailto:heyongqiang...@gmail.com] 
Sent: Monday, December 06, 2010 1:18 PM
To: user@hive.apache.org
Subject: Re: Query output formatting

You can use collect_set() udaf. (And use lateral view join and explode
if you want operate on the set data.)

On Mon, Dec 6, 2010 at 1:01 PM, Steven Wong <sw...@netflix.com> wrote:
> I have this query to calculate some averages:
>
>
>
> select hour, cdn, avg(bitrate) from fact_table group by hour, cdn
>
> 1              8              a
>
> 1              9              b
>
> 2              8              c
>
> 3              8              d
>
> 3              9              e
>
>
>
> But I want the output to be in the following format so that it can be
> eyeballed/graphed more easily:
>
>
>
> 1              a              b
>
> 2              c              NULL
>
> 3              d             e
>
>
>
> (The 2nd and 3rd columns are averages for cdn 8 and 9, respectively.)
>
>
>
> Is there an easy way to do this - in Hive, Unix, etc.? Suggestions (or,
> better yet, solutions) are welcome.
>
>
>
> I imagine a UDAF could do this (see below), but AFAIK it is not built into
> Hive.
>
>
>
> select hour, some_udaf(abr, cdn, array(8, 9))
>
> from (select hour, cdn, avg(bitrate) as abr from fact_table group by hour,
> cdn) t
>
> group by hour
>
>
>
> Thanks.
>
> Steven
>
>

Reply via email to