Good tip, thanks!

From: Dilip Joseph [mailto:dilip.antony.jos...@gmail.com]
Sent: Monday, December 06, 2010 4:33 PM
To: user@hive.apache.org
Subject: Re: Query output formatting

If you have a fixed number of known CDNs, the following query can help:

SELECT  hour,
               SUM(IF(cdn=8, bitrate,0))/SUM(IF(cdn=8, 1, 0)) avgBitrateCdn8,
               SUM(IF(cdn=9, bitrate,0))/SUM(IF(cdn=9, 1, 0)) avgBitrateCdn9 -- 
You will need more IFs to handle 0 denominators.
FROM fact_table
GROUP BY hour

Dilip

On Mon, Dec 6, 2010 at 1:01 PM, Steven Wong 
<sw...@netflix.com<mailto:sw...@netflix.com>> wrote:
I have this query to calculate some averages:

select hour, cdn, avg(bitrate) from fact_table group by hour, cdn
1              8              a
1              9              b
2              8              c
3              8              d
3              9              e

But I want the output to be in the following format so that it can be 
eyeballed/graphed more easily:

1              a              b
2              c              NULL
3              d             e

(The 2nd and 3rd columns are averages for cdn 8 and 9, respectively.)

Is there an easy way to do this - in Hive, Unix, etc.? Suggestions (or, better 
yet, solutions) are welcome.

I imagine a UDAF could do this (see below), but AFAIK it is not built into Hive.

select hour, some_udaf(abr, cdn, array(8, 9))
from (select hour, cdn, avg(bitrate) as abr from fact_table group by hour, cdn) 
t
group by hour

Thanks.
Steven




--
_________________________________________
Dilip Antony Joseph
http://csgrad.blogspot.com
http://www.marydilip.info

Reply via email to