I suffer from "sql-itis", especially the SQL:2003 variant. I can barely get a data-related programming line out of my fingers that doesn't include a "partition by" or an "over order by".
Many of the data sets I analyze can be simplified for comprehension by thinking of them as having only three columns: a descriptive columns (ENTITY_NAME), a date column (20101611), and a numeric or double value column (456.789). A time series. Much of what I do is partition the descriptive column, order by the date column, and apply a moving aggregate (like average) over a sliding window of the value column. Example: select entity_name, value_date, avg(value) over (partition by entity_name order by value_date rows between 20 preceding and current row) as twenty_period_moving_average from table_with_a_billion_rows I am aware that HQL does not currently include this type of SQL analytic function. What are my options, if any, to accomplish the above with Hive? I'm looking at UDAF, but I haven't yet grokked. Thanks, Michael Michael Roessler, CFA michael.roess...@keyevent.com