Other than writing a custom UDAF or TRANSFORM script, a somewhat ugly way is
something like:
SELECT user_id, split(max(concat(time, '_', colour)), '_')[1]
FROM T
GROUP BY user_id
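
To see why the query above works and where it can go wrong, here is a minimal Python sketch that emulates it in memory. It assumes the goal is the colour of the latest row per user, that all fields are strings, and that Python's string ordering matches Hive's for plain ASCII data. The function name and the sample rows are made up for illustration.

```python
from collections import defaultdict


def latest_colour_by_concat(rows):
    """Emulate split(max(concat(time, '_', colour)), '_')[1] grouped by user_id.

    rows: iterable of (user_id, time, colour) tuples of strings (hypothetical data).
    """
    groups = defaultdict(list)
    for user_id, time, colour in rows:
        # concat(time, '_', colour)
        groups[user_id].append(time + "_" + colour)
    # max() on strings is lexicographic, so the time prefix decides the winner;
    # split('_')[1] then takes the text after the first underscore.
    return {user: max(values).split("_")[1] for user, values in groups.items()}


# Fixed-width timestamps sort correctly as strings.
rows = [
    ("u1", "2012-01-25 10:00:00", "red"),
    ("u1", "2012-01-26 03:23:00", "blue"),
    ("u2", "2012-01-26 01:00:00", "green"),
]
print(latest_colour_by_concat(rows))

# Unpadded numeric times do not: "9_red" sorts after "10_blue" as a string.
print(latest_colour_by_concat([("u1", "9", "red"), ("u1", "10", "blue")]))
```

So the trick seems safe only when time has a fixed-width, string-sortable format (or is zero-padded first), and when colour itself contains no underscore, since index [1] would otherwise return only part of it.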
From: mdefoinplatel@orange.com [mailto:mdefoinplatel@orange.com]
Sent: Thursday, January 26, 2012 3:24 AM
To
Just recently, a new way of doing windowing functionality was posted at:
https://github.com/hbutani/SQLWindowing
It is quite comprehensive and includes about 16 functions.
It is one approach to resolving HIVE-896, the issue covering Lag/Lead and
similar functions.
There is a detailed document about
I don't think there is a better way to implement your query using
standard SQL/Hive.
A Python reducer (or a Java UDF) is the way to go.
I don't think clustering would help since there is no way to specify what
you want in HiveQL alone.
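
A rough sketch of what such a Python reducer might look like, assuming the query is "colour of the latest row per user_id" as in the SQL earlier in the thread. The script name latest_colour.py, the Hive invocation in the docstring, and the sample rows are illustrative assumptions, not something taken from the original question. It relies on TRANSFORM's default of tab-separated columns, one row per line, and compares time as a string, so time is assumed to be in a sortable format.

```python
"""Sketch of a streaming reducer for Hive TRANSFORM (hypothetical name: latest_colour.py).

Assumed invocation, with table T and its columns as in the query above:

    ADD FILE latest_colour.py;
    SELECT TRANSFORM(user_id, time, colour)
    USING 'python latest_colour.py' AS (user_id, colour)
    FROM (SELECT user_id, time, colour FROM T DISTRIBUTE BY user_id) t;
"""


def latest_colours(lines):
    """Yield 'user_id<TAB>colour' for the row with the greatest time per user."""
    best = {}  # user_id -> (time, colour) of the latest row seen so far
    for line in lines:
        user_id, time, colour = line.rstrip("\n").split("\t")
        if user_id not in best or time > best[user_id][0]:
            best[user_id] = (time, colour)
    for user_id, (_, colour) in best.items():
        yield user_id + "\t" + colour


if __name__ == "__main__":
    # Demo with in-memory rows; under Hive, import sys and pass sys.stdin instead.
    sample = [
        "u1\t2012-01-25 10:00:00\tred\n",
        "u1\t2012-01-26 03:23:00\tblue\n",
        "u2\t2012-01-26 01:00:00\tgreen\n",
    ]
    for out in latest_colours(sample):
        print(out)
```

The DISTRIBUTE BY in the assumed invocation is there so that all rows of a user reach the same reducer process. Because the script keeps one entry per user in a dictionary, it does not need the rows to arrive sorted.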
igor
decide.com
On Thu, Jan 26, 2012 at 3:23 AM, wrote