I have a table with four columns: a, b, c, time. What I need is something like:
SELECT a, b, GroupFirst(c)
FROM t
GROUP BY a, b
Here GroupFirst means the "first" value of column c within each (a, b) group,
and by "first" I mean the value of c from the row with the minimal time in that group.
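To make the expected result concrete, here is a minimal plain-Python sketch of the semantics I am after. The sample rows and the helper name group_first are made up for illustration only; they are not part of any Spark API. If two rows in a group share the same minimal time, this sketch keeps whichever it saw first.

```python
# Hypothetical sample rows (a, b, c, time); the values are made up for illustration.
rows = [
    ("x", 1, "late", 30),
    ("x", 1, "early", 10),
    ("x", 2, "only", 20),
    ("y", 1, "second", 7),
    ("y", 1, "first", 5),
]


def group_first(rows):
    """For each (a, b) group, keep the c value of the row with the smallest time."""
    best = {}  # (a, b) -> (time, c) of the earliest row seen so far
    for a, b, c, time in rows:
        key = (a, b)
        if key not in best or time < best[key][0]:
            best[key] = (time, c)
    # Sort only to make the output order deterministic.
    return sorted((a, b, c) for (a, b), (_, c) in best.items())


print(group_first(rows))
```

So for each (a, b) pair I want exactly one output row, carrying the c that belongs to the earliest time.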
In Oracle/SQL Server, we could write:
WITH summary AS (
    SELECT a,
           b,
           c,
           ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY time) AS num
    FROM t
)
SELECT s.*
FROM summary s
WHERE s.num = 1
but in Spark SQL, there is no such thing as ROW_NUMBER().
How can I achieve the same result?
