I have a table with four columns: a, b, c, and time. What I need is something like:
```
SELECT a, b, GroupFirst(c)
FROM t
GROUP BY a, b
```

Here GroupFirst means "the first" item of column c in each group, and by "the first" I mean the row with the minimal time in that group. In Oracle or SQL Server, we could write:

```
WITH summary AS (
    SELECT a, b, c,
           ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY time) AS num
    FROM t
)
SELECT s.*
FROM summary s
WHERE s.num = 1
```

But in Spark SQL there is no such thing as ROW_NUMBER(). How can I achieve this?
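The intended semantics can be sketched in plain Python; the sample rows below are made up for illustration. For each (a, b) group, we keep the c value of the row with the smallest time:

```python
from collections import defaultdict

# Sample rows in (a, b, c, time) order -- illustrative data only.
rows = [
    ("x", 1, "first",  10),
    ("x", 1, "second", 20),
    ("y", 2, "late",   99),
    ("y", 2, "early",   5),
]

# Group by (a, b), collecting (time, c) pairs so the natural tuple
# ordering sorts by time first.
groups = defaultdict(list)
for a, b, c, t in rows:
    groups[(a, b)].append((t, c))

# "GroupFirst": pick the pair with the minimal time, then take its c.
result = {key: min(pairs)[1] for key, pairs in groups.items()}
print(result)
```

One way to express this same argmin-by-time idea as a plain aggregation in Spark SQL (if window functions are unavailable) is `SELECT a, b, min(struct(time, c)).c FROM t GROUP BY a, b`, relying on structs comparing field-by-field; whether that suits your version is worth verifying.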