I have a table with four columns: a, b, c, and time. What I need is something like:
```
SELECT a, b, GroupFirst(c)
FROM t
GROUP BY a, b
```

Here GroupFirst means "the first" item of column c in each group, and by "the first" I mean the row with the minimal time in that group. In Oracle or SQL Server, we could write:

```
WITH summary AS (
    SELECT a, b, c,
           ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY time) AS num
    FROM t
)
SELECT s.*
FROM summary s
WHERE s.num = 1
```

But in Spark SQL there is no such thing as ROW_NUMBER(). How can I achieve this?
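The intended semantics can be sketched in plain Python; the sample rows below are made up for illustration. For each (a, b) group, we keep the c value of the row with the smallest time:

```python
from collections import defaultdict

# Sample rows in (a, b, c, time) order -- illustrative data only.
rows = [
    ("x", 1, "first",  10),
    ("x", 1, "second", 20),
    ("y", 2, "late",   99),
    ("y", 2, "early",   5),
]

# Group by (a, b), collecting (time, c) pairs so the natural tuple
# ordering sorts by time first.
groups = defaultdict(list)
for a, b, c, t in rows:
    groups[(a, b)].append((t, c))

# "GroupFirst": pick the pair with the minimal time, then take its c.
result = {key: min(pairs)[1] for key, pairs in groups.items()}
print(result)
```

One way to express this same argmin-by-time idea as a plain aggregation in Spark SQL (if window functions are unavailable) is `SELECT a, b, min(struct(time, c)).c FROM t GROUP BY a, b`, relying on structs comparing field-by-field; whether that suits your version is worth verifying.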