Take a look at the implementation of typed sum/avg: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/expressions/scalalang/typed.scala
You can implement a typed max/min.

On Tue, Jun 7, 2016 at 4:31 PM, Alexander Pivovarov <apivova...@gmail.com> wrote:

> Ted, it does not work like that.
>
> You have to .map(toAB).toDS
>
> On Tue, Jun 7, 2016 at 4:07 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Have you tried the following?
>>
>> Seq(1->2, 1->5, 3->6).toDS("a", "b")
>>
>> Then you can refer to columns by name.
>>
>> FYI
>>
>> On Tue, Jun 7, 2016 at 3:58 PM, Alexander Pivovarov <apivova...@gmail.com> wrote:
>>
>>> I'm trying to switch from the RDD API to the Dataset API.
>>> My question is about the reduceByKey method.
>>>
>>> E.g. in the following example I'm trying to rewrite
>>>
>>> sc.parallelize(Seq(1->2, 1->5, 3->6)).reduceByKey(math.max).take(10)
>>>
>>> using the DS API. This is what I have so far:
>>>
>>> Seq(1->2, 1->5, 3->6).toDS.groupBy(_._1).agg(max($"_2").as(ExpressionEncoder[Int])).take(10)
>>>
>>> Questions:
>>>
>>> 1. Is it possible to avoid typing "as(ExpressionEncoder[Int])", or can it
>>> be replaced with something shorter?
>>>
>>> 2. Why do I have to use a String column name in the max function, e.g. $"_2"
>>> or col("_2")? Can I use _._2 instead?
>>>
>>> Alex
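
[Editor's note] For reference, a minimal sketch of what a typed max could look like, modeled on the Aggregator pattern used by the typed sum/avg implementations linked at the top of the thread. The names TypedMax and typedMax are illustrative only, not part of Spark's API:

import org.apache.spark.sql.{Encoder, Encoders, TypedColumn}
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical typed max over Int values extracted by f.
class TypedMax[IN](f: IN => Int) extends Aggregator[IN, Int, Int] {
  // Int.MinValue is the identity element for max.
  override def zero: Int = Int.MinValue
  override def reduce(b: Int, a: IN): Int = math.max(b, f(a))
  override def merge(b1: Int, b2: Int): Int = math.max(b1, b2)
  override def finish(reduction: Int): Int = reduction
  override def bufferEncoder: Encoder[Int] = Encoders.scalaInt
  override def outputEncoder: Encoder[Int] = Encoders.scalaInt
}

def typedMax[IN](f: IN => Int): TypedColumn[IN, Int] =
  new TypedMax(f).toColumn

Used with groupByKey (Spark 2.0; groupBy in the 1.6 preview), this avoids both the string column name and the explicit ExpressionEncoder:

Seq(1 -> 2, 1 -> 5, 3 -> 6).toDS()
  .groupByKey(_._1)
  .agg(typedMax((p: (Int, Int)) => p._2))
  .take(10)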
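
Alternatively, the closest typed analogue of RDD.reduceByKey is groupByKey followed by reduceGroups. A sketch assuming the Spark 2.0 KeyValueGroupedDataset API, with spark being a SparkSession as in spark-shell:

import spark.implicits._

val ds = Seq(1 -> 2, 1 -> 5, 3 -> 6).toDS()

// reduceGroups stays fully typed, so no ExpressionEncoder and no
// string column names are needed. It yields a Dataset[(K, V)] where V
// is the whole reduced pair, hence the map to drop the duplicated key.
ds.groupByKey(_._1)
  .reduceGroups((a, b) => if (a._2 >= b._2) a else b)
  .map { case (key, (_, value)) => (key, value) }
  .take(10)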