Ted, it does not work like that. You have to use .map(toAB).toDS instead.
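A rough sketch of what I mean, assuming Spark 2.0 or later (SparkSession, groupByKey and reduceGroups). The case class AB, the helper toAB and the object name are hypothetical; the thread only names toAB and does not show its definition. The reduceGroups step is one possible typed stand-in for reduceByKey(math.max), not necessarily the only or best one.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case class that gives the tuple fields the names "a" and "b".
case class AB(a: Int, b: Int)

object ReduceByKeyWithDataset {
  // Hypothetical helper: converts a plain pair into the named case class.
  def toAB(p: (Int, Int)): AB = AB(p._1, p._2)

  // Typed counterpart of reduceByKey(math.max): keeps the largest b per key a.
  def maxPerKey(spark: SparkSession): Array[(Int, Int)] = {
    import spark.implicits._ // encoders for case classes and tuples, plus toDS
    Seq(1 -> 2, 1 -> 5, 3 -> 6)
      .map(toAB)
      .toDS()
      .groupByKey(_.a)
      .reduceGroups((x: AB, y: AB) => if (x.b >= y.b) x else y)
      .map { case (key, ab) => (key, ab.b) }
      .take(10)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("reduceByKey-with-dataset")
      .getOrCreate()
    try maxPerKey(spark).foreach(println)
    finally spark.stop()
  }
}
```

With the case class in place the fields can be referred to as _.a and _.b in lambdas, so no string column names or explicit ExpressionEncoder are needed. The order of the rows returned by take(10) is not guaranteed.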

On Tue, Jun 7, 2016 at 4:07 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Have you tried the following ?
>
> Seq(1->2, 1->5, 3->6).toDS("a", "b")
>
> then you can refer to columns by name.
>
> FYI
>
>
> On Tue, Jun 7, 2016 at 3:58 PM, Alexander Pivovarov <apivova...@gmail.com>
> wrote:
>
>> I'm trying to switch from RDD API to Dataset API
>> My question is about reduceByKey method
>>
>> e.g. in the following example I'm trying to rewrite
>>
>> sc.parallelize(Seq(1->2, 1->5, 3->6)).reduceByKey(math.max).take(10)
>>
>> using DS API. That is what I have so far:
>>
>> Seq(1->2, 1->5, 3->6).toDS.groupBy(_._1).agg(max($"_2").as(ExpressionEncoder[Int])).take(10)
>>
>> Questions:
>>
>> 1. is it possible to avoid typing "as(ExpressionEncoder[Int])" or replace
>> it with smth shorter?
>>
>> 2. Why I have to use String column name in max function? e.g. $"_2" or
>> col("_2"). can I use _._2 instead?
>>
>>
>> Alex