Take a look at the implementation of typed sum/avg:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/expressions/scalalang/typed.scala
You can implement a typed max/min.
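For example, here is a minimal sketch of a typed max modeled on those sum/avg aggregators, assuming the Spark 2.x Aggregator interface; TypedMax is an illustrative name, not an existing Spark class:

import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// Illustrative typed max over Long values, patterned after the
// aggregators in typed.scala. Not part of the Spark API itself.
class TypedMax[IN](val f: IN => Long) extends Aggregator[IN, Long, Long] {
  override def zero: Long = Long.MinValue
  override def reduce(b: Long, a: IN): Long = math.max(b, f(a))
  override def merge(b1: Long, b2: Long): Long = math.max(b1, b2)
  override def finish(reduction: Long): Long = reduction
  override def bufferEncoder: Encoder[Long] = Encoders.scalaLong
  override def outputEncoder: Encoder[Long] = Encoders.scalaLong
}

With that in place, something like ds.groupByKey(_._1).agg(new TypedMax[(Int, Int)](_._2.toLong).toColumn) should give the reduceByKey(math.max) behavior asked about below.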
On Tue, Jun 7, 2016 at 4:31 PM, Alexander Pivovarov wrote:
Ted, it does not work like that.
You have to .map(toAB).toDS instead.
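To spell that out, a minimal sketch of the pattern; the case class AB and a SparkSession named spark are assumptions, the thread itself only names toAB:

case class AB(a: Int, b: Int)

import spark.implicits._  // assuming a SparkSession named spark

// Map each tuple to a case class so the Dataset gets named columns.
val toAB = (p: (Int, Int)) => AB(p._1, p._2)
val ds = Seq(1->2, 1->5, 3->6).map(toAB).toDS
ds.select($"a", $"b")  // columns can now be referred to by name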
On Tue, Jun 7, 2016 at 4:07 PM, Ted Yu wrote:
Have you tried the following?
Seq(1->2, 1->5, 3->6).toDS("a", "b")
Then you can refer to columns by name.
FYI
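Note that toDS takes no column-name arguments; the method with that signature is toDF. A minimal sketch of the variant that does compile, again assuming a SparkSession named spark:

import spark.implicits._  // assuming a SparkSession named spark

// toDF accepts column names, but yields an untyped DataFrame
// rather than a typed Dataset.
val df = Seq(1->2, 1->5, 3->6).toDF("a", "b")
df.select($"a", $"b")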
On Tue, Jun 7, 2016 at 3:58 PM, Alexander Pivovarov wrote:
I'm trying to switch from the RDD API to the Dataset API.
My question is about the reduceByKey method.
E.g. in the following example I'm trying to rewrite
sc.parallelize(Seq(1->2, 1->5, 3->6)).reduceByKey(math.max).take(10)
using the DS API. This is what I have so far:
Seq(1->2, 1->5, 3->6).toDS.groupBy(_._1).ag
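A sketch of one way to finish that rewrite, assuming Spark 2.x, where the typed grouping method is groupByKey and reduceGroups plays the role of reduceByKey:

import spark.implicits._  // assuming a SparkSession named spark

Seq(1->2, 1->5, 3->6).toDS
  .groupByKey(_._1)                                    // typed grouping on the key
  .reduceGroups((a, b) => if (a._2 >= b._2) a else b)  // keep the pair with the larger value
  .map { case (_, pair) => pair }                      // drop the duplicated key
  .take(10)                                            // Array((1,5), (3,6))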