Thanks, pair_rdd.rdd.groupByKey() did the trick.
On Wed, Aug 10, 2016 at 8:24 PM, Holden Karau wrote:
> So it looks like (despite the name) pair_rdd is actually a Dataset - my
> guess is you might have a map on a dataset up above which used to return an
> RDD but now returns another dataset
So it looks like (despite the name) pair_rdd is actually a Dataset - my
guess is you might have a map on a dataset up above which used to return an
RDD but now returns another dataset or an unexpected implicit conversion.
Just add rdd() before the groupByKey call to push it into an RDD. That being
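For reference, the mismatch Holden describes can be sketched without Spark itself. The classes below (FakeDataset, FakeRdd, MapFunction) are illustrative stand-ins, not Spark's real types: they only mirror the shape of the APIs, where Dataset.groupByKey is overloaded and always requires a key-extractor function, while a pair RDD offers a zero-argument groupByKey().

```scala
// Stand-in for org.apache.spark.api.java.function.MapFunction
trait MapFunction[T, K] { def call(value: T): K }

class FakeDataset[T](data: Seq[T]) {
  // Two overloads, mirroring Dataset.groupByKey's alternatives:
  // neither takes zero arguments, hence the "overloaded method" error.
  def groupByKey[K](func: T => K): Map[K, Seq[T]] = data.groupBy(func)
  def groupByKey[K](func: MapFunction[T, K]): Map[K, Seq[T]] =
    data.groupBy(func.call)
  // Escape hatch analogous to Dataset.rdd, available when T is a pair.
  def rdd[K, V](implicit ev: T <:< (K, V)): FakeRdd[K, V] =
    new FakeRdd(data.map(ev))
}

class FakeRdd[K, V](data: Seq[(K, V)]) {
  // Pair-RDD style: no key function needed, the tuple's
  // first element is implicitly the key.
  def groupByKey(): Map[K, Seq[V]] =
    data.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
}
```

With these stand-ins, `ds.groupByKey()` fails to compile (no zero-argument alternative), while `ds.rdd.groupByKey()` works, which is the same shape as the `pair_rdd.rdd.groupByKey()` fix above.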
Here is the offending line:
val some_rdd = pair_rdd.groupByKey().flatMap { case (mk: MyKey, md_iter: Iterable[MyData]) => {
...
[error] .scala:249: overloaded method value groupByKey with alternatives:
[error] [K](func: org.apache.spark.api.java.function.MapFunction[(aaa.MyKey, aaa.My