This problem was caused by my using a package jar built against a Spark
version (0.9.1) different from that of the cluster (0.9.0). When I used the
correct package jar
(spark-assembly_2.10-0.9.0-cdh5.0.1-hadoop2.3.0-cdh5.0.1.jar) instead, the
application ran as expected.
2014-09-15 14:57 G
How about this.
scala> val rdd2 = rdd.combineByKey(
| (v: Int) => v.toLong,
| (c: Long, v: Int) => c + v,
| (c1: Long, c2: Long) => c1 + c2)
rdd2: org.apache.spark.rdd.RDD[(String, Long)] = MapPartitionsRDD[9] at
combineByKey at <console>:14
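
For completeness, a minimal end-to-end sketch of the same idea in spark-shell;
the sample data and the element type RDD[(String, Int)] are assumptions, since
only the combineByKey call appears above:

// Assumed input: a pair RDD of (String, Int), built with the spark-shell SparkContext `sc`
val rdd = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))

// combineByKey widens each Int value to Long before summing, so large
// per-key totals do not overflow Int.
val rdd2 = rdd.combineByKey(
  (v: Int) => v.toLong,             // createCombiner: start a Long sum from the first value
  (c: Long, v: Int) => c + v,       // mergeValue: fold another value into the per-partition sum
  (c1: Long, c2: Long) => c1 + c2)  // mergeCombiners: merge partial sums across partitions

rdd2.collect()  // Array((a,3), (b,3)), with the per-key sums as Longs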
xj @ Tokyo
On Mon, Sep 15, 2014 at 3:06 PM, Tao