I followed an example presented in the Learning Spark tutorial
<http://www.safaribooksonline.com/library/view/learning-spark/9781449359034/ch04.html>
to compute the per-key average as follows:


import org.apache.spark.{SparkConf, SparkContext}

val Array(appName) = args
val sparkConf = new SparkConf().setAppName(appName)
val sc = new SparkContext(sparkConf)

/*
 * compute the per-key average of values
 * results should be:
 *    A : 5.8
 *    B : 14
 *    C : 60.6
 */
val rdd = sc.parallelize(List(
  ("A", 3), ("A", 9), ("A", 12), ("A", 0), ("A", 5),
  ("B", 4), ("B", 10), ("B", 11), ("B", 20), ("B", 25),
  ("C", 32), ("C", 91), ("C", 122), ("C", 3), ("C", 55)), 2)

val avg = rdd.combineByKey(
  // createCombiner: this is where the exception is reported:
  // java.lang.ClassCastException: scala.Tuple2$mcII$sp cannot be cast to java.lang.Integer
  (x: Int) => (x, 1),
  // mergeValue: fold one more value into the running (sum, count) accumulator
  (acc: (Int, Int), x: Int) => (acc._1 + x, acc._2 + 1),
  // mergeCombiners: merge (sum, count) accumulators from different partitions
  (acc1: (Int, Int), acc2: (Int, Int)) => (acc1._1 + acc2._1, acc1._2 + acc2._2))
  .map { case (s, t) => (s, t._1 / t._2.toFloat) }

avg.collect.foreach(t => println(t._1 + " -> " + t._2))



When I submitted the application, an exception of
"*java.lang.ClassCastException: scala.Tuple2$mcII$sp cannot be cast to
java.lang.Integer*" was thrown. The tutorial says that the first function
passed to *combineByKey*, *(x: Int) => (x, 1)*, should take a single
element from the source RDD and return an element of the desired combiner
type. In my application the function takes a single element of type *Int*
from the source RDD and returns a tuple of type (*Int*, *Int*), which seems
to meet that requirement. Why, then, is such an exception thrown?
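
For reference, this is the signature I understand *combineByKey* to have
(simplified from PairRDDFunctions; the type parameters below are my own
reading of the API, not copied from the docs):

// createCombiner turns a single value V into a combiner C,
// mergeValue folds another V into an existing C,
// mergeCombiners merges two Cs from different partitions.
def combineByKey[C](
    createCombiner: V => C,
    mergeValue: (C, V) => C,
    mergeCombiners: (C, C) => C): RDD[(K, C)]

// In my case V = Int and C = (Int, Int), so (x: Int) => (x, 1)
// should be a valid createCombiner as far as I can tell.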

I'm using CDH 5.0 and Spark 0.9.

Thanks.
