Using a case class as a key doesn't seem to work properly. [Spark 1.0.0] A minimal example:
case class P(name: String)
val ps = Array(P("alice"), P("bob"), P("charly"), P("bob"))
sc.parallelize(ps).map(x => (x, 1)).reduceByKey((x, y) => x + y).collect

[Spark shell, local mode]
res: Array[(P, Int)] = Array((P(bob),1), (P(bob),1), (P(alice),1), (P(charly),1))

This contrasts with the expected behavior, which should be equivalent to keying by the name field:

sc.parallelize(ps).map(x => (x.name, 1)).reduceByKey((x, y) => x + y).collect
res: Array[(String, Int)] = Array((charly,1), (alice,1), (bob,2))

Any ideas why this doesn't work?

-kr, Gerard.
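PS: For what it's worth, here is a minimal sketch in the plain Scala REPL (not the Spark shell) confirming that case classes get the compiler-generated structural equals and hashCode that reduceByKey relies on for grouping keys, which is why the result above is surprising:

case class P(name: String)
val a = P("bob")
val b = P("bob")
a == b                     // true: structural equality, not reference equality
a.hashCode == b.hashCode   // true: consistent hashCode for equal values
// Grouping with plain Scala collections merges the duplicate key as expected:
Seq(P("alice"), P("bob"), P("charly"), P("bob"))
  .groupBy(identity).map { case (k, v) => k -> v.size }
// e.g. Map(P(alice) -> 1, P(bob) -> 2, P(charly) -> 1)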