Hi, I’m observing a weird behavior in zeppelin %spark. I have the following in my paragraph:
case class Test(a: Int, b: Int) val a = Test(1,2) val b = Test(1,2) val c = Test(2,3) val l = List(a,b,c) val rdd = spark.sparkContext.parallelize(l) rdd.map(v => (1, v)).aggregateByKey(scala.collection.mutable.HashSet.empty[Test])((result, item) => result + item, (result1, result2) => result1 ++ result2).collect() I would expect the result to be Array((1, Set(Test(1,2), Test(2,3)))), however, I’m actually seeing Array(1, Set(Test(1,2), Test(1,2), Test(2,3)))). Why is spark unable to set aggregate on my case class? Is this a zeppelin issue or spark problem? I have confirmed that doing Set(a,b,c) in scala REPL returns back Set(Test(1,2), Test(2,3)), as expected. Thanks, Anqi