Hi,

I’m observing weird behavior in Zeppelin's %spark interpreter. I have the following in my
paragraph:

case class Test(a: Int, b: Int)
val a = Test(1,2)
val b = Test(1,2)
val c = Test(2,3)
val l = List(a,b,c)
val rdd = spark.sparkContext.parallelize(l)
rdd.map(v => (1, v)).aggregateByKey(scala.collection.mutable.HashSet.empty[Test])(
  (result, item) => result + item,
  (result1, result2) => result1 ++ result2
).collect()

I would expect the result to be Array((1, Set(Test(1,2), Test(2,3)))). However,
I’m actually seeing Array((1, Set(Test(1,2), Test(1,2), Test(2,3)))). Why does
Spark fail to deduplicate my case class instances when aggregating them into a
set? Is this a Zeppelin issue or a Spark problem?

I have confirmed that evaluating Set(a,b,c) in the Scala REPL returns
Set(Test(1,2), Test(2,3)), as expected.
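For comparison, the whole aggregation (not just Set(a,b,c)) can be simulated with plain Scala collections and no Spark. This is a minimal sketch of what aggregateByKey is understood to do: fold each partition's values per key with the first function, then merge the per-partition results with the second. The `aggregateByKey` helper, the `AggregateByKeySketch` object, and the two-partition split are hypothetical illustrations, not Spark's implementation. The zero value is passed as a factory so that no mutable set is shared between keys or partitions.

```scala
import scala.collection.mutable

// Local stand-in for the Test class from the paragraph above.
case class Test(a: Int, b: Int)

object AggregateByKeySketch {
  // Hypothetical local simulation of aggregateByKey. `partitions` stands in for
  // however Spark splits the data (an assumption; the real split is up to parallelize).
  def aggregateByKey[K, V, U](partitions: Seq[Seq[(K, V)]], zero: () => U)(
      seqOp: (U, V) => U,
      combOp: (U, U) => U): Map[K, U] = {
    // Step 1: within each partition, fold the values of each key with seqOp.
    val perPartition: Seq[Map[K, U]] = partitions.map { part =>
      part.foldLeft(Map.empty[K, U]) { case (acc, (k, v)) =>
        acc.updated(k, seqOp(acc.getOrElse(k, zero()), v))
      }
    }
    // Step 2: merge the per-partition accumulators of each key with combOp.
    perPartition.foldLeft(Map.empty[K, U]) { (acc, m) =>
      m.foldLeft(acc) { case (merged, (k, u)) =>
        merged.updated(k, merged.get(k).map(combOp(_, u)).getOrElse(u))
      }
    }
  }

  def run(): Map[Int, mutable.HashSet[Test]] = {
    val a = Test(1, 2)
    val b = Test(1, 2)
    val c = Test(2, 3)
    // Two partitions, so both the seqOp and the combOp path are exercised.
    val partitions = Seq(Seq(1 -> a, 1 -> b), Seq(1 -> c))
    aggregateByKey(partitions, () => mutable.HashSet.empty[Test])(
      (result, item) => result += item,        // add one value to the set
      (result1, result2) => result1 ++= result2 // union two partial sets
    )
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Run as an ordinary compiled program, this yields a single key 1 mapped to a two-element set, i.e. the deduplicated result expected above, since Test(1,2) == Test(1,2) holds for these locally created instances.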

Thanks,
Anqi
