Dear Spark Users,

I googled the web for several hours now but I don't find a solution for my 
problem. So maybe someone from this list can help.

I have an RDD of case classes, generated from CSV files with Spark. When I used 
the distinct operator, there were still duplicates. So I investigated and found 
out that the equals returns false although the two objects were equal (so were 
their individual fields as well as toStrings).

After googling it I found that the case class equals might break in case the 
two objects are created by different class loaders. So I implemented my own 
equals method using mattern matching (code example below). It still didn't 
work. Some debugging revealed that the problem lies in the pattern matching. 
Depending on the objects I compare (and maybe the split / classloader they are 
generated in?) the patternmatching works /doesn't:

case class Customer(id: String, age: Option[Int], entryDate: 
Option[java.util.Date]) {
  def equals(that: Any): Boolean = that match {
    case Customer(id, age, entryDate) => {
      println("Pattern matching worked!")
      this.id == id && this.age == age && this.entryDate == entryDate
    }
    case _ => false
  }
}

//val x: Array[Customer]
// ... some spark code to filter original data and collect x

scala> x(0)
Customer("a", Some(5), Some(Fri Sep 23 00:00:00 CEST 1994))
scala> x(1)
Customer("a", None, None)
scala> x(2)
Customer("a", None, None)
scala> x(3)
Customer("a", None, None)

scala> x(0) == x(0) // should be true and works
Pattern matching works!
res0: Boolean = true
scala> x(0) == x(1) // should be false and works
Pattern matching works!
res1: Boolean = false
scala> x(1) == x(2) // should be true, does not work
res2: Boolean = false
scala> x(2) == x(3) // should be true, does not work
Pattern matching works!
res3: Boolean = true
scala> x(0) == x(3) // should be false, does not work
res4: Boolean = false

Why is the pattern matching not working? It seems that there are two kinds of 
Customers: 0,1 and 2,3 which don't match somehow. Is this related to some 
classloaders? Is there a way around this other than using instanceof and 
defining a custom equals operation for every case class I write?

Thanks for the help!
Frank

Reply via email to