On Tue, May 13, 2014 at 8:26 AM, Michael Malak <michaelma...@yahoo.com>wrote:
> Reposting here on dev since I didn't see a response on user: > > I'm seeing different Serializable behavior in Spark Shell vs. Scala Shell. > In the Spark Shell, equals() fails when I use the canonical equals() > pattern of match{}, but works when I subsitute with isInstanceOf[]. I am > using Spark 0.9.0/Scala 2.10.3. > > Is this a bug? > > Spark Shell (equals uses match{}) > ================================= > > class C(val s:String) extends Serializable { > override def equals(o: Any) = o match { > case that: C => that.s == s > case _ => false > } > } > > val x = new C("a") > val bos = new java.io.ByteArrayOutputStream() > val out = new java.io.ObjectOutputStream(bos) > out.writeObject(x); > val b = bos.toByteArray(); > out.close > bos.close > val y = new java.io.ObjectInputStream(new > java.io.ByteArrayInputStream(b)).readObject().asInstanceOf[C] > x.equals(y) > > res: Boolean = false > > Spark Shell (equals uses isInstanceOf[]) > ======================================== > > class C(val s:String) extends Serializable { > override def equals(o: Any) = if (o.isInstanceOf[C]) > (o.asInstanceOf[C].s == s) else false > } > > val x = new C("a") > val bos = new java.io.ByteArrayOutputStream() > val out = new java.io.ObjectOutputStream(bos) > out.writeObject(x); > val b = bos.toByteArray(); > out.close > bos.close > val y = new java.io.ObjectInputStream(new > java.io.ByteArrayInputStream(b)).readObject().asInstanceOf[C] > x.equals(y) > > res: Boolean = true > > Scala Shell (equals uses match{}) > ================================= > > class C(val s:String) extends Serializable { > override def equals(o: Any) = o match { > case that: C => that.s == s > case _ => false > } > } > > val x = new C("a") > val bos = new java.io.ByteArrayOutputStream() > val out = new java.io.ObjectOutputStream(bos) > out.writeObject(x); > val b = bos.toByteArray(); > out.close > bos.close > val y = new java.io.ObjectInputStream(new > java.io.ByteArrayInputStream(b)).readObject().asInstanceOf[C] > x.equals(y) > > res: Boolean = true > Hmm. I see that this can be reproduced without Spark in Scala 2.11, with and without -Yrepl-class-based command line flag to the repl. Spark's REPL has the effective behavior of 2.11's -Yrepl-class-based flag. Inspecting the byte code generated, it appears -Yrepl-class-based results in the creation of "$outer" field in the generated classes (including class C). The first case match in equals() is resulting code along the lines of (simplified): if (o isinstanceof Cstr && this.$outer == that.$outer) { // do string compare // } $outer is the synthetic field object to the outer object in which the object was created (in this case, the repl environment). Now obviously, when x is taken through the bytestream and deserialized, it would have a new $outer created (it may have deserialized in a different jvm or machine for all we know). So the $outer's mismatching is expected. However I'm still trying to understand why the outers need to be the same for the case match.