Some more details: Adding a println to the function reveals that it is indeed
called only once. Furthermore, running:

rdd.map(_.s.hashCode).min == rdd.map(_.s.hashCode).max  // returns true

...reveals that all 10000000 elements do indeed point to the same object,
so the data structure essentially behaves correctly. The problem arises
when nExamples is much larger, so the RDD no longer fits in memory and
cannot be persisted:

storage.MemoryStore: Not enough space to cache rdd_1_0 in memory! (computed
6.1 GB so far)

In this case, the comparison of hashCodes fails, because the function is
recomputed and each recomputation produces a different shared object.
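
For reference, here is a minimal, self-contained sketch (not the original
code from this thread) of the pattern being discussed: one object is created
per partition and every element of that partition holds a reference to it.
The names Shared, Example, and nExamples, and the use of a single partition,
are assumptions made purely for illustration.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object SharedPerPartition {
  // Shared does not override hashCode, so hashCode is identity-based and
  // can be used to check whether two elements reference the same instance.
  class Shared extends Serializable
  case class Example(s: Shared)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("shared-per-partition")
    val sc = new SparkContext(conf)
    val nExamples = 10000000

    val rdd = sc.parallelize(0 until nExamples, numSlices = 1)
      .mapPartitions { it =>
        val shared = new Shared            // built once per partition
        it.map(_ => Example(shared))       // every element points to the same instance
      }
      .persist(StorageLevel.MEMORY_ONLY)

    // If the partition is cached, both actions below read the same stored
    // objects and the comparison holds. If caching fails and the partition
    // is recomputed, each action builds its own Shared, so the min (from
    // the first action) and the max (from the second) generally differ.
    val allSame = rdd.map(_.s.hashCode).min == rdd.map(_.s.hashCode).max
    println(s"all elements share one object: $allSame")

    sc.stop()
  }
}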


