Following up on an earlier thread <http://apache-spark-developers-list.1001551.n3.nabble.com/Tungsten-off-heap-memory-access-for-C-libraries-td13898.html>, I would like to access the off-heap representation of cached data in Spark 2.0, to see how Spark might be linked to physics software written in C and C++. I'm willing to do the exploration on my own, but could somebody point me to a place to start?

I have downloaded the 2.0 preview and created a persisted Dataset:

import scala.util.Random

case class Muon(px: Double, py: Double) {
  def pt = Math.sqrt(px*px + py*py)
}

val rdd = sc.parallelize(0 until 10000 map {x => Muon(Random.nextGaussian, Random.nextGaussian)}, 10)
val df = rdd.toDF
val ds = df.as[Muon]
ds.persist()

So I have a Dataset in memory, and if I understand the blog articles <https://databricks.com/blog/2015/04/28/project-tungsten-bringing-spark-closer-to-bare-metal.html> correctly, it's stored in Tungsten's binary format, possibly in off-heap memory managed through sun.misc.Unsafe. Is there any way I could get a pointer to that data that I could explore with BridJ? Any hints on how it's stored? Could I get started through some Djinni calls or something?

Thanks!
-- Jim
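
In case it clarifies what I'm after, here's a rough sketch of the kind of poking around I have in mind, using the internal UnsafeRow API. I'm assuming several things here: that queryExecution.toRdd (a developer/internal API) is a reasonable way to see the Tungsten rows, that the rows actually come back as UnsafeRow, and that the memory is only genuinely off-heap if the shell is started with spark.memory.offHeap.enabled=true and a nonzero spark.memory.offHeap.size and the data is persisted with StorageLevel.OFF_HEAP rather than the default level. Please correct me if any of that is wrong.

import org.apache.spark.sql.catalyst.expressions.UnsafeRow

// Internal/developer API: the physical plan's output as Tungsten rows.
val internalRows = ds.queryExecution.toRdd

val sample = internalRows.mapPartitions { iter =>
  iter.collect {
    case row: UnsafeRow =>
      // For heap-backed rows, getBaseObject is a byte[] and getBaseOffset is
      // the array's base offset; for genuinely off-heap memory, getBaseObject
      // is null and getBaseOffset holds the raw native address.
      (row.getBaseObject == null, row.getBaseOffset, row.getSizeInBytes)
  }
}.take(5)

sample.foreach(println)

But that still goes through the JVM. What I'd really like is the base address and layout of the cached data, so that C/C++ code (via BridJ, or a Djinni-generated interface) could read the same bytes without copying.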