Following up on an earlier thread <http://apache-spark-developers-list.1001551.n3.nabble.com/Tungsten-off-heap-memory-access-for-C-libraries-td13898.html>, I would like to access the off-heap representation of cached data in Spark 2.0, to see how Spark might be linked to physics software written in C and C++. I'm willing to do the exploration on my own, but could somebody point me to a place to start?

I have downloaded the 2.0 preview and created a persisted Dataset:

import scala.util.Random

case class Muon(px: Double, py: Double) {
  def pt = Math.sqrt(px*px + py*py)
}

val rdd = sc.parallelize(0 until 10000 map {x => Muon(Random.nextGaussian, Random.nextGaussian)}, 10)
val df = rdd.toDF
val ds = df.as[Muon]
ds.persist()

So I have a Dataset in memory, and if I understand the blog articles <https://databricks.com/blog/2015/04/28/project-tungsten-bringing-spark-closer-to-bare-metal.html> correctly, it's stored in Tungsten's binary format, possibly in off-heap memory managed through sun.misc.Unsafe. Is there any way I could get a pointer to that data that I could explore with BridJ? Any hints on how it's stored? Could I get started through some Djinni calls or something?

Thanks!
-- Jim
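
In case it clarifies what I'm after, here's a rough sketch of the kind of poking around I have in mind, using the internal UnsafeRow API. I'm assuming several things here: that queryExecution.toRdd (a developer/internal API) is a reasonable way to see the Tungsten rows, that the rows actually come back as UnsafeRow, and that the memory is only genuinely off-heap if the shell is started with spark.memory.offHeap.enabled=true and a nonzero spark.memory.offHeap.size and the data is persisted with StorageLevel.OFF_HEAP rather than the default level. Please correct me if any of that is wrong.

import org.apache.spark.sql.catalyst.expressions.UnsafeRow

// Internal/developer API: the physical plan's output as Tungsten rows.
val internalRows = ds.queryExecution.toRdd

val sample = internalRows.mapPartitions { iter =>
  iter.collect {
    case row: UnsafeRow =>
      // For heap-backed rows, getBaseObject is a byte[] and getBaseOffset is
      // the array's base offset; for genuinely off-heap memory, getBaseObject
      // is null and getBaseOffset holds the raw native address.
      (row.getBaseObject == null, row.getBaseOffset, row.getSizeInBytes)
  }
}.take(5)

sample.foreach(println)

But that still goes through the JVM. What I'd really like is the base address and layout of the cached data, so that C/C++ code (via BridJ, or a Djinni-generated interface) could read the same bytes without copying.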