Hi,

I'm coming from the particle physics community and I'm also very interested
in the development of this project. We have a huge C++ codebase and would
like to start using the higher-level abstractions of Spark in our data
analyses. To this end, I've been developing code that copies data from our
C++ framework, ROOT, into Scala:

https://github.com/diana-hep/rootconverter/tree/master/scaroot-reader

(Worth noting: the ROOT file format is too complex for a complete rewrite in
Java or Scala to be feasible. ROOT readers in Java and even Javascript
exist, but they only handle simple cases.)

I have a variety of options for how to lay out the bytes during this
transfer, and in all cases I fill the constructor arguments of Scala classes
using macros. When I learned that you're moving the Spark data off-heap (at
the same time as I'm struggling to move it on-heap), I realized that you
must have chosen a serialization format for that data, and I should be using
/that/ serialization format.

Even though it's early, do you have any designs for that serialization
format? Have you picked a standard one? Most of the options, such as Avro,
don't make a lot of sense because they pack integers to minimize the number
of bytes, rather than laying them out for efficient access (including any
byte-alignment considerations).
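
To make the layout concern concrete, here is a minimal Java sketch (not
Spark's actual format, which I don't know) contrasting the two approaches.
The packed side uses zigzag plus variable-length encoding, the scheme Avro
uses for ints; the fixed-width side writes 4-byte ints into a direct
(off-heap) ByteBuffer in native byte order. The class name LayoutSketch and
all method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LayoutSketch {

    // Packed layout: zigzag + base-128 varint, one value after another.
    public static byte[] packVarints(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int value : values) {
            int n = (value << 1) ^ (value >> 31); // zigzag: small magnitude -> small code
            while ((n & ~0x7F) != 0) {
                out.write((n & 0x7F) | 0x80);     // low 7 bits, continuation bit set
                n >>>= 7;
            }
            out.write(n);
        }
        return out.toByteArray();
    }

    // Reading element `index` from the packed form means decoding
    // every element before it, because the widths vary.
    public static int readVarintAt(byte[] data, int index) {
        int pos = 0;
        int value = 0;
        for (int i = 0; i <= index; i++) {
            int raw = 0;
            int shift = 0;
            int b;
            do {
                b = data[pos++] & 0xFF;
                raw |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            value = (raw >>> 1) ^ -(raw & 1);     // undo zigzag
        }
        return value;
    }

    // Fixed-width layout: 4 bytes per int, off-heap, native byte order.
    public static ByteBuffer layOutFixed(int[] values) {
        ByteBuffer buf = ByteBuffer.allocateDirect(values.length * Integer.BYTES)
                                   .order(ByteOrder.nativeOrder());
        for (int i = 0; i < values.length; i++) {
            buf.putInt(i * Integer.BYTES, values[i]);
        }
        return buf;
    }

    // Reading element `index` is a single offset computation.
    public static int readFixedAt(ByteBuffer buf, int index) {
        return buf.getInt(index * Integer.BYTES);
    }

    public static void main(String[] args) {
        int[] values = {1, -2, 300, 70000, -5};
        byte[] packed = packVarints(values);
        ByteBuffer fixed = layOutFixed(values);
        System.out.println("packed bytes: " + packed.length);
        System.out.println("fixed bytes:  " + fixed.capacity());
        System.out.println("element 3:    " + readVarintAt(packed, 3)
                + " / " + readFixedAt(fixed, 3));
    }
}
```

For these five sample values the packed form takes 8 bytes and the
fixed-width form takes 20, but only the fixed-width form gives access to an
arbitrary element without scanning, and a C++ reader could index the same
buffer as a plain int32_t array.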

Also, are there any plans for an API that /fills/ an RDD or Dataset from the
C++ side, as I'm trying to do?

Thanks,
-- Jim


P.S. Concerning Java/C++ bindings, there are many. I tried JNI, JNA, BridJ,
and JavaCPP personally, but in the end picked JNA because of its
(comparatively) large user base. If Spark will be using Djinni, that could
be a symmetry-breaking consideration and I'll start using it for
consistency, maybe even interoperability.




--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Tungsten-off-heap-memory-access-for-C-libraries-tp13898p17387.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.