Hi, I'm coming from the particle physics community and I'm also very interested in the development of this project. We have a huge C++ codebase and would like to start using the higher-level abstractions of Spark in our data analyses. To this end, I've been developing code that copies data from our C++ framework, ROOT, into Scala:
https://github.com/diana-hep/rootconverter/tree/master/scaroot-reader

(Worth noting: the ROOT file format is too complex for a complete rewrite in Java or Scala to be feasible. ROOT readers in Java and even JavaScript exist, but they only handle simple cases.)

I have a variety of options for how to lay out the bytes during this transfer, and in all cases I fill the constructor arguments of Scala classes using macros. When I learned that you're moving the Spark data off-heap (at the same time as I'm struggling to move it on-heap), I realized that you must have chosen a serialization format for that data, and I should be using /that/ serialization format.

Even though it's early, do you have any designs for that serialization format? Have you picked a standard one? Most of the options, such as Avro, don't make a lot of sense because they pack integers to minimize the number of bytes, rather than lay them out for efficient access (including any byte-alignment considerations).

Also, are there any plans for an API that /fills/ an RDD or Dataset from the C++ side, as I'm trying to do?

Thanks,
-- Jim

P.S. Concerning Java/C++ bindings, there are many. I tried JNI, JNA, BridJ, and JavaCPP personally, but in the end picked JNA because of its (comparatively) large user base. If Spark will be using Djinni, that could be a symmetry-breaking consideration and I'll start using it for consistency, maybe even interoperability.

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Tungsten-off-heap-memory-access-for-C-libraries-tp13898p17387.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
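
To make the layout concern about packed integers concrete, here is a minimal sketch in Java. It is not Spark's or Avro's actual code: the class and method names are made up for illustration, and the fixed-width layout is only my assumption about what an access-friendly format might look like. It writes the same values twice into off-heap (direct) buffers, once as Avro-style zig-zag variable-length integers and once as 4-byte integers in native byte order, and shows that only the second form allows jumping straight to element i.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class LayoutSketch {
    // Avro-style int: zig-zag the sign, then emit 7 bits per byte,
    // setting the high bit on every byte except the last.
    static void writeVarInt(ByteBuffer out, int value) {
        int z = (value << 1) ^ (value >> 31);
        while ((z & ~0x7F) != 0) {
            out.put((byte) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.put((byte) z);
    }

    // Inverse of writeVarInt; advances the buffer position.
    static int readVarInt(ByteBuffer in) {
        int z = 0;
        int shift = 0;
        int b;
        do {
            b = in.get() & 0xFF;
            z |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);
    }

    // Packed layout: compact, but element sizes vary.
    static ByteBuffer packAll(int[] values) {
        ByteBuffer out = ByteBuffer.allocateDirect(values.length * 5); // 5 bytes is the worst case per int
        for (int v : values) writeVarInt(out, v);
        out.flip();
        return out;
    }

    // Fixed-width layout: 4 bytes per element in the machine's byte order.
    static ByteBuffer fixedAll(int[] values) {
        ByteBuffer out = ByteBuffer.allocateDirect(values.length * Integer.BYTES)
                                   .order(ByteOrder.nativeOrder());
        for (int v : values) out.putInt(v);
        out.flip();
        return out;
    }

    // Reading element `index` from the packed form means decoding
    // every element in front of it.
    static int getPacked(ByteBuffer packed, int index) {
        ByteBuffer view = packed.duplicate();
        view.position(0);
        int v = 0;
        for (int i = 0; i <= index; i++) v = readVarInt(view);
        return v;
    }

    // Reading element `index` from the fixed-width form is one absolute read.
    static int getFixed(ByteBuffer fixed, int index) {
        return fixed.getInt(index * Integer.BYTES);
    }

    public static void main(String[] args) {
        int[] values = {3, -1, 300, 70000};
        ByteBuffer packed = packAll(values);
        ByteBuffer fixed = fixedAll(values);
        System.out.println("packed bytes: " + packed.remaining());
        System.out.println("fixed bytes:  " + fixed.remaining());
        System.out.println("element 2: " + getPacked(packed, 2) + " / " + getFixed(fixed, 2));
    }
}
```

For the four values above, the packed form takes 7 bytes and the fixed-width form takes 16, but a C++ library handed the fixed-width buffer could index into it directly, which is the property I care about.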