Spark does not currently support Apache Arrow. Probably a good place to chat would be the Arrow mailing list, where they are making progress towards unified JVM & Python/R support, which is sort of a precondition for a functioning Arrow interface between Spark and Python.

On Fri, Aug 5, 2016 at 12:40 PM, jpivar...@gmail.com <jpivar...@gmail.com> wrote:

> In a few earlier posts [1
> <http://apache-spark-developers-list.1001551.n3.nabble.com/Tungsten-off-heap-memory-access-for-C-libraries-td13898.html>]
> [2
> <http://apache-spark-developers-list.1001551.n3.nabble.com/How-to-access-the-off-heap-representation-of-cached-data-in-Spark-2-0-td17701.html>],
> I asked about moving data from C++ into a Spark data source (RDD,
> DataFrame, or Dataset). The issue is that even the off-heap cache might not
> have a stable representation: it might change from one version to the next.
>
> I recently learned about Apache Arrow, a data layer that Spark currently or
> will someday share with Pandas, Impala, etc. Suppose that I can fill a
> buffer (such as a direct ByteBuffer) with Arrow-formatted data, is there an
> easy--- or even zero-copy--- way to use that in Spark? Is that an API that
> could be developed?
>
> I'll be at the KDD Spark 2.0 tutorial on August 15. Is that a good place to
> ask this question?
>
> Thanks,
> -- Jim
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Apache-Arrow-data-in-buffer-to-RDD-DataFrame-Dataset-tp18563.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.

--
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau