Re: Apache Arrow data in buffer to RDD/DataFrame/Dataset?

Holden Karau Fri, 05 Aug 2016 12:44:03 -0700

Spark does not currently support Apache Arrow. The Arrow mailing list is
probably a better place to discuss this: they are making progress towards
unified JVM and Python/R support, which is more or less a precondition for
a functioning Arrow interface between Spark and Python.


On Fri, Aug 5, 2016 at 12:40 PM, jpivar...@gmail.com <jpivar...@gmail.com>
wrote:

> In a few earlier posts [ 1
> <http://apache-spark-developers-list.1001551.n3.nabble.com/Tungsten-off-heap-memory-access-for-C-libraries-td13898.html>
> ] [ 2
> <http://apache-spark-developers-list.1001551.n3.nabble.com/How-to-access-the-off-heap-representation-of-cached-data-in-Spark-2-0-td17701.html>
> ], I asked about moving data from C++ into a Spark data source (RDD,
> DataFrame, or Dataset). The issue is that even the off-heap cache might not
> have a stable representation: it might change from one version to the next.
>
> I recently learned about Apache Arrow, a data layer that Spark currently
> shares, or will someday share, with Pandas, Impala, etc. Suppose that I can
> fill a buffer (such as a direct ByteBuffer) with Arrow-formatted data. Is
> there an easy, or even zero-copy, way to use that in Spark? Is that an API
> that could be developed?
>
> I'll be at the KDD Spark 2.0 tutorial on August 15. Is that a good place to
> ask this question?
>
> Thanks,
> -- Jim


-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau

Re: Apache Arrow data in buffer to RDD/DataFrame/Dataset?
