Relevant JIRA: https://issues.apache.org/jira/browse/SPARK-13534

On Fri, Aug 5, 2016 at 5:22 PM, Holden Karau <hol...@pigscanfly.ca> wrote:
> I don't think there is an approximate timescale right now and it's likely
> any implementation would depend on a solid Java implementation of Arrow
> being ready first (or even a guarantee that it necessarily will - although
> I'm interested in making it happen in some places where it makes sense).
>
> On Fri, Aug 5, 2016 at 2:18 PM, Jim Pivarski <jpivar...@gmail.com> wrote:
>
>> I see. I've already started working with Arrow-C++ and talking to members
>> of the Arrow community, so I'll keep doing that.
>>
>> As a follow-up question, is there an approximate timescale for when Spark
>> will support Arrow? I'd just like to know that all the pieces will come
>> together eventually.
>>
>> (In this forum, most of the discussion about Arrow is about PySpark and
>> Pandas, not Spark in general.)
>>
>> Best,
>> Jim
>>
>> On Aug 5, 2016 2:43 PM, "Holden Karau" <hol...@pigscanfly.ca> wrote:
>>
>>> Spark does not currently support Apache Arrow - probably a good place to
>>> chat would be on the Arrow mailing list where they are making progress
>>> towards unified JVM & Python/R support which is sort of a precondition of a
>>> functioning Arrow interface between Spark and Python.
>>>
>>> On Fri, Aug 5, 2016 at 12:40 PM, jpivar...@gmail.com <
>>> jpivar...@gmail.com> wrote:
>>>
>>>> In a few earlier posts
>>>> [1 <http://apache-spark-developers-list.1001551.n3.nabble.com/Tungsten-off-heap-memory-access-for-C-libraries-td13898.html>]
>>>> [2 <http://apache-spark-developers-list.1001551.n3.nabble.com/How-to-access-the-off-heap-representation-of-cached-data-in-Spark-2-0-td17701.html>],
>>>> I asked about moving data from C++ into a Spark data source (RDD,
>>>> DataFrame, or Dataset). The issue is that even the off-heap cache might
>>>> not have a stable representation: it might change from one version to
>>>> the next.
>>>>
>>>> I recently learned about Apache Arrow, a data layer that Spark
>>>> currently shares or will someday share with Pandas, Impala, etc.
>>>> Suppose that I can fill a buffer (such as a direct ByteBuffer) with
>>>> Arrow-formatted data: is there an easy, or even zero-copy, way to use
>>>> that in Spark? Is that an API that could be developed?
>>>>
>>>> I'll be at the KDD Spark 2.0 tutorial on August 15. Is that a good
>>>> place to ask this question?
>>>>
>>>> Thanks,
>>>> -- Jim
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-developers-list.1001551.n3.nabble.com/Apache-Arrow-data-in-buffer-to-RDD-DataFrame-Dataset-tp18563.html
>>>> Sent from the Apache Spark Developers List mailing list archive at
>>>> Nabble.com.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>
>>>
>>> --
>>> Cell : 425-233-8271
>>> Twitter: https://twitter.com/holdenkarau
>>>
>>
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
>