Re: Apache Arrow data in buffer to RDD/DataFrame/Dataset?

Nicholas Chammas Fri, 05 Aug 2016 15:15:12 -0700

Relevant JIRA: https://issues.apache.org/jira/browse/SPARK-13534
On Fri, Aug 5, 2016 at 5:22 PM, Holden Karau <hol...@pigscanfly.ca> wrote:


> I don't think there is an approximate timescale right now, and it's likely
> any implementation would depend on a solid Java implementation of Arrow
> being ready first (there isn't even a guarantee that it will happen, although
> I'm interested in making it happen in some places where it makes sense).
>
> On Fri, Aug 5, 2016 at 2:18 PM, Jim Pivarski <jpivar...@gmail.com> wrote:
>
>> I see. I've already started working with Arrow-C++ and talking to members
>> of the Arrow community, so I'll keep doing that.
>>
>> As a follow-up question, is there an approximate timescale for when Spark
>> will support Arrow? I'd just like to know that all the pieces will come
>> together eventually.
>>
>> (In this forum, most of the discussion about Arrow is about PySpark and
>> Pandas, not Spark in general.)
>>
>> Best,
>> Jim
>>
>> On Aug 5, 2016 2:43 PM, "Holden Karau" <hol...@pigscanfly.ca> wrote:
>>
>>> Spark does not currently support Apache Arrow. A good place to chat would
>>> probably be the Arrow mailing list, where they are making progress towards
>>> unified JVM & Python/R support, which is more or less a precondition for a
>>> functioning Arrow interface between Spark and Python.
>>>
>>> On Fri, Aug 5, 2016 at 12:40 PM, jpivar...@gmail.com <
>>> jpivar...@gmail.com> wrote:
>>>
>>>> In a few earlier posts [1, 2], I asked about moving data from C++ into a
>>>> Spark data source (RDD, DataFrame, or Dataset). The issue is that even the
>>>> off-heap cache might not have a stable representation: it might change from
>>>> one version to the next.
>>>>
>>>> [1] http://apache-spark-developers-list.1001551.n3.nabble.com/Tungsten-off-heap-memory-access-for-C-libraries-td13898.html
>>>> [2] http://apache-spark-developers-list.1001551.n3.nabble.com/How-to-access-the-off-heap-representation-of-cached-data-in-Spark-2-0-td17701.html
>>>>
>>>> I recently learned about Apache Arrow, a data layer that Spark currently
>>>> shares or will someday share with Pandas, Impala, etc. Suppose that I can
>>>> fill a buffer (such as a direct ByteBuffer) with Arrow-formatted data: is
>>>> there an easy, or even zero-copy, way to use that in Spark? Is that an API
>>>> that could be developed?
>>>>
>>>> I'll be at the KDD Spark 2.0 tutorial on August 15. Is that a good place to
>>>> ask this question?
>>>>
>>>> Thanks,
>>>> -- Jim
>>>>
>>>
>>>
>>> --
>>> Cell : 425-233-8271
>>> Twitter: https://twitter.com/holdenkarau
>>>
>>
>

Re: Apache Arrow data in buffer to RDD/DataFrame/Dataset?
