Hi Wes,

Thanks for raising the ticket. So it seems like Spark 2.0 will not have
support for Arrow.
Also, does SPARK-13534 cover Arrow serialization for Spark's Java API, or do
we need to raise a separate ticket for that?

As of now, I only have a high-level understanding of Arrow and its data
structures, but I'm willing to dive deeper and provide any help I can, mainly
with testing, the Java serializer, or additional examples. Let me know how I
can help.

Thanks,
Dima

On 1 March 2016 at 00:46, Wes McKinney <w...@cloudera.com> wrote:

> hi Dmitriy,
>
> I created the following JIRA
> https://issues.apache.org/jira/browse/SPARK-13534 related to PySpark
> which seems relevant. I would be happy to collaborate with you on
> this. Since I understand that the Spark developers are exploring an
> in-memory columnar layout for Spark DataFrames/Datasets and Spark SQL,
> any conversion code we write right now may end up being temporary.
> Hopefully the Spark columnar memory layout will end up being very
> nearly the same as the official Arrow layout so that limited or no
> conversion will be necessary.
>
> Thanks
> Wes
>
> On Wed, Feb 24, 2016 at 12:38 PM, Dmitriy Morozov <int.2...@gmail.com>
> wrote:
> > Hello everyone,
> >
> > I'm just starting with Arrow. I'd like to see how good Arrow is at caching
> > when used in conjunction with Alluxio (Tachyon). The use case that I'm
> > going to validate involves reading data from a Spark DataFrame, storing it
> > in Tachyon in Arrow format, and then reading it back into a DataFrame. I
> > checked the Arrow source code but couldn't find any examples or tests. Can
> > anyone please guide me on where I should start looking in order to convert
> > a DataFrame to an Arrow struct?
> >
> > Thanks!
> > Dmitriy
>



-- 
Kind regards,
Dima
