Re: Arrow examples

Wes McKinney Thu, 03 Mar 2016 11:36:21 -0800

Serializing Spark DataFrame in either Java or Scala would suffice for the
use case, but there may be follow-on JIRAs to make the Arrow adapters more
accessible. pandas only needs access to flat schemas for now, for example,
so nested Spark SQL schemas could be handled in follow-up work.

Note: this is somewhat dependent on the separate thread around the metadata
specification -- ideally Spark SQL would be able to adapt its schema
metadata to a form that any Arrow consumer can use.

- Wes

On Thu, Mar 3, 2016 at 12:39 AM, Dmitriy Morozov <int.2...@gmail.com> wrote:

> Hi Wes,
>
> Thanks for raising the ticket. So it seems like Spark 2.0 will not have
> support for Arrow.
> Also, does SPARK-13534 cover Arrow serialization for Spark's Java API, or do
> we need to raise a separate ticket for that?
>
> As of now, I only have a high-level understanding of Arrow and its data
> structures, but I'm willing to dive deeper and provide any help I can, mainly
> in testing, a Java serializer, or additional examples. Let me know how I can
> help.
>
> Thanks,
> Dima
>
> On 1 March 2016 at 00:46, Wes McKinney <w...@cloudera.com> wrote:
>
> > hi Dmitriy,
> >
> > I created the following JIRA
> > https://issues.apache.org/jira/browse/SPARK-13534 related to PySpark
> > which seems relevant. I would be happy to collaborate with you on
> > this. Since I understand that the Spark developers are exploring an
> > in-memory columnar layout for Spark DataFrames/Datasets and Spark SQL,
> > any conversion code we write right now may end up being temporary.
> > Hopefully the Spark columnar memory layout will end up being very
> > nearly the same as the official Arrow layout so that limited or no
> > conversion will be necessary.
> >
> > Thanks
> > Wes
> >
> > On Wed, Feb 24, 2016 at 12:38 PM, Dmitriy Morozov <int.2...@gmail.com>
> > wrote:
> > > Hello everyone,
> > >
> > > I'm just starting with Arrow. I'd like to see how good Arrow is at caching
> > > when used in conjunction with Alluxio (Tachyon). The use case that I'm
> > > going to validate involves reading data from a Spark DataFrame, storing
> > > it in Tachyon in Arrow format, and then reading it back into a DataFrame.
> > > I checked the source code of Arrow but couldn't find any examples or
> > > tests. Can anyone point me to where I should start looking in order to
> > > convert a DataFrame to an Arrow struct?
> > >
> > > Thanks!
> > > Dmitriy
> >
>
>
>
> --
> Kind regards,
> Dima
>
